[jira] [Commented] (PIG-4724) GROUP ALL must create an output record in case there is no input

2015-11-03 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988012#comment-14988012
 ] 

Prashant Kommireddi commented on PIG-4724:
--

[~rohini] what are your thoughts on this? I'm not sure if this breaks the way 
users expect output to be based on the current behavior, but sounds like the 
right thing to do?

cc [~daijy]

> GROUP ALL must create an output record in case there is no input
> 
>
> Key: PIG-4724
> URL: https://issues.apache.org/jira/browse/PIG-4724
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter outputs no rows, since we are grouping on ALL, the 
> expected output should probably be (ALL, 0). The implementation generates a 
> pseudo key “all” for every input record on the map side, so that the reduce 
> side can combine all input together. However, this does not work for empty 
> input, because the reduce side then receives nothing. If the input is empty, 
> yield a pseudo “all, 0” record to the reduce side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4724) GROUP ALL must create an output record in case there is no input

2015-11-03 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-4724:


 Summary: GROUP ALL must create an output record in case there is 
no input
 Key: PIG-4724
 URL: https://issues.apache.org/jira/browse/PIG-4724
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Prashant Kommireddi


{code}
A = load 'data';

B = filter A by $0 == 'THIS_DOES_NOT_EXIST';

C = group A ALL;

D = foreach C generate group, COUNT(B);
{code}

Even if the filter outputs no rows, since we are grouping on ALL, the 
expected output should probably be (ALL, 0). The implementation generates a 
pseudo key “all” for every input record on the map side, so that the reduce 
side can combine all input together. However, this does not work for empty 
input, because the reduce side then receives nothing. If the input is empty, 
yield a pseudo “all, 0” record to the reduce side.
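
The behavior described above can be illustrated with a small simulation. This is only a sketch in plain Python, not Pig's actual implementation: the function names, the `emit_on_empty` flag, and the `seed_keys` parameter are all hypothetical, introduced here to show why an empty input produces no output row and what the proposed (all, 0) record would change.

```python
from collections import defaultdict

PSEUDO_KEY = "all"  # stand-in for the pseudo key mentioned in the description


def map_side(records):
    """Tag every input record with the same pseudo key."""
    return [(PSEUDO_KEY, record) for record in records]


def reduce_side(pairs, seed_keys=()):
    """Group pairs by key and count the records under each key.

    seed_keys is a hypothetical hook: each key listed there gets an
    output row even when no pair carried it.
    """
    groups = defaultdict(list)
    for key in seed_keys:
        groups[key]  # touching the key creates an empty group
    for key, record in pairs:
        groups[key].append(record)
    return [(key, len(records)) for key, records in groups.items()]


def group_all_count(records, emit_on_empty=False):
    """Simulate GROUP ... ALL followed by COUNT."""
    pairs = map_side(records)
    # With no input there are no pairs, so the reduce side sees no key at
    # all unless a pseudo key is supplied explicitly.
    seed = (PSEUDO_KEY,) if emit_on_empty and not pairs else ()
    return reduce_side(pairs, seed)


print(group_all_count(["a", "b", "c"]))          # non-empty input
print(group_all_count([]))                       # current behavior: no row
print(group_all_count([], emit_on_empty=True))   # proposed: one row, count 0
```

With non-empty input both variants return the same single row; they differ only when the input is empty.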






[jira] [Updated] (PIG-4724) GROUP ALL must create an output record in case there is no input

2015-11-03 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4724:
-
Description: 
{code}
A = load 'data';

B = filter A by $0 == 'THIS_DOES_NOT_EXIST';

C = group B ALL;

D = foreach C generate group, COUNT(B);
{code}

Even if the filter outputs no rows, since we are grouping on ALL, the 
expected output should probably be (ALL, 0). The implementation generates a 
pseudo key “all” for every input record on the map side, so that the reduce 
side can combine all input together. However, this does not work for empty 
input, because the reduce side then receives nothing. If the input is empty, 
yield a pseudo “all, 0” record to the reduce side.


  was:
{code}
A = load 'data';

B = filter A by $0 == 'THIS_DOES_NOT_EXIST';

C = group A ALL;

D = foreach C generate group, COUNT(B);
{code}

Even if the filter outputs no rows, since we are grouping on ALL, the 
expected output should probably be (ALL, 0). The implementation generates a 
pseudo key “all” for every input record on the map side, so that the reduce 
side can combine all input together. However, this does not work for empty 
input, because the reduce side then receives nothing. If the input is empty, 
yield a pseudo “all, 0” record to the reduce side.



> GROUP ALL must create an output record in case there is no input
> 
>
> Key: PIG-4724
> URL: https://issues.apache.org/jira/browse/PIG-4724
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter outputs no rows, since we are grouping on ALL, the 
> expected output should probably be (ALL, 0). The implementation generates a 
> pseudo key “all” for every input record on the map side, so that the reduce 
> side can combine all input together. However, this does not work for empty 
> input, because the reduce side then receives nothing. If the input is empty, 
> yield a pseudo “all, 0” record to the reduce side.





[jira] [Updated] (PIG-4725) Typo in FrontendException messages "Incompatable"

2015-11-03 Thread Nathan Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Smith updated PIG-4725:
--
Attachment: PIG-4725.patch

> Typo in FrontendException messages "Incompatable"
> -
>
> Key: PIG-4725
> URL: https://issues.apache.org/jira/browse/PIG-4725
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
> Attachments: PIG-4725.patch
>
>
> There is a typo in some "FrontendException" error messages where 
> "Incompatible" is misspelled as "Incompatable".





[jira] [Updated] (PIG-4725) Typo in FrontendException messages "Incompatable"

2015-11-03 Thread Nathan Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Smith updated PIG-4725:
--
Priority: Trivial  (was: Major)

> Typo in FrontendException messages "Incompatable"
> -
>
> Key: PIG-4725
> URL: https://issues.apache.org/jira/browse/PIG-4725
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>Priority: Trivial
> Attachments: PIG-4725.patch
>
>
> There is a typo in some "FrontendException" error messages where 
> "Incompatible" is misspelled as "Incompatable".





[jira] [Created] (PIG-4725) Typo in FrontendException messages "Incompatable"

2015-11-03 Thread Nathan Smith (JIRA)
Nathan Smith created PIG-4725:
-

 Summary: Typo in FrontendException messages "Incompatable"
 Key: PIG-4725
 URL: https://issues.apache.org/jira/browse/PIG-4725
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Nathan Smith


There is a typo in some "FrontendException" error messages where "Incompatible" 
is misspelled as "Incompatable".





[jira] [Updated] (PIG-4725) Typo in FrontendException messages "Incompatable"

2015-11-03 Thread Nathan Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Smith updated PIG-4725:
--
Status: Patch Available  (was: Open)

> Typo in FrontendException messages "Incompatable"
> -
>
> Key: PIG-4725
> URL: https://issues.apache.org/jira/browse/PIG-4725
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>
> There is a typo in some "FrontendException" error messages where 
> "Incompatible" is misspelled as "Incompatable".





[jira] [Updated] (PIG-4725) Typo in FrontendException messages "Incompatable"

2015-11-03 Thread Nathan Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Smith updated PIG-4725:
--
Status: Open  (was: Patch Available)

> Typo in FrontendException messages "Incompatable"
> -
>
> Key: PIG-4725
> URL: https://issues.apache.org/jira/browse/PIG-4725
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>
> There is a typo in some "FrontendException" error messages where 
> "Incompatible" is misspelled as "Incompatable".





[jira] [Created] (PIG-4726) "Incompatable field schema" with MIN(datetime_field) and explicit output type

2015-11-03 Thread Nathan Smith (JIRA)
Nathan Smith created PIG-4726:
-

 Summary: "Incompatable field schema" with MIN(datetime_field) and 
explicit output type
 Key: PIG-4726
 URL: https://issues.apache.org/jira/browse/PIG-4726
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Nathan Smith
Priority: Minor


Example:

{code}
grunt> data = LOAD 'file.csv' USING PigStorage(',') AS (f1:chararray,f2:datetime);
grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2);
grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest;
grunt> describe earliest_datum;
earliest_datum: {earliest: datetime}
grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest:datetime;
2015-11-03 23:20:00,422 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "earliest:datetime", infered is ":double"
grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest:double;
2015-11-03 23:20:07,454 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "earliest:double", infered is ":datetime"
{code}

The example is contrived, but applying MIN to other field types in the same 
fashion seems to behave as expected.

Also affects MAX.





[jira] Subscription: PIG patch available

2015-11-03 Thread jira
Issue Subscription
Filter: PIG patch available (27 issues)

Subscriber: pigdaily

Key         Summary
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4677    Display failure information on stop on failure
            https://issues.apache.org/jira/browse/PIG-4677
PIG-4675    Multi Store Statement will fail on the second store statement.
            https://issues.apache.org/jira/browse/PIG-4675
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4641    Print the instance of Object without using toString()
            https://issues.apache.org/jira/browse/PIG-4641
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4581    thread safe issue in NodeIdGenerator
            https://issues.apache.org/jira/browse/PIG-4581
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4417    Pig's register command should support automatic fetching of jars from repo.
            https://issues.apache.org/jira/browse/PIG-4417
PIG-4373    Implement PIG-3861 in Tez
            https://issues.apache.org/jira/browse/PIG-4373
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3851    Upgrade jline to 2.11
            https://issues.apache.org/jira/browse/PIG-3851
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384


[jira] [Resolved] (PIG-4725) Typo in FrontendException messages "Incompatable"

2015-11-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-4725.
-
   Resolution: Fixed
 Assignee: Nathan Smith
 Hadoop Flags: Reviewed
Fix Version/s: 0.16.0

Committed to trunk. Thanks Nathan!

> Typo in FrontendException messages "Incompatable"
> -
>
> Key: PIG-4725
> URL: https://issues.apache.org/jira/browse/PIG-4725
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>Assignee: Nathan Smith
>Priority: Trivial
> Fix For: 0.16.0
>
> Attachments: PIG-4725.patch
>
>
> There is a typo in some "FrontendException" error messages where 
> "Incompatible" is misspelled as "Incompatable".





[jira] [Commented] (PIG-4726) "Incompatable field schema" with MIN(datetime_field) and explicit output type

2015-11-03 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989056#comment-14989056
 ] 

Daniel Dai commented on PIG-4726:
-

The walk around is not declare type in AS subclause.

This might be some type checking error before Pig infer the right MIN 
implementation.

> "Incompatable field schema" with MIN(datetime_field) and explicit output type
> -
>
> Key: PIG-4726
> URL: https://issues.apache.org/jira/browse/PIG-4726
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>Priority: Minor
>
> Example:
> {code}
> grunt> data = LOAD 'file.csv' USING PigStorage(',') AS (f1:chararray,f2:datetime);
> grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2);
> grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest;
> grunt> describe earliest_datum;
> earliest_datum: {earliest: datetime}
> grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest:datetime;
> 2015-11-03 23:20:00,422 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "earliest:datetime", infered is ":double"
> grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest:double;
> 2015-11-03 23:20:07,454 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "earliest:double", infered is ":datetime"
> {code}
> The example is contrived, but applying MIN to other field types in the same 
> fashion seems to behave as expected.
> Also affects MAX.





[jira] [Comment Edited] (PIG-4726) "Incompatable field schema" with MIN(datetime_field) and explicit output type

2015-11-03 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989056#comment-14989056
 ] 

Daniel Dai edited comment on PIG-4726 at 11/4/15 7:27 AM:
--

The workaround is to not declare the type in the AS subclause.

This might be a type-checking error that occurs before Pig infers the right MIN 
implementation.


was (Author: daijy):
The walk around is not declare type in AS subclause.

This might be some type checking error before Pig infer the right MIN 
implementation.

> "Incompatable field schema" with MIN(datetime_field) and explicit output type
> -
>
> Key: PIG-4726
> URL: https://issues.apache.org/jira/browse/PIG-4726
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Nathan Smith
>Priority: Minor
>
> Example:
> {code}
> grunt> data = LOAD 'file.csv' USING PigStorage(',') AS (f1:chararray,f2:datetime);
> grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2);
> grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest;
> grunt> describe earliest_datum;
> earliest_datum: {earliest: datetime}
> grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest:datetime;
> 2015-11-03 23:20:00,422 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "earliest:datetime", infered is ":double"
> grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest:double;
> 2015-11-03 23:20:07,454 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "earliest:double", infered is ":datetime"
> {code}
> The example is contrived, but applying MIN to other field types in the same 
> fashion seems to behave as expected.
> Also affects MAX.


