[jira] [Commented] (PIG-4724) GROUP ALL must create an output record in case there is no input
[ https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988012#comment-14988012 ]

Prashant Kommireddi commented on PIG-4724:

[~rohini], what are your thoughts on this? I'm not sure whether this would break what users expect based on the current behavior, but it sounds like the right thing to do. cc [~daijy]

> GROUP ALL must create an output record in case there is no input
>
> Key: PIG-4724
> URL: https://issues.apache.org/jira/browse/PIG-4724
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.15.0
> Reporter: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter did not output any rows, since we are grouping on ALL, the expected output should probably be (ALL, 0). The implementation generates a pseudo key “all” for every input record on the map side, so that the reduce side can combine all input together. However, this does not work for empty input, since the reduce side does not receive any records. If the input is empty, yield a pseudo “all, 0” record to the reduce side.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (PIG-4724) GROUP ALL must create an output record in case there is no input
Prashant Kommireddi created PIG-4724:

Summary: GROUP ALL must create an output record in case there is no input
Key: PIG-4724
URL: https://issues.apache.org/jira/browse/PIG-4724
Project: Pig
Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Prashant Kommireddi

{code}
A = load 'data';
B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
C = group A ALL;
D = foreach C generate group, COUNT(B);
{code}
Even if the filter did not output any rows, since we are grouping on ALL, the expected output should probably be (ALL, 0). The implementation generates a pseudo key “all” for every input record on the map side, so that the reduce side can combine all input together. However, this does not work for empty input, since the reduce side does not receive any records. If the input is empty, yield a pseudo “all, 0” record to the reduce side.
[jira] [Updated] (PIG-4724) GROUP ALL must create an output record in case there is no input
[ https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-4724:

Description:
{code}
A = load 'data';
B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
C = group B ALL;
D = foreach C generate group, COUNT(B);
{code}
Even if the filter did not output any rows, since we are grouping on ALL, the expected output should probably be (ALL, 0). The implementation generates a pseudo key “all” for every input record on the map side, so that the reduce side can combine all input together. However, this does not work for empty input, since the reduce side does not receive any records. If the input is empty, yield a pseudo “all, 0” record to the reduce side.

was:
{code}
A = load 'data';
B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
C = group A ALL;
D = foreach C generate group, COUNT(B);
{code}
Even if the filter did not output any rows, since we are grouping on ALL, the expected output should probably be (ALL, 0). The implementation generates a pseudo key “all” for every input record on the map side, so that the reduce side can combine all input together. However, this does not work for empty input, since the reduce side does not receive any records. If the input is empty, yield a pseudo “all, 0” record to the reduce side.
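The map/reduce behavior described in this issue can be modeled in a few lines. The following is a minimal Python sketch, not Pig's implementation: `map_phase`, `shuffle`, `reduce_count`, `group_all_count` and the `ensure_all_key` flag are hypothetical names, and `len()` is a simplification of COUNT (Pig's COUNT, unlike this sketch, skips tuples whose first field is null). It shows why empty input currently yields no output record, and how handing the reducer an empty "all" group, as the issue proposes, produces ("all", 0).

```python
from collections import defaultdict

# Simplified model of GROUP ... ALL on MapReduce, following the PIG-4724
# description. All names are hypothetical; this is not Pig source code.

def map_phase(records):
    """Tag every input record with the pseudo key "all"."""
    for rec in records:
        yield ("all", rec)

def shuffle(pairs, ensure_all_key=False):
    """Collect map output by key, as the framework does before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    if ensure_all_key and not groups:
        # Proposed behavior (hypothetical flag): with empty input, still hand
        # the reducer an empty "all" group so it emits one output record.
        groups["all"] = []
    return groups

def reduce_count(groups):
    """Emit (group, count) per key; len() stands in for COUNT."""
    return [(key, len(values)) for key, values in groups.items()]

def group_all_count(records, ensure_all_key=False):
    return reduce_count(shuffle(map_phase(records), ensure_all_key))

if __name__ == "__main__":
    data = [("x", 1), ("y", 2)]
    # Models: B = filter A by $0 == 'THIS_DOES_NOT_EXIST';  (no rows survive)
    filtered = [r for r in data if r[0] == "THIS_DOES_NOT_EXIST"]
    print(group_all_count(data))                           # [('all', 2)]
    print(group_all_count(filtered))                       # []
    print(group_all_count(filtered, ensure_all_key=True))  # [('all', 0)]
```

Because the empty group is injected only when no key reached the reducer, non-empty input produces the same result with or without the flag.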
[jira] [Updated] (PIG-4725) Typo in FrontendException messages "Incompatable"
[ https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Smith updated PIG-4725:
Attachment: PIG-4725.patch

> Typo in FrontendException messages "Incompatable"
>
> Key: PIG-4725
> URL: https://issues.apache.org/jira/browse/PIG-4725
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.15.0
> Reporter: Nathan Smith
> Attachments: PIG-4725.patch
>
> There is a typo in some "FrontendException" error messages where "Incompatible" is misspelled as "Incompatable".
[jira] [Updated] (PIG-4725) Typo in FrontendException messages "Incompatable"
[ https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Smith updated PIG-4725:
Priority: Trivial (was: Major)
[jira] [Created] (PIG-4725) Typo in FrontendException messages "Incompatable"
Nathan Smith created PIG-4725:

Summary: Typo in FrontendException messages "Incompatable"
Key: PIG-4725
URL: https://issues.apache.org/jira/browse/PIG-4725
Project: Pig
Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Nathan Smith

There is a typo in some "FrontendException" error messages where "Incompatible" is misspelled as "Incompatable".
[jira] [Updated] (PIG-4725) Typo in FrontendException messages "Incompatable"
[ https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Smith updated PIG-4725:
Status: Patch Available (was: Open)
[jira] [Updated] (PIG-4725) Typo in FrontendException messages "Incompatable"
[ https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Smith updated PIG-4725:
Status: Open (was: Patch Available)
[jira] [Created] (PIG-4726) "Incompatable field schema" with MIN(datetime_field) and explicit output type
Nathan Smith created PIG-4726:

Summary: "Incompatable field schema" with MIN(datetime_field) and explicit output type
Key: PIG-4726
URL: https://issues.apache.org/jira/browse/PIG-4726
Project: Pig
Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Nathan Smith
Priority: Minor

Example:
{code}
grunt> data = LOAD 'file.csv' USING PigStorage(',') AS (f1:chararray,f2:datetime);
grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2);
grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest;
grunt> describe earliest_datum;
earliest_datum: {earliest: datetime}
grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest:datetime;
2015-11-03 23:20:00,422 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "earliest:datetime", infered is ":double"
grunt> earliest_datum = FOREACH (GROUP data ALL) GENERATE MIN(data.f2) AS earliest:double;
2015-11-03 23:20:07,454 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "earliest:double", infered is ":datetime"
{code}
The example is contrived, but applying MIN to other field types in the same fashion seems to behave as expected.
This also affects MAX.
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (27 issues)
Subscriber: pigdaily

Key Summary
PIG-4684 Exception should be changed to warning when job diagnostics cannot be fetched
  https://issues.apache.org/jira/browse/PIG-4684
PIG-4677 Display failure information on stop on failure
  https://issues.apache.org/jira/browse/PIG-4677
PIG-4675 Multi Store Statement will fail on the second store statement.
  https://issues.apache.org/jira/browse/PIG-4675
PIG-4656 Improve String serialization and comparator performance in BinInterSedes
  https://issues.apache.org/jira/browse/PIG-4656
PIG-4641 Print the instance of Object without using toString()
  https://issues.apache.org/jira/browse/PIG-4641
PIG-4598 Allow user defined plan optimizer rules
  https://issues.apache.org/jira/browse/PIG-4598
PIG-4581 thread safe issue in NodeIdGenerator
  https://issues.apache.org/jira/browse/PIG-4581
PIG-4539 New PigUnit
  https://issues.apache.org/jira/browse/PIG-4539
PIG-4515 org.apache.pig.builtin.Distinct throws ClassCastException
  https://issues.apache.org/jira/browse/PIG-4515
PIG-4455 Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
  https://issues.apache.org/jira/browse/PIG-4455
PIG-4417 Pig's register command should support automatic fetching of jars from repo.
  https://issues.apache.org/jira/browse/PIG-4417
PIG-4373 Implement PIG-3861 in Tez
  https://issues.apache.org/jira/browse/PIG-4373
PIG-4341 Add CMX support to pig.tmpfilecompression.codec
  https://issues.apache.org/jira/browse/PIG-4341
PIG-4323 PackageConverter hanging in Spark
  https://issues.apache.org/jira/browse/PIG-4323
PIG-4313 StackOverflowError in LIMIT operation on Spark
  https://issues.apache.org/jira/browse/PIG-4313
PIG-4251 Pig on Storm
  https://issues.apache.org/jira/browse/PIG-4251
PIG-4111 Make Pig compiles with avro-1.7.7
  https://issues.apache.org/jira/browse/PIG-4111
PIG-4002 Disable combiner when map-side aggregation is used
  https://issues.apache.org/jira/browse/PIG-4002
PIG-3952 PigStorage accepts '-tagSplit' to return full split information
  https://issues.apache.org/jira/browse/PIG-3952
PIG-3911 Define unique fields with @OutputSchema
  https://issues.apache.org/jira/browse/PIG-3911
PIG-3877 Getting Geo Latitude/Longitude from Address Lines
  https://issues.apache.org/jira/browse/PIG-3877
PIG-3873 Geo distance calculation using Haversine
  https://issues.apache.org/jira/browse/PIG-3873
PIG-3866 Create ThreadLocal classloader per PigContext
  https://issues.apache.org/jira/browse/PIG-3866
PIG-3864 ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
  https://issues.apache.org/jira/browse/PIG-3864
PIG-3851 Upgrade jline to 2.11
  https://issues.apache.org/jira/browse/PIG-3851
PIG-3668 COR built-in function when atleast one of the coefficient values is NaN
  https://issues.apache.org/jira/browse/PIG-3668
PIG-3587 add functionality for rolling over dates
  https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384
[jira] [Resolved] (PIG-4725) Typo in FrontendException messages "Incompatable"
[ https://issues.apache.org/jira/browse/PIG-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-4725.
Resolution: Fixed
Assignee: Nathan Smith
Hadoop Flags: Reviewed
Fix Version/s: 0.16.0

Committed to trunk. Thanks Nathan!
[jira] [Commented] (PIG-4726) "Incompatable field schema" with MIN(datetime_field) and explicit output type
[ https://issues.apache.org/jira/browse/PIG-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989056#comment-14989056 ]

Daniel Dai commented on PIG-4726:

The workaround is to not declare the type in the AS subclause. This might be a type-checking error that occurs before Pig infers the right MIN implementation.
[jira] [Comment Edited] (PIG-4726) "Incompatable field schema" with MIN(datetime_field) and explicit output type
[ https://issues.apache.org/jira/browse/PIG-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989056#comment-14989056 ]

Daniel Dai edited comment on PIG-4726 at 11/4/15 7:27 AM:

The workaround is to not declare the type in the AS subclause. This might be a type-checking error that occurs before Pig infers the right MIN implementation.

was (Author: daijy):
The walk around is not declare type in AS subclause. This might be some type checking error before Pig infer the right MIN implementation.