[jira] Created: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.
In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.
---
Key: PIG-1130
URL: https://issues.apache.org/jira/browse/PIG-1130
Project: Pig
Issue Type: Bug
Reporter: Ankit Modi
Priority: Minor

If the output consists of more than one part file, the current code reports statistics for the first part file only (i.e., part-0).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
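The expected behaviour can be illustrated with a minimal sketch that sums the statistics over every part file instead of stopping at the first one. This is not Pig's code: the function name `output_stats` is hypothetical, and the sketch assumes plain-text output with one record per line.

```python
import glob
import os


def output_stats(output_dir):
    """Return (records, bytes) summed over every part file in output_dir.

    Hypothetical helper for illustration; assumes one record per line.
    """
    total_records = 0
    total_bytes = 0
    # Visit every part file (part-0, part-00001, ...), not just the first.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        total_bytes += os.path.getsize(path)
        with open(path, "rb") as handle:
            # Assumption: each newline-terminated line is one record.
            total_records += sum(1 for _ in handle)
    return total_records, total_bytes
```

Reading only the first part file would under-report both numbers whenever the job writes more than one output file, which is the symptom described in this issue.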
[jira] Updated: (PIG-1105) COUNT_STAR accumulate interface implementation cases failure
[ https://issues.apache.org/jira/browse/PIG-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1105:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed patch to trunk and 0.6 branch. Thanks, Thejas and Sri.

COUNT_STAR accumulate interface implementation cases failure
---
Key: PIG-1105
URL: https://issues.apache.org/jira/browse/PIG-1105
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Thejas M Nair
Assignee: Sriranjan Manjunath
Fix For: 0.6.0
Attachments: PIG-1105.1.patch, PIG-1105.2.patch

COUNT_STAR.accumulate is calling sum(), which is supposed to be used by the intermediate and final parts of the algebraic interface.
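The distinction behind this bug can be shown with a toy Python analogue. It is not the patched Java code: the class and method names are chosen for illustration only. The point is that the accumulator path receives batches of raw tuples and must count them, whereas a sum over partial counts belongs to the intermediate and final algebraic steps.

```python
class CountStar:
    """Toy analogue of a counting function with two evaluation paths."""

    def __init__(self):
        self._count = 0

    @staticmethod
    def sum_partials(partials):
        # Algebraic intermediate/final step: the inputs are already
        # partial counts, so adding them up is correct here.
        return sum(partials)

    def accumulate(self, batch):
        # Accumulator step: the batch holds raw tuples, so count them
        # rather than treating them as partial counts.
        self._count += len(batch)

    def get_value(self):
        # Result over all batches seen since the last cleanup.
        return self._count

    def cleanup(self):
        # Reset state before the next group is processed.
        self._count = 0
```

Calling `sum_partials` on a batch of raw tuples would fail or give a wrong answer, because the elements are records rather than numbers.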
[jira] Updated: (PIG-1129) Pig UDF doc: fieldsToRead function
[ https://issues.apache.org/jira/browse/PIG-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1129:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

Patch committed to the trunk and 0.6 branch. Thanks, Corinne.

Pig UDF doc: fieldsToRead function
---
Key: PIG-1129
URL: https://issues.apache.org/jira/browse/PIG-1129
Project: Pig
Issue Type: Task
Components: documentation
Affects Versions: 0.6.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
Fix For: 0.6.0
Attachments: Pig-6-UDF.patch

Updated Pig UDF doc to include information about the fieldsToRead function.
[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.
[ https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Wang updated PIG-1104:
---
Status: Open (was: Patch Available)

[zebra] Provide streaming support in Zebra.
---
Key: PIG-1104
URL: https://issues.apache.org/jira/browse/PIG-1104
Project: Pig
Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
Fix For: 0.6.0, 0.7.0
Attachments: PIG-1104.patch

Hadoop streaming is very popular among Hadoop users. The main attraction is its simplicity: a user can write the application logic in any language and process large amounts of data using the Hadoop framework. As more people start to use Zebra to store their data, we expect that users will want to run Hadoop streaming scripts to process Zebra tables easily.

The following simple example uses Hadoop streaming to access Zebra data. It loads data from the table foo using Zebra's TableInputFormat and then writes the data to output using the default TextOutputFormat.

$ hadoop jar hadoop-streaming.jar -D mapred.reduce.tasks=0 -input foo -output output -mapper 'cat' -inputformat org.apache.hadoop.zebra.mapred.TableInputFormat

In more detail, Zebra uses Pig's DefaultTuple implementation of Tuple for its records. Currently, when Zebra's TableInputFormat is used for input, the user script sees each line as key_if_any\tTuple.toString(). We plan to generate a CSV representation of our Pig tuples instead. To this end, we plan to do the following:

1) Derive a subclass ZebraTuple from Pig's DefaultTuple class and override its toString() method to present the data in CSV format.
2) On the Zebra side, change the tuple factory to create ZebraTuple objects instead of DefaultTuple objects.

Note that we can support streaming only on the input side, that is, using streaming to read data from Zebra tables. On the output side, streaming support is not feasible: the streaming mapper or reducer emits only Text\tText, so the output collector has no way of knowing how to convert this to (BytesWritable,Tuple).
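To make the proposed line format concrete, here is a minimal Python sketch of both ends: rendering a record's fields as one CSV line, and parsing a `key_if_any\tCSV` line inside a streaming script. The function names are hypothetical, and the quoting rules are those of Python's `csv` module, which is an assumption; the issue does not specify the CSV dialect Zebra would emit.

```python
import csv
import io


def tuple_to_csv(fields):
    """Render one record's fields as a single CSV line (no trailing newline)."""
    buffer = io.StringIO()
    # lineterminator="" keeps the result to exactly one line of text.
    csv.writer(buffer, lineterminator="").writerow(fields)
    return buffer.getvalue()


def parse_streaming_line(line):
    """Split 'key<TAB>csv' (or just 'csv') into (key, fields)."""
    line = line.rstrip("\n")
    key, sep, rest = line.partition("\t")
    if not sep:
        # No tab: the line carries no key, only the CSV record.
        key, rest = None, line
    # Assumption: the record part is one well-formed CSV row.
    fields = next(csv.reader([rest]))
    return key, fields
```

A streaming mapper written in Python could call `parse_streaming_line` on each line read from standard input. Note that this sketch treats the first tab as the key separator, so a keyless record whose fields contain a tab would be split incorrectly.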
[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.
[ https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Wang updated PIG-1104:
---
Status: Patch Available (was: Open)
[jira] Commented: (PIG-1104) [zebra] Provide streaming support in Zebra.
[ https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786758#action_12786758 ]

Chao Wang commented on PIG-1104:
---
It seems Pig has some issue. I checked Pig's TestBuiltin.java test file, and it runs fine with the patch, so I am resubmitting the same patch.
[jira] Commented: (PIG-1104) [zebra] Provide streaming support in Zebra.
[ https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786809#action_12786809 ]

Hadoop QA commented on PIG-1104:
---
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12426998/PIG-1104.patch
against trunk revision 887806.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 20 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/101/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/101/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/101/console

This message is automatically generated.