[jira] Commented: (PIG-765) to implement jdiff
[ https://issues.apache.org/jira/browse/PIG-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752954#action_12752954 ]

Hadoop QA commented on PIG-765:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12419017/pig-765.patch
against trunk revision 812599.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 268 release audit warnings (more than the trunk's current 162 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/19/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/19/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/19/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/19/console

This message is automatically generated.

to implement jdiff
------------------
Key: PIG-765
URL: https://issues.apache.org/jira/browse/PIG-765
Project: Pig
Issue Type: Improvement
Components: build
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
Attachments: pig-765.patch, pig-765.patch, pig-765.patch, pig-765.patch, pig-765.patch, pig-765.patch, pig-765.patch

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
Re: questions about integration of pig and HBase
Thank you for the link.

Anyway, what I was looking for is an example of Pig syntax for loading from an HBase table. Is it something like:

queries = LOAD 'HBase Table' USING HBaseStorage() ?

Jeff Zhang wrote:
> Use HBaseStorage as your loadFunc; it uses a custom slicer, HBaseSlice.
> You can refer to this link for more information:
> http://hadoop.apache.org/pig/docs/r0.3.0/udf.html#Custom+Slicer
>
> 2009/9/9 Vincent BARAT vincent.ba...@ubikod.com
>> Alan Gates wrote:
>>> Pig supports reading from Hbase (in Hadoop/Hbase 0.18 only).
>>
>> Hello,
>> Do you have any link to the documentation about how to do that?
>> I can't find any example...
>>
>> Thanks,
Re: questions about integration of pig and HBase
See the JIRA PIG-6. See also the HbaseStorage unit test, which exercises this functionality.

Alan.

On Sep 9, 2009, at 5:31 AM, Vincent BARAT wrote:
> Thank you for the link.
>
> Anyway, what I was looking for is an example of Pig syntax for loading from an HBase table. Is it something like:
>
> queries = LOAD 'HBase Table' USING HBaseStorage() ?
>
> Jeff Zhang wrote:
>> Use HBaseStorage as your loadFunc; it uses a custom slicer, HBaseSlice.
>> You can refer to this link for more information:
>> http://hadoop.apache.org/pig/docs/r0.3.0/udf.html#Custom+Slicer
>>
>> 2009/9/9 Vincent BARAT vincent.ba...@ubikod.com
>>> Alan Gates wrote:
>>>> Pig supports reading from Hbase (in Hadoop/Hbase 0.18 only).
>>>
>>> Hello,
>>> Do you have any link to the documentation about how to do that?
>>> I can't find any example...
>>>
>>> Thanks,
[jira] Commented: (PIG-948) [Usability] Relating pig script with MR jobs
[ https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753173#action_12753173 ]

Daniel Dai commented on PIG-948:
--------------------------------

One thing I am not sure about is the way you interpolate the job tracker URL:

{code}
"http://" + jobTrackerAdd + port + "/jobdetails.jsp?jobid=" + job.getAssignedJobID();
{code}

I am not sure we should have this logic in Pig; it looks hacky to me. The other parts are good.

[Usability] Relating pig script with MR jobs
--------------------------------------------
Key: PIG-948
URL: https://issues.apache.org/jira/browse/PIG-948
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: pig-948.patch

Currently it is hard to relate a Pig script to specific MR jobs. In a loaded cluster with multiple simultaneous job submissions, it is not easy to figure out which MR jobs were launched for a given Pig script. If Pig can provide this information, it will be useful for debugging and monitoring the jobs resulting from a Pig script.

At the very least, Pig should be able to give the user the following information:
1) Job id of the launched job.
2) Complete web URL of the jobtracker running this job.
[jira] Commented: (PIG-948) [Usability] Relating pig script with MR jobs
[ https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753260#action_12753260 ]

Ashutosh Chauhan commented on PIG-948:
--------------------------------------

In this string, we determine the job-tracker address, port number, and job ids through APIs, so that part is fine. I agree that hardcoding the other parts of the URL ( jobdetails.jsp?jobid= ) is not the best way to do it, as the link will break if that web URL changes in later Hadoop releases. But since there is no way to get that URL programmatically, I went ahead with this. If there is a way to get that URL programmatically, let me know. If not, I think it is useful enough to have it like this and update it if it changes in later Hadoop releases.

[Usability] Relating pig script with MR jobs
--------------------------------------------
Key: PIG-948
URL: https://issues.apache.org/jira/browse/PIG-948
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: pig-948.patch
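The trade-off discussed in this thread (host, port, and job id come from APIs, while the page path is hard-coded) can be sketched as follows. This is only an illustration, not the code in pig-948.patch: the class name `JobUrlSketch`, the method `jobDetailsUrl`, the ":" separator between host and port, and the example host, port, and job id are all assumptions made for the sketch. The one idea it shows is keeping the hard-coded path in a single constant so it can be updated in one place if the JobTracker web UI changes.

```java
public class JobUrlSketch {
    // Hard-coded JobTracker page path (taken from the thread), isolated in one
    // constant so a change in a later Hadoop release needs a single edit.
    static final String JOB_DETAILS_PATH = "/jobdetails.jsp?jobid=";

    // host, port and jobId are assumed to have been obtained through Hadoop APIs;
    // here they are plain parameters so the sketch has no Hadoop dependency.
    static String jobDetailsUrl(String host, int port, String jobId) {
        return "http://" + host + ":" + port + JOB_DETAILS_PATH + jobId;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        System.out.println(jobDetailsUrl("jobtracker.example.com", 50030, "job_200909091741_0001"));
    }
}
```

If a later release exposes the tracking URL through an API, only `jobDetailsUrl` would need to change.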
[jira] Updated: (PIG-938) Pig Docs for 0.4.0
[ https://issues.apache.org/jira/browse/PIG-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corinne Chandel updated PIG-938:
--------------------------------
Attachment: PIG-938-2.patch

Patch #2 - includes OUTER JOIN write-up.

Pig Docs for 0.4.0
------------------
Key: PIG-938
URL: https://issues.apache.org/jira/browse/PIG-938
Project: Pig
Issue Type: Task
Components: documentation
Affects Versions: 0.4.0
Reporter: Corinne Chandel
Priority: Minor
Attachments: PIG-938-2.patch, PIG-938.patch

Pig docs for 0.4.0
[jira] Updated: (PIG-938) Pig Docs for 0.4.0
[ https://issues.apache.org/jira/browse/PIG-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-938:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Patch committed. Thanks, Corinne!

Pig Docs for 0.4.0
------------------
Key: PIG-938
URL: https://issues.apache.org/jira/browse/PIG-938
Project: Pig
Issue Type: Task
Components: documentation
Affects Versions: 0.4.0
Reporter: Corinne Chandel
Priority: Minor
Attachments: PIG-938-2.patch, PIG-938-2b.patch, PIG-938.patch
Preparing to branch for Pig 0.4.0 release
Hi,

I am updating the tree to make it ready for a branch for the release. Please hold off on any commits until this is done. I will send an email once the branch is created.

Thanks,
Olga
[jira] Commented: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct
[ https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753339#action_12753339 ]

Yan Zhou commented on PIG-944:
------------------------------

The previously attached patch is based on some new features under development and consequently might not apply to the trunk. I'm going to attach another patch shortly, based on the version 1 branch.

In addition to the problem in TableOutputFormat.checkOutputSpecs, SchemaConverter.toPigSchema did not convert the Zebra schema to Pig's properly when nested types are involved: the lower-level column schemas were simply missing. Also, the conversion from Pig to Zebra schema is missing entirely, apart from a hack that works on specially prefixed column names.

Zebra schema is taken from Pig through TableStorer's construct
--------------------------------------------------------------
Key: PIG-944
URL: https://issues.apache.org/jira/browse/PIG-944
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
Attachments: zebra_pig_interface.patch

It should be taken from StoreConfig in the TableOutputFormat.checkOutputSpecs method, because the information is dynamic in Pig's execution engine and should not be passed as a static argument to the constructor.
[jira] Updated: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct
[ https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-944:
------------------------
Attachment: zebra_pig_interface_1_1.patch

Zebra schema is taken from Pig through TableStorer's construct
--------------------------------------------------------------
Key: PIG-944
URL: https://issues.apache.org/jira/browse/PIG-944
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
Attachments: zebra_pig_interface.patch, zebra_pig_interface_1_1.patch
[jira] Commented: (PIG-950) Pig Loader does not handle unix hidden files ( files starting with dot)
[ https://issues.apache.org/jira/browse/PIG-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753361#action_12753361 ]

Daniel Dai commented on PIG-950:
--------------------------------

I tried it: Hadoop actually ignores files whose names start with "." when processing a map-reduce job. So I guess there is nothing we can do, other than not naming input files with a leading ".".

Pig Loader does not handle unix hidden files ( files starting with dot)
-----------------------------------------------------------------------
Key: PIG-950
URL: https://issues.apache.org/jira/browse/PIG-950
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Reporter: Jing Huang

I am trying to load a .btschema file using the Pig loader ( .btschema is not an empty file).

This is what I did:

grunt> a = load '.btschema';
grunt> dump a;
2009-09-09 17:41:21,170 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2009-09-09 17:41:21,170 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2009-09-09 17:41:23,092 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2009-09-09 17:41:23,106 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-09-09 17:41:23,127 [Thread-4] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-09-09 17:41:23,623 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-09-09 17:41:28,644 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-09-09 17:41:28,644 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: file:/tmp/temp165972/tmp-527102439
2009-09-09 17:41:28,645 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 0
2009-09-09 17:41:28,645 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 0
2009-09-09 17:41:28,645 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt>

It dumps nothing.
RE: Preparing to branch for Pig 0.4.0 release
I am having some problems with the docs that I will need to resolve tomorrow. I would like to keep the tree closed until then. If you absolutely need to make a checkin, please go ahead and I will integrate your patch into the branch.

Thanks,
Olga

-----Original Message-----
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Wednesday, September 09, 2009 3:31 PM
To: pig-dev@hadoop.apache.org
Subject: Preparing to branch for Pig 0.4.0 release

Hi,

I am updating the tree to make it ready for a branch for the release. Please, hold off any commits till this is done. I will send an email once the branch is created.

Thanks,
Olga
[jira] Commented: (PIG-950) Pig Loader does not handle unix hidden files ( files starting with dot)
[ https://issues.apache.org/jira/browse/PIG-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753368#action_12753368 ]

Daniel Dai commented on PIG-950:
--------------------------------

Here is the experiment I tried:

hadoop fs -ls gutenberg
{code}
/user/jianyong/gutenberg/.2.txt
/user/jianyong/gutenberg/1.txt
{code}

hadoop fs -cat /gutenberg/1.txt
{code}
hello
{code}

hadoop fs -cat /gutenberg/.2.txt
{code}
daniel
{code}

hadoop jar hadoop-0.18.1-examples.jar wordcount gutenberg gutenberg-output

hadoop fs -cat gutenberg-output/part-0
{code}
hello 1
{code}

Pig Loader does not handle unix hidden files ( files starting with dot)
-----------------------------------------------------------------------
Key: PIG-950
URL: https://issues.apache.org/jira/browse/PIG-950
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Reporter: Jing Huang
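The wordcount experiment above is consistent with an input-path filter that skips hidden files. As far as I know, Hadoop's FileInputFormat skips names beginning with "." or "_", but treat the exact rule as an assumption rather than something confirmed in this thread. The sketch below reproduces the observed behaviour in plain Java with no Hadoop dependency; `HiddenFileSketch`, `isVisible`, and `visibleInputs` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HiddenFileSketch {
    // Assumed rule: a file is an eligible job input only if its name
    // does not start with "." or "_".
    static boolean isVisible(String fileName) {
        return !fileName.startsWith(".") && !fileName.startsWith("_");
    }

    // Returns the file names that would be picked up as job input, in order.
    static List<String> visibleInputs(List<String> fileNames) {
        List<String> visible = new ArrayList<>();
        for (String name : fileNames) {
            if (isVisible(name)) {
                visible.add(name);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Same two files as in the gutenberg directory of the experiment.
        System.out.println(visibleInputs(Arrays.asList(".2.txt", "1.txt")));
    }
}
```

Under this rule only 1.txt reaches the job, which matches the single "hello 1" line in the wordcount output and the empty result when loading '.btschema'.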
[jira] Created: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin
Reset parallelism to 1 for indexing job in MergeJoin
----------------------------------------------------
Key: PIG-951
URL: https://issues.apache.org/jira/browse/PIG-951
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan

After sampling one tuple from every block, one reducer is used to sort the index entries in the reduce phase, producing the sorted index used in the actual join job. Thus, the parallelism of the index job should be explicitly set to 1. Currently, it is not.

For now this is a non-issue, since we don't allow any blocking operators in the pipeline before merge-join. However, later, when we do allow blocking operators, the parallelism of the indexing job will be that of the preceding blocking operator. Even then, the job will complete successfully, because all tuples will go to a single reducer, since we are grouping on only one key, "all". However, it will waste cluster resources by starting extra reducers that get no data and thus do nothing.
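The point about wasted reducers can be illustrated with a small sketch. It assumes hash-style partitioning (non-negative hash code modulo the number of reducers), which is my understanding of Hadoop's default partitioner but is not stated in the issue; `SingleKeyPartitionSketch` and its methods are hypothetical names, and the record and reducer counts are made up.

```java
public class SingleKeyPartitionSketch {
    // Assumed partitioning rule: non-negative hash of the key, modulo reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many records each reducer would receive.
    static int[] recordsPerReducer(String[] keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String key : keys) {
            counts[partitionFor(key, numReducers)]++;
        }
        return counts;
    }

    // Counts reducers that would start but receive no data.
    static int idleReducers(int[] counts) {
        int idle = 0;
        for (int c : counts) {
            if (c == 0) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        // Every index entry is grouped under the same key, "all".
        String[] keys = {"all", "all", "all", "all", "all"};
        System.out.println("idle reducers with parallelism 10: "
                + idleReducers(recordsPerReducer(keys, 10)));
        System.out.println("idle reducers with parallelism 1: "
                + idleReducers(recordsPerReducer(keys, 1)));
    }
}
```

Because every record carries the same key, any parallelism above 1 leaves all but one reducer idle, which is the waste that setting the parallelism to 1 avoids.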
[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin
[ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-951:
--------------------------------
Attachment: pig-951.patch

A one-line patch that fixes this. Also added a test case to catch regressions.

Reset parallelism to 1 for indexing job in MergeJoin
----------------------------------------------------
Key: PIG-951
URL: https://issues.apache.org/jira/browse/PIG-951
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Attachments: pig-951.patch
[jira] Updated: (PIG-948) [Usability] Relating pig script with MR jobs
[ https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-948:
--------------------------------
Status: Patch Available (was: Open)

[Usability] Relating pig script with MR jobs
--------------------------------------------
Key: PIG-948
URL: https://issues.apache.org/jira/browse/PIG-948
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: pig-948.patch