[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index
[ https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749806#action_12749806 ] Hadoop QA commented on PIG-934: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418219/pig-934_2.patch against trunk revision 806668. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/console This message is automatically generated. 
Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index -- Key: PIG-934 URL: https://issues.apache.org/jira/browse/PIG-934 Project: Pig Issue Type: Bug Affects Versions: 0.3.1 Reporter: Pradeep Kamath Assignee: Ashutosh Chauhan Attachments: pig-934_2.patch We use POLoad to seek into right file which has the following code: {noformat} public void setUp() throws IOException{ String filename = lFile.getFileName(); loader = (LoadFunc)PigContext.instantiateFuncFromSpec(lFile.getFuncSpec()); is = FileLocalizer.open(filename, pc); loader.bindTo(filename , new BufferedPositionedInputStream(is), this.offset, Long.MAX_VALUE); } {noformat} Between opening the stream and bindTo we do not seek to the right offset. bindTo itself does not perform any seek. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
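The missing step described above (position the stream at the index offset before handing it to the loader) can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (SeekBeforeBind, seekTo, readFrom) are hypothetical, a ByteArrayInputStream stands in for the stream returned by FileLocalizer.open, and the contents of pig-934_2.patch are not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: advance a stream to a byte offset before a loader reads from it.
final class SeekBeforeBind {

    // Skips exactly `offset` bytes. InputStream.skip may skip fewer bytes than
    // requested, so loop until the full offset has been consumed.
    static void seekTo(InputStream in, long offset) throws IOException {
        long remaining = offset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() made no progress; read one byte to detect end of stream.
                if (in.read() < 0) {
                    throw new IOException("End of stream with " + remaining + " bytes left to skip");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    // Stand-in for "open, seek, then bind": returns everything from `offset` onward.
    static String readFrom(byte[] data, long offset) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            seekTo(in, offset);
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "k1\tv1\nk2\tv2\nk3\tv3\n".getBytes(StandardCharsets.UTF_8);
        // The first record is 6 bytes long, so offset 6 starts at the second record.
        System.out.print(readFrom(data, 6));
    }
}
```

Without the seekTo call the reader would start at byte 0 regardless of the offset passed along, which matches the symptom reported in the issue.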
[jira] Updated: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.
[ https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated PIG-939: --- Status: Patch Available (was: Open) Checkstyle pulls in junit3.7 which causes the build of test code to fail. - Key: PIG-939 URL: https://issues.apache.org/jira/browse/PIG-939 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.3.0 Reporter: Lee Tucker Attachments: pig-939.patch Pig fails to compile if you execute: ant -Dassociated flags for various components clean findbugs checkstyle test It gets the error: [javac] Compiling 153 source files to /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/build/test/classes [javac] /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/test/org/apache/pig/test/PigExecTestCase.java:31: cannot find symbol [javac] symbol : constructor TestCase() [javac] location: class junit.framework.TestCase [javac] public abstract class PigExecTestCase extends TestCase { [javac] ^ Once that's done, there's a copy of junit 3.7 cached from ivy that will continue to cause the build to fail. It will succeed, if you remove it, and then do: ant -Dassociated flags for various components clean findbugs test This proves it's running checkstyle that pulls in junit 3.7 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.
[ https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated PIG-939: --- Attachment: pig-939.patch this patch should fix this issue of downloading junit-3.7
[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index
[ https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749901#action_12749901 ] Ashutosh Chauhan commented on PIG-934: -- All tests passed on my local box. Not sure why they failed on hudson.
[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig
[ https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749905#action_12749905 ] Mridul Muralidharan commented on PIG-940: - Is this supported in hadoop ? As in, can you specify the input to be on a different hdfs and get a mapred job to work ? IIRC no, but I could be missing something. If it is no, then not sure if pig can support it without an intermediate distcp ... Cross site HDFS access using the default.fs.name not possible in Pig Key: PIG-940 URL: https://issues.apache.org/jira/browse/PIG-940 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Environment: Hadoop 20 Reporter: Viraj Bhat Fix For: 0.3.0 I have a script which does the following.. access data from a remote HDFS location (via a HDFS installed at:hdfs://remotemachine1.company.com/ ) [[as I do not want to copy this huge amount of data between HDFS locations]]. However I want my Pigscript to write data to the HDFS running on localmachine.company.com. 
Currently Pig does not support that behavior and complains that: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist {code} A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); C = JOIN A by a, B by c; store C into 'output' using PigStorage(); {code} === 2009-09-01 00:37:24,032 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localmachine.company.com:8020 2009-09-01 00:37:24,277 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localmachine.company.com:50300 2009-09-01 00:37:24,567 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage-POForEach to POJoinPackage 2009-09-01 00:37:24,573 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2009-09-01 00:37:24,573 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2009-09-01 00:37:26,197 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 2009-09-01 00:37:26,249 [Thread-9] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 2009-09-01 00:37:26,746 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2009-09-01 00:37:26,746 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2009-09-01 00:37:26,747 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed! 
2009-09-01 00:37:26,756 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480 2009-09-01 00:37:26,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist. Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log === The error file in Pig contains: === ERROR 2998: Unhandled internal error. org.apache.pig.backend.executionengine.ExecException: ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist. at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126) at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59) at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810) at
[jira] Commented: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.
[ https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749921#action_12749921 ] Hadoop QA commented on PIG-939: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418232/pig-939.patch against trunk revision 806668. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/console This message is automatically generated.
Request for feedback: cost-based optimizer
Hi everyone, Attached is a (very) preliminary document outlining a rough design we are proposing for a cost-based optimizer for Pig. This is being done as a capstone project by three CMU Master's students (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not necessarily meant for immediate incorporation into the Pig codebase, although it would be nice if it, or parts of it, are found to be useful in the mainline. We would love to get some feedback from the developer community regarding the ideas expressed in the document, any concerns about the design, suggestions for improvement, etc. Thanks, Dmitriy, Ashutosh, Tejal
Re: Request for feedback: cost-based optimizer
Whoops :-) Here's the Google doc: http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdAhl=en -Dmitriy On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasans...@yahoo-inc.com wrote: Dmitriy and Gang, The mailing list does not allow attachments. Can you post it on a website and just send the URL ? Thanks, Santhosh -Original Message- From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] Sent: Tuesday, September 01, 2009 9:48 AM To: pig-dev@hadoop.apache.org Subject: Request for feedback: cost-based optimizer Hi everyone, Attached is a (very) preliminary document outlining a rough design we are proposing for a cost-based optimizer for Pig. This is being done as a capstone project by three CMU Master's students (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not necessarily meant for immediate incorporation into the Pig codebase, although it would be nice if it, or parts of it, are found to be useful in the mainline. We would love to get some feedback from the developer community regarding the ideas expressed in the document, any concerns about the design, suggestions for improvement, etc. Thanks, Dmitriy, Ashutosh, Tejal
[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig
[ https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750040#action_12750040 ] Koji Noguchi commented on PIG-940: -- bq. Is this supported in hadoop ? Sure. Cross site HDFS access using the default.fs.name not possible in Pig Key: PIG-940 URL: https://issues.apache.org/jira/browse/PIG-940 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Environment: Hadoop 20 Reporter: Viraj Bhat Fix For: 0.3.0 I have a script which does the following.. access data from a remote HDFS location (via a HDFS installed at:hdfs://remotemachine1.company.com/ ) [[as I do not want to copy this huge amount of data between HDFS locations]]. However I want my Pigscript to write data to the HDFS running on localmachine.company.com. Currently Pig does not support that behavior and complains that: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist {code} A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); C = JOIN A by a, B by c; store C into 'output' using PigStorage(); {code} === 2009-09-01 00:37:24,032 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localmachine.company.com:8020 2009-09-01 00:37:24,277 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localmachine.company.com:50300 2009-09-01 00:37:24,567 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage-POForEach to POJoinPackage 2009-09-01 00:37:24,573 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2009-09-01 00:37:24,573 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size 
after optimization: 1 2009-09-01 00:37:26,197 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 2009-09-01 00:37:26,249 [Thread-9] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 2009-09-01 00:37:26,746 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2009-09-01 00:37:26,746 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2009-09-01 00:37:26,747 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed! 2009-09-01 00:37:26,756 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480 2009-09-01 00:37:26,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist. Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log === The error file in Pig contains: === ERROR 2998: Unhandled internal error. org.apache.pig.backend.executionengine.ExecException: ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist. 
at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126) at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59) at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at
[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried
[ https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-918: - Attachment: pig-zebra.patch When you generate a patch with 'git diff', please use 'git diff --no-prefix' so that the patch applies with the 'patch -p0' command. I am updating the attached patch with this change. [zebra] LOAD call will hang if only the first column group is queried - Key: PIG-918 URL: https://issues.apache.org/jira/browse/PIG-918 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Yan Zhou Fix For: 0.4.0 Attachments: pig-zebra.patch, pig-zebra.patch Zebra's LOAD call with projections that only include column(s) in the first column group will hang: the random index into the array of column groups is drawn from an improper range that always skips the first element, so if no other column group is used, the loop keeps running without a chance to break. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
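The hang described in PIG-918 can be illustrated with a small Java sketch. This is a hypothetical reconstruction, not Zebra's code: the names ColumnGroupPick and pickUsed are assumptions, and a bounded attempt count stands in for the unbounded loop so the example terminates. The point is only that a random range starting at index 1 can never select a first column group that is the only one in use.

```java
import java.util.Random;

// Hypothetical sketch of the PIG-918 failure mode; not the actual Zebra implementation.
final class ColumnGroupPick {

    // Draws random indices in [lowestIndex, used.length) until a used column group
    // is found. Returns -1 after maxAttempts, which stands in for the real hang.
    static int pickUsed(boolean[] used, int lowestIndex, Random rnd, int maxAttempts) {
        int span = used.length - lowestIndex;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int i = lowestIndex + rnd.nextInt(span);
            if (used[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Only the first column group is needed by the projection.
        boolean[] used = {true, false, false};
        // Improper range [1, 3): index 0 is never drawn, so no used group is ever found.
        System.out.println(pickUsed(used, 1, new Random(42), 1000));
        // Full range [0, 3): index 0 can be drawn.
        System.out.println(pickUsed(used, 0, new Random(42), 1000));
    }
}
```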
[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried
[ https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-918: - Affects Version/s: (was: 0.3.0) 0.4.0
[jira] Commented: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried
[ https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750055#action_12750055 ] Raghu Angadi commented on PIG-918: -- I just committed this. Thanks Yan.
[jira] Created: (PIG-941) [zebra] Loading non-existing column generates error
[zebra] Loading non-existing column generates error --- Key: PIG-941 URL: https://issues.apache.org/jira/browse/PIG-941 Project: Pig Issue Type: Bug Components: data Reporter: Yiping Han Loading a column that does not exist generates the following error: 2009-09-01 21:29:15,161 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null Example is like this: STORE urls2 into '$output' using org.apache.pig.table.pig.TableStorer('md5:string, url:string'); and then in another pig script, I load the table: input = LOAD '$output' USING org.apache.pig.table.pig.TableLoader('md5,url, domain'); where domain is a column that does not exist. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750093#action_12750093 ] Jing Huang commented on PIG-833: Hi Yongqiang, Sorry for the late reply. I was out of town last week. Right, SF_F is not defined in the schema; querying a non-existing column is allowed and returns null. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high-level data access abstraction and a tabular view of data in Hadoop, freeing Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a Hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Request for feedback: cost-based optimizer
I am still reading, but one interesting question is why you decided to put the CBO in the physical layer? Dmitriy Ryaboy wrote: Whoops :-) Here's the Google doc: http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdAhl=en -Dmitriy On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasans...@yahoo-inc.com wrote: Dmitriy and Gang, The mailing list does not allow attachments. Can you post it on a website and just send the URL ? Thanks, Santhosh -Original Message- From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] Sent: Tuesday, September 01, 2009 9:48 AM To: pig-dev@hadoop.apache.org Subject: Request for feedback: cost-based optimizer Hi everyone, Attached is a (very) preliminary document outlining a rough design we are proposing for a cost-based optimizer for Pig. This is being done as a capstone project by three CMU Master's students (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not necessarily meant for immediate incorporation into the Pig codebase, although it would be nice if it, or parts of it, are found to be useful in the mainline. We would love to get some feedback from the developer community regarding the ideas expressed in the document, any concerns about the design, suggestions for improvement, etc. Thanks, Dmitriy, Ashutosh, Tejal
Re: Request for feedback: cost-based optimizer
Our initial survey of related literature showed that the usual place for a CBO tends to be between the physical and logical layer (in fact, the famous Cascades paper advocates removing the distinction between physical and logical operators altogether, and using an is_logical and is_physical flag instead -- meaning an operator can be one, both, or neither). The reasoning is that you cannot properly determine a cost of a plan if you don't know the physical properties of the operators that implement it. An optimizer that works at a logical layer would by definition create the same plan whether in local or mapreduce mode (since such differences are abstracted from it). This is clearly incorrect, as the properties of the environment in which these plans are executed are drastically different. Working at the physical layer lets us stay close to the iron and adjust based on the specifics of the execution environment. Certainly one can posit a framework for a CBO that would set up the necessary interfaces and plumbing for optimizing in any execution mode, and invoke the proper implementations at run time; we are not discounting that possibility (haven't gotten quite that far in the design, to be honest). But we feel that the implementations have to be execution mode specific. -Dmitriy On Tue, Sep 1, 2009 at 6:26 PM, Jianyong Daijiany...@yahoo-inc.com wrote: I am still reading but one interesting question is why you decide to put CBO in physical layer? Dmitriy Ryaboy wrote: Whoops :-) Here's the Google doc: http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdAhl=en -Dmitriy On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasans...@yahoo-inc.com wrote: Dmitriy and Gang, The mailing list does not allow attachments. Can you post it on a website and just send the URL ? 
Thanks, Santhosh -Original Message- From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] Sent: Tuesday, September 01, 2009 9:48 AM To: pig-dev@hadoop.apache.org Subject: Request for feedback: cost-based optimizer Hi everyone, Attached is a (very) preliminary document outlining a rough design we are proposing for a cost-based optimizer for Pig. This is being done as a capstone project by three CMU Master's students (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not necessarily meant for immediate incorporation into the Pig codebase, although it would be nice if it, or parts of it, are found to be useful in the mainline. We would love to get some feedback from the developer community regarding the ideas expressed in the document, any concerns about the design, suggestions for improvement, etc. Thanks, Dmitriy, Ashutosh, Tejal
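Dmitriy's argument in this reply (a shared framework that supplies the interfaces, with cost implementations specific to each execution mode) might look roughly like the Java sketch below. Everything in it is an assumption made for illustration: the names ModeSpecificCost, ExecMode, CostModel and forMode do not come from the design document or from Pig, and the cost formulas and constants are invented placeholders, not measurements.

```java
// Hypothetical sketch: one cost interface, one implementation per execution mode.
final class ModeSpecificCost {

    enum ExecMode { LOCAL, MAPREDUCE }

    // The optimizer framework would depend only on this interface.
    interface CostModel {
        double joinCost(long leftRows, long rightRows);
    }

    // Selects the implementation at run time based on the execution mode.
    static CostModel forMode(ExecMode mode) {
        switch (mode) {
            case LOCAL:
                // Placeholder: a single in-process pass over both inputs.
                return (l, r) -> (double) (l + r);
            default:
                // Placeholder: fixed job start-up cost plus read, shuffle, and write passes.
                return (l, r) -> 1000.0 + 3.0 * (l + r);
        }
    }

    public static void main(String[] args) {
        // The same logical join gets a different estimate in each mode.
        System.out.println(forMode(ExecMode.LOCAL).joinCost(10, 20));
        System.out.println(forMode(ExecMode.MAPREDUCE).joinCost(10, 20));
    }
}
```

An optimizer that sees only the logical plan has no equivalent of the mode argument, which is the reason given above for costing plans at the physical layer.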
[jira] Updated: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index
[ https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-934: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Checked that the unit tests work locally on my machine too. Patch committed - Thanks Ashutosh!
[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys
[ https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-935: Attachment: skmapbug.patch Added code to explicitly check for -1 in orderby Skewed join throws an exception when used with map keys --- Key: PIG-935 URL: https://issues.apache.org/jira/browse/PIG-935 Project: Pig Issue Type: Bug Reporter: Sriranjan Manjunath Attachments: skmapbug.patch Skewed join throws a runtime exception for the following query: A = load 'map.txt' as (e); B = load 'map.txt' as (f); C = join A by (chararray)e#'a', B by (chararray)f#'a' using skewed; explain C; Exception: Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast cannot be cast to org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSortCols(MRCompiler.java:1492) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSamplingJob(MRCompiler.java:1894) ... 27 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys
[ https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-935:
---
Attachment: (was: skjoinmapbug.patch)

Skewed join throws an exception when used with map keys
---
Key: PIG-935
URL: https://issues.apache.org/jira/browse/PIG-935
Project: Pig
Issue Type: Bug
Reporter: Sriranjan Manjunath
Attachments: skmapbug.patch
[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys
[ https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-935:
---
Status: Patch Available (was: Open)

Skewed join throws an exception when used with map keys
---
Key: PIG-935
URL: https://issues.apache.org/jira/browse/PIG-935
Project: Pig
Issue Type: Bug
Reporter: Sriranjan Manjunath
Attachments: skmapbug.patch
[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys
[ https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-935:
---
Status: Open (was: Patch Available)

Skewed join throws an exception when used with map keys
---
Key: PIG-935
URL: https://issues.apache.org/jira/browse/PIG-935
Project: Pig
Issue Type: Bug
Reporter: Sriranjan Manjunath
Attachments: skjoinmapbug.patch
[jira] Commented: (PIG-935) Skewed join throws an exception when used with map keys
[ https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750256#action_12750256 ]

Hadoop QA commented on PIG-935:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12418325/skmapbug.patch
against trunk revision 810327.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/console

This message is automatically generated.