[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749806#action_12749806
 ] 

Hadoop QA commented on PIG-934:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418219/pig-934_2.patch
  against trunk revision 806668.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/console

This message is automatically generated.

 Merge join implementation currently does not seek to right point on the right 
 side input based on the offset provided by the index
 --

 Key: PIG-934
 URL: https://issues.apache.org/jira/browse/PIG-934
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.1
Reporter: Pradeep Kamath
Assignee: Ashutosh Chauhan
 Attachments: pig-934_2.patch


 We use POLoad to seek into right file which has the following code: 
 {noformat}
public void setUp() throws IOException{
 String filename = lFile.getFileName();
 loader = 
 (LoadFunc)PigContext.instantiateFuncFromSpec(lFile.getFuncSpec());
 is = FileLocalizer.open(filename, pc);
 loader.bindTo(filename , new BufferedPositionedInputStream(is), 
 this.offset, Long.MAX_VALUE);
 }
 {noformat}
 Between opening the stream and bindTo we do not seek to the right offset. 
 bindTo itself does not perform any seek.
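
 As a rough illustration of the missing step, the sketch below skips to a byte
 offset before handing the stream to a reader. It uses only java.io. The names
 SeekSketch, skipFully, and readFrom are hypothetical and are not Pig APIs; the
 attached patch may well solve this differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only: hypothetical names, not part of Pig. */
final class SeekSketch {

    /** Skips exactly 'offset' bytes; loops because InputStream.skip may skip fewer. */
    static void skipFully(InputStream in, long offset) throws IOException {
        long remaining = offset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() made no progress: read one byte to detect end of stream
                if (in.read() < 0) {
                    throw new IOException("Stream ended before offset " + offset);
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    /** Opens the data, seeks to 'offset', and returns everything from there on. */
    static String readFrom(byte[] data, long offset) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            skipFully(in, offset); // the step that is missing between open and bindTo
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

 With two tab-separated records of six bytes each, readFrom(data, 6) would
 start at the second record instead of the beginning of the file.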

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2009-09-01 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-939:
---

Status: Patch Available  (was: Open)

 Checkstyle pulls in junit3.7 which causes the build of test code to fail.
 -

 Key: PIG-939
 URL: https://issues.apache.org/jira/browse/PIG-939
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.3.0
Reporter: Lee Tucker
 Attachments: pig-939.patch


 Pig fails to compile if you execute: 
 ant -D<associated flags for various components> clean findbugs checkstyle test
 It gets the error:
 [javac] Compiling 153 source files to 
 /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/build/test/classes
 [javac] 
 /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/test/org/apache/pig/test/PigExecTestCase.java:31:
  cannot find symbol
 [javac] symbol  : constructor TestCase()
 [javac] location: class junit.framework.TestCase
 [javac] public abstract class PigExecTestCase extends TestCase {
 [javac] ^
 Once that's done, a copy of junit 3.7 remains cached by ivy and continues to 
 cause the build to fail.  The build succeeds if you remove the cached copy and 
 then run:
 ant -D<associated flags for various components> clean findbugs test
 This shows that running checkstyle is what pulls in junit 3.7.




[jira] Updated: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2009-09-01 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-939:
---

Attachment: pig-939.patch

this patch should fix this issue of downloading junit-3.7




[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749901#action_12749901
 ] 

Ashutosh Chauhan commented on PIG-934:
--

All tests passed on my local box. Not sure why they failed on hudson. 




[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-09-01 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749905#action_12749905
 ] 

Mridul Muralidharan commented on PIG-940:
-

Is this supported in hadoop ? As in, can you specify the input to be on a 
different hdfs and get a mapred job to work ? IIRC no, but I could be missing 
something.

If it is no, then not sure if pig can support it without an intermediate distcp 
...

 Cross site HDFS access using the default.fs.name not possible in Pig
 

 Key: PIG-940
 URL: https://issues.apache.org/jira/browse/PIG-940
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
 Environment: Hadoop 20
Reporter: Viraj Bhat
 Fix For: 0.3.0


 I have a script which does the following.. access data from a remote HDFS 
 location (via a HDFS installed at:hdfs://remotemachine1.company.com/ ) [[as I 
 do not want to copy this huge amount of data between HDFS locations]].
 However I want my Pigscript  to write data to the HDFS running on 
 localmachine.company.com.
 Currently Pig does not support that behavior and complains that: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist
 {code}
 A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); 
 B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); 
 C = JOIN A by a, B by c; 
 store C into 'output' using PigStorage();  
 {code}
 ===
 2009-09-01 00:37:24,032 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localmachine.company.com:8020
 2009-09-01 00:37:24,277 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localmachine.company.com:50300
 2009-09-01 00:37:24,567 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
  - Rewrite: POPackage-POForEach to POJoinPackage
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2009-09-01 00:37:26,197 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2009-09-01 00:37:26,747 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map reduce job(s) failed!
 2009-09-01 00:37:26,756 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed to produce result in: 
 hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480
 2009-09-01 00:37:26,756 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
 ===
 The error file in Pig contains:
 ===
 ERROR 2998: Unhandled internal error. 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2100: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
 at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at 
 

[jira] Commented: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2009-09-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749921#action_12749921
 ] 

Hadoop QA commented on PIG-939:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418232/pig-939.patch
  against trunk revision 806668.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/console

This message is automatically generated.




Request for feedback: cost-based optimizer

2009-09-01 Thread Dmitriy Ryaboy
Hi everyone,
Attached is a (very) preliminary document outlining a rough design we
are proposing for a cost-based optimizer for Pig.
This is being done as a capstone project by three CMU Master's
students (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is
not necessarily meant for immediate incorporation into the Pig
codebase, although it would be nice if it, or parts of it, are found
to be useful in the mainline.

We would love to get some feedback from the developer community
regarding the ideas expressed in the document, any concerns about the
design, suggestions for improvement, etc.

Thanks,
Dmitriy, Ashutosh, Tejal


Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Dmitriy Ryaboy
Whoops :-)
Here's the Google doc:
http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdAhl=en

-Dmitriy

On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasan <s...@yahoo-inc.com> wrote:
 Dmitriy and Gang,

 The mailing list does not allow attachments. Can you post it on a
 website and just send the URL ?

 Thanks,
 Santhosh




[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-09-01 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750040#action_12750040
 ] 

Koji Noguchi commented on PIG-940:
--

bq. Is this supported in hadoop ? 
Sure.
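
To make the expected behavior concrete, here is a minimal sketch of how a
location could be qualified: a fully qualified URI keeps its own host, and only
a relative path falls back to the default file system. It uses java.net.URI
only. PathQualifier and qualify are hypothetical names, not Pig or Hadoop code,
and the default file system URI below is an assumption taken from the log in
the issue.

```java
import java.net.URI;

/** Illustrative only: hypothetical helper, not part of Pig or Hadoop. */
final class PathQualifier {

    /**
     * Returns the location unchanged when it already names a scheme
     * (for example hdfs://remotemachine1.company.com/...), and otherwise
     * resolves it against the default file system's working directory.
     */
    static URI qualify(URI defaultFs, String location) {
        URI uri = URI.create(location);
        if (uri.getScheme() != null) {
            return uri; // already fully qualified: do not rewrite the host
        }
        return defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://localmachine.company.com:8020/user/viraj/");
        System.out.println(qualify(fs, "hdfs://remotemachine1.company.com/user/viraj/A1.txt"));
        System.out.println(qualify(fs, "output"));
    }
}
```

Under this rule the two LOAD statements in the report would read from
remotemachine1, while the relative 'output' path would land on localmachine.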

 Cross site HDFS access using the default.fs.name not possible in Pig
 

 Key: PIG-940
 URL: https://issues.apache.org/jira/browse/PIG-940
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
 Environment: Hadoop 20
Reporter: Viraj Bhat
 Fix For: 0.3.0


 I have a script which does the following.. access data from a remote HDFS 
 location (via a HDFS installed at:hdfs://remotemachine1.company.com/ ) [[as I 
 do not want to copy this huge amount of data between HDFS locations]].
 However I want my Pigscript  to write data to the HDFS running on 
 localmachine.company.com.
 Currently Pig does not support that behavior and complains that: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist
 {code}
 A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); 
 B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); 
 C = JOIN A by a, B by c; 
 store C into 'output' using PigStorage();  
 {code}
 ===
 2009-09-01 00:37:24,032 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localmachine.company.com:8020
 2009-09-01 00:37:24,277 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localmachine.company.com:50300
 2009-09-01 00:37:24,567 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
  - Rewrite: POPackage-POForEach to POJoinPackage
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2009-09-01 00:37:26,197 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2009-09-01 00:37:26,747 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map reduce job(s) failed!
 2009-09-01 00:37:26,756 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed to produce result in: 
 hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480
 2009-09-01 00:37:26,756 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
 ===
 The error file in Pig contains:
 ===
 ERROR 2998: Unhandled internal error. 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2100: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
 at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
 at 
 

[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Attachment: pig-zebra.patch

When you generate a patch with 'git diff', please use 'git diff --no-prefix' so 
that the patch applies with the 'patch -p0' command. I am updating the attached 
patch with this change.


 [zebra] LOAD call will hang if only the first column group is queried
 -

 Key: PIG-918
 URL: https://issues.apache.org/jira/browse/PIG-918
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Yan Zhou
 Fix For: 0.4.0

 Attachments: pig-zebra.patch, pig-zebra.patch


 Zebra's LOAD call with projections that only include column(s) in the first 
 column group will hang: an improper range of random numbers used to index the 
 array of column groups always skips the first element, so if no other column 
 group is used, the loop keeps running without a chance to break.
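
 The failure mode described above can be sketched as follows. This is not the
 Zebra code: ColumnGroupPicker and both methods are hypothetical, and a
 maxTries bound is added so the example terminates where the real loop would
 spin forever.

```java
import java.util.Random;

/** Illustrative only: hypothetical reconstruction of the off-by-one range. */
final class ColumnGroupPicker {

    /**
     * Draws indices from 1..n-1 only, so index 0 is never chosen.
     * Returns -1 after maxTries; an unbounded loop would hang here.
     */
    static int pickSkippingFirst(boolean[] used, Random rnd, int maxTries) {
        if (used.length < 2) {
            return -1; // nextInt needs a positive bound
        }
        for (int i = 0; i < maxTries; i++) {
            int idx = 1 + rnd.nextInt(used.length - 1);
            if (used[idx]) {
                return idx;
            }
        }
        return -1;
    }

    /** Draws indices from the full range 0..n-1, so the first group can be found. */
    static int pickFullRange(boolean[] used, Random rnd, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            int idx = rnd.nextInt(used.length);
            if (used[idx]) {
                return idx;
            }
        }
        return -1;
    }
}
```

 When only the first column group is used, the first method can never succeed,
 while the second finds index 0 after a few draws.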




[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Affects Version/s: (was: 0.3.0)
   0.4.0




[jira] Commented: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750055#action_12750055
 ] 

Raghu Angadi commented on PIG-918:
--

I just committed this. Thanks Yan.




[jira] Created: (PIG-941) [zebra] Loading non-existing column generates error

2009-09-01 Thread Yiping Han (JIRA)
[zebra] Loading non-existing column generates error
---

 Key: PIG-941
 URL: https://issues.apache.org/jira/browse/PIG-941
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Yiping Han


Loading a column that does not exist generates the following error:

2009-09-01 21:29:15,161 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2999: Unexpected internal error. null

Example is like this:

STORE urls2 into '$output' using 
org.apache.pig.table.pig.TableStorer('md5:string, url:string');

and then in another pig script, I load the table:

input = LOAD '$output' USING org.apache.pig.table.pig.TableLoader('md5,url, 
domain');

where domain is a column that does not exist.




[jira] Commented: (PIG-833) Storage access layer

2009-09-01 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750093#action_12750093
 ] 

Jing Huang commented on PIG-833:


Hi Yongqiang, 
Sorry for the late reply. I was out of town last week. 
Right, SF_F is not defined in the schema. Querying a non-existing column is 
allowed, and it will return null.
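
A minimal sketch of that behavior, assuming a row is modeled as a map from
column name to value: any projected column missing from the row comes back as
null instead of raising an error. NullProjection and project are hypothetical
names and do not reflect the Zebra implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical model of projecting columns from a row. */
final class NullProjection {

    /** Returns the requested columns in order; columns absent from the row yield null. */
    static List<String> project(Map<String, String> row, List<String> columns) {
        List<String> out = new ArrayList<>(columns.size());
        for (String column : columns) {
            out.add(row.get(column)); // Map.get returns null for an unknown column
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("md5", "d41d8", "url", "http://example.com/");
        // 'domain' was never stored, so the third field is null
        System.out.println(project(row, List.of("md5", "url", "domain")));
    }
}
```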

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Jianyong Dai
I am still reading, but one interesting question is why you decided to put 
the CBO in the physical layer?


Dmitriy Ryaboy wrote:

Whoops :-)
Here's the Google doc:
http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdAhl=en

-Dmitriy


Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Dmitriy Ryaboy
Our initial survey of related literature showed that the usual place
for a CBO tends to be between the physical and logical layer (in fact,
the famous Cascades paper advocates removing the distinction between
physical and logical operators altogether, and using an is_logical
and is_physical flag instead -- meaning an operator can be one,
both, or neither).

The reasoning is that you cannot properly determine a cost of a plan
if you don't know the physical properties of the operators that
implement it. An optimizer that works at a logical layer would by
definition create the same plan whether in local or mapreduce mode
(since such differences are abstracted from it). This is clearly
incorrect, as the properties of the environment in which these plans
are executed are drastically different.  Working at the physical layer
lets us stay close to the iron and adjust based on the specifics of
the execution environment.

Certainly one can posit a framework for a CBO that would set up the
necessary interfaces and plumbing for optimizing in any execution
mode, and invoke the proper implementations at run time; we are not
discounting that possibility (haven't gotten quite that far in the
design, to be honest).  But we feel that the implementations have to
be execution mode specific.
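
To illustrate the argument, here is a toy cost model in which the same two
candidate plans rank differently depending on the execution mode. Everything
in it is an assumption made for the example: the class and method names, the
two plan statistics, and the per-row and per-job constants are invented and
are not taken from the design document.

```java
/** Illustrative only: toy cost model with invented constants. */
final class ModeAwareCost {

    enum ExecMode { LOCAL, MAPREDUCE }

    /** A candidate physical plan, summarized by two made-up statistics. */
    static final class Plan {
        final String name;
        final long rowsMoved;
        final int jobs;

        Plan(String name, long rowsMoved, int jobs) {
            this.name = name;
            this.rowsMoved = rowsMoved;
            this.jobs = jobs;
        }
    }

    /** Moving rows and starting jobs are assumed to be costlier in mapreduce mode. */
    static double cost(Plan plan, ExecMode mode) {
        double perRow = (mode == ExecMode.LOCAL) ? 1.0 : 4.0;
        double perJob = (mode == ExecMode.LOCAL) ? 0.0 : 10_000.0;
        return plan.rowsMoved * perRow + plan.jobs * perJob;
    }

    /** Returns the candidate with the lowest cost under the given mode. */
    static Plan cheapest(ExecMode mode, Plan... candidates) {
        Plan best = candidates[0];
        for (Plan p : candidates) {
            if (cost(p, mode) < cost(best, mode)) {
                best = p;
            }
        }
        return best;
    }
}
```

With a one-job plan moving 50,000 rows and a two-job plan moving 48,000 rows,
the two-job plan is cheaper in local mode (48,000 vs 50,000), but the one-job
plan is cheaper in mapreduce mode (210,000 vs 212,000). An optimizer that only
sees the logical plan could not tell these cases apart.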

-Dmitriy

On Tue, Sep 1, 2009 at 6:26 PM, Jianyong Dai <jiany...@yahoo-inc.com> wrote:
 I am still reading, but one interesting question is why you decided to put
 the CBO in the physical layer?


[jira] Updated: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-934:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Checked that the unit tests work locally on my machine too.

Patch committed - Thanks Ashutosh!

 Merge join implementation currently does not seek to right point on the right 
 side input based on the offset provided by the index
 --

 Key: PIG-934
 URL: https://issues.apache.org/jira/browse/PIG-934
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.1
Reporter: Pradeep Kamath
Assignee: Ashutosh Chauhan
 Attachments: pig-934_2.patch


 We use POLoad to seek into the right-side input file, which has the following code:
 {noformat}
 public void setUp() throws IOException {
     String filename = lFile.getFileName();
     loader = (LoadFunc) PigContext.instantiateFuncFromSpec(lFile.getFuncSpec());
     is = FileLocalizer.open(filename, pc);
     loader.bindTo(filename, new BufferedPositionedInputStream(is),
                   this.offset, Long.MAX_VALUE);
 }
 {noformat}
 Between opening the stream and calling bindTo, we do not seek to the correct offset.
 bindTo itself does not perform any seek.
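
 To illustrate the missing step, here is a minimal sketch of advancing a plain
 java.io.InputStream to a byte offset before handing it to a reader. It is not
 the committed patch: the class name SeekBeforeBind and the helper skipFully are
 hypothetical, and the sketch uses only standard java.io calls rather than Pig's
 FileLocalizer or BufferedPositionedInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; not part of the Pig codebase.
public class SeekBeforeBind {

    /**
     * Advances the stream by exactly `offset` bytes.
     * InputStream.skip may skip fewer bytes than requested, so we loop
     * and fall back to read() to tell "no progress" apart from end of stream.
     */
    public static void skipFully(InputStream in, long offset) throws IOException {
        long remaining = offset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                throw new EOFException("stream ended " + remaining
                        + " bytes before offset " + offset);
            } else {
                remaining--; // read() consumed one byte
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "aaaa\nbbbb\ncccc\n".getBytes("UTF-8");
        InputStream is = new ByteArrayInputStream(data);
        // Position the stream at the start of the second record
        // before any reader is bound to it.
        skipFully(is, 5);
        System.out.println((char) is.read()); // prints "b"
    }
}
```

 In setUp() above, the equivalent step would sit between opening the stream
 and the bindTo call, so that the loader starts reading at this.offset.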

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-935:


Attachment: skmapbug.patch

Added code to explicitly check for -1 in orderby

 Skewed join throws an exception when used with map keys
 ---

 Key: PIG-935
 URL: https://issues.apache.org/jira/browse/PIG-935
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
 Attachments: skmapbug.patch


 Skewed join throws a runtime exception for the following query:
 A = load 'map.txt' as (e);
 B = load 'map.txt' as (f);
 C = join A by (chararray)e#'a', B by (chararray)f#'a' using skewed;
 explain C;
 Exception:
 Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast cannot be cast to org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSortCols(MRCompiler.java:1492)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSamplingJob(MRCompiler.java:1894)
 ... 27 more
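
 The trace shows a downcast to POProject failing because the join key
 expression ends in a POCast. The following is a minimal sketch of that failure
 mode and of one way to guard against it. The classes ExprOp, Project and Cast
 are hypothetical stand-ins, not Pig's operators, and returning -1 for a
 non-projection leaf is an illustrative choice rather than a description of the
 attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for physical expression operators.
public class SortColsSketch {
    static class ExprOp {}

    static class Project extends ExprOp {
        final int column;
        Project(int column) { this.column = column; }
    }

    static class Cast extends ExprOp {
        final ExprOp input;
        Cast(ExprOp input) { this.input = input; }
    }

    /** Assumes every leaf is a Project; throws ClassCastException otherwise. */
    static List<Integer> sortColsUnchecked(List<ExprOp> leaves) {
        List<Integer> cols = new ArrayList<>();
        for (ExprOp leaf : leaves) {
            cols.add(((Project) leaf).column);
        }
        return cols;
    }

    /** Checks the runtime type first and records -1 for non-projection leaves. */
    static List<Integer> sortColsGuarded(List<ExprOp> leaves) {
        List<Integer> cols = new ArrayList<>();
        for (ExprOp leaf : leaves) {
            cols.add(leaf instanceof Project ? ((Project) leaf).column : -1);
        }
        return cols;
    }

    public static void main(String[] args) {
        // The second key mimics (chararray)e#'a': its leaf is a cast, not a projection.
        List<ExprOp> leaves = Arrays.asList(new Project(2), new Cast(new Project(0)));
        System.out.println(sortColsGuarded(leaves)); // prints [2, -1]
        try {
            sortColsUnchecked(leaves);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed");
        }
    }
}
```

 A caller of the guarded version would then have to handle the -1 entries
 explicitly instead of treating every entry as a column index.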




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-935:


Attachment: (was: skjoinmapbug.patch)




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-935:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-935:


Status: Open  (was: Patch Available)

 Skewed join throws an exception when used with map keys
 ---

 Key: PIG-935
 URL: https://issues.apache.org/jira/browse/PIG-935
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
 Attachments: skjoinmapbug.patch





[jira] Commented: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750256#action_12750256
 ] 

Hadoop QA commented on PIG-935:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418325/skmapbug.patch
  against trunk revision 810327.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/console

This message is automatically generated.
