[jira] Updated: (PIG-1336) Optimize POStore serialized into JobConf

2010-04-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1336:


Description: 
We serialize POStore too early in the JobControlCompiler. At that point, the 
storeFunc still has unconstrained links to other operators; in the worst case, 
it chains in the whole physical plan. Also, in the multi-store case, POStore 
has a link to its data source, which is not needed and increases the footprint 
of the serialized POStore. 

Worse, leaving POStore unoptimized can cause failures. Suppose we have two 
map-reduce jobs, and the first job needs a LoadFunc from an external jar. The 
first job ships the jar to the backend, but the second job does not. However, 
since the POStore of the second job has a link chain to the LoadFunc of the 
first job, deserializing it requires that external jar. Since we do not ship 
the external jar for the second map-reduce job, the job fails. So this is more 
than an optimization; it is also a bug fix. 
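
To illustrate the footprint point, here is a minimal sketch using plain Java serialization. It assumes nothing about Pig's real classes: OperatorSketch and serializedSize are hypothetical names made up for this example, and the 1 KB of state per operator is arbitrary. It only shows that serializing one object drags along everything reachable from it, and that cutting the links first shrinks the result.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StoreSerializationSketch {
    /** Hypothetical stand-in for a physical operator; not Pig's real class. */
    static class OperatorSketch implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final byte[] state = new byte[1024];                   // simulated operator state
        final List<OperatorSketch> inputs = new ArrayList<>(); // links to upstream operators

        OperatorSketch(String name) {
            this.name = name;
        }
    }

    /** Serializes obj with standard Java serialization and returns the byte count. */
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        OperatorSketch load = new OperatorSketch("load");
        OperatorSketch filter = new OperatorSketch("filter");
        OperatorSketch store = new OperatorSketch("store");
        filter.inputs.add(load);
        store.inputs.add(filter);

        // Serializing the store also serializes filter and load.
        int chained = serializedSize(store);

        // Cutting the link first leaves only the store itself.
        store.inputs.clear();
        int detached = serializedSize(store);

        System.out.println("chained=" + chained + " bytes, detached=" + detached + " bytes");
    }
}
```

The same reachability rule is why deserialization needs the classes of every linked object on the classpath, which is the external-jar failure described above.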

  was:We serialize POStore too early in the JobControlCompiler. At that point, 
the storeFunc still has unconstrained links to other operators; in the worst 
case, it chains in the whole physical plan. Also, in the multi-store case, 
POStore has a link to its data source, which is not needed and increases the 
footprint of the serialized POStore. 


 Optimize POStore serialized into JobConf
 

 Key: PIG-1336
 URL: https://issues.apache.org/jira/browse/PIG-1336
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1336-1.patch, PIG-1336-2.patch, PIG-1336-3.patch, 
 PIG-1336-4.patch



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (PIG-1378) har url not usable in Pig scripts

2010-04-14 Thread Viraj Bhat (JIRA)
har url not usable in Pig scripts
-

 Key: PIG-1378
 URL: https://issues.apache.org/jira/browse/PIG-1378
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
 Fix For: 0.7.0


I am trying to use har (Hadoop Archives) in my Pig script.

I can use them through the HDFS shell
{noformat}
$ hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
Found 1 items
-rw---   5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1
{noformat}

Using similar URLs in grunt yields
{noformat}
grunt> a = load 'har:///user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}


{noformat}
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
2010-04-14 22:08:48,814 [main] WARN  org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
    at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
    at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
    ... 13 more
{noformat}

Following Jira http://issues.apache.org/jira/browse/PIG-1234, I tried the 
URL form given in its original description:

{noformat}
grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}

{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
    ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: mithrilgold
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245)
{noformat}
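
One possible reading of the second trace, offered as an assumption rather than a diagnosis: the Hadoop Archives guide documents har URIs as either har:///path (use the default file system) or har://scheme-hostname:port/path, where the text before the first "-" in the authority names the underlying file system. The sketch below uses only java.net.URI to show how an authority would be read under that convention. The helper underlyingScheme and the host namenode.example.com are hypothetical, and this is not HarFileSystem's actual code.

```java
import java.net.URI;

public class HarUriSketch {
    /**
     * Hypothetical helper: returns the underlying scheme that a har URI's
     * authority would name under the scheme-hostname convention, or null
     * when the URI has no authority (har:///...).
     */
    static String underlyingScheme(String harUri) {
        URI uri = URI.create(harUri);
        String authority = uri.getAuthority();
        if (authority == null) {
            return null; // no authority: the default file system would be used
        }
        // Text before the first '-' is read as the underlying scheme.
        return authority.split("-", 2)[0];
    }

    public static void main(String[] args) {
        // No authority at all.
        System.out.println(underlyingScheme("har:///user/viraj/project/subproject/files/size/data"));
        // Authority in the documented scheme-hostname:port form.
        System.out.println(underlyingScheme("har://hdfs-namenode.example.com:8020/user/viraj/data"));
        // Bare host name: the whole authority is read as the scheme.
        System.out.println(underlyingScheme("har://mithrilgold/user/viraj/data"));
    }
}
```

Under this reading, a bare host name in the authority would be taken as a file system scheme, which would be consistent with the "No FileSystem for scheme" message in the trace.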

Viraj





[jira] Updated: (PIG-1378) har url not usable in Pig scripts

2010-04-14 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-1378:


Description: 
I am trying to use har (Hadoop Archives) in my Pig script.

I can use them through the HDFS shell
{noformat}
$ hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
Found 1 items
-rw---   5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1
{noformat}

Using similar URLs in grunt yields
{noformat}
grunt> a = load 'har:///user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}


{noformat}
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
2010-04-14 22:08:48,814 [main] WARN  org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
    at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
    at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
    ... 13 more
{noformat}

Following Jira http://issues.apache.org/jira/browse/PIG-1234, I tried the 
URL form given in its original description:

{noformat}
grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}

{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
    ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245)
{noformat}

Viraj


[jira] Commented: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2010-04-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857132#action_12857132
 ] 

Hadoop QA commented on PIG-939:
---

To test jira cli

 Checkstyle pulls in junit3.7 which causes the build of test code to fail.
 -

 Key: PIG-939
 URL: https://issues.apache.org/jira/browse/PIG-939
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.3.0
Reporter: Lee Tucker
Assignee: Giridharan Kesavan
 Fix For: 0.4.0

 Attachments: pig-939.patch


 Pig fails to compile if you execute: 
 ant -Dassociated flags for various components clean findbugs checkstyle test 
 It gets the error:
 [javac] Compiling 153 source files to /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/build/test/classes
 [javac] /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/test/org/apache/pig/test/PigExecTestCase.java:31: cannot find symbol
 [javac] symbol  : constructor TestCase()
 [javac] location: class junit.framework.TestCase
 [javac] public abstract class PigExecTestCase extends TestCase {
 [javac] ^
 Once that's done, there's a copy of junit 3.7 cached from ivy that will 
 continue to cause the build to fail.  It will succeed, if you remove it, and 
 then do:
 ant -Dassociated flags for various components clean findbugs test
 This proves it's running checkstyle that pulls in junit 3.7





[jira] Commented: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2010-04-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857136#action_12857136
 ] 

Hadoop QA commented on PIG-939:
---

To test jira cli





[jira] Commented: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility

2010-04-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857137#action_12857137
 ] 

Olga Natkovich commented on PIG-1372:
-

+1

 Restore PigInputFormat.sJob for backward compatibility
 --

 Key: PIG-1372
 URL: https://issues.apache.org/jira/browse/PIG-1372
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1372.patch


 The preferred method to get the job's Configuration object would be to use 
 UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob  (but we 
 will be marking it deprecated and indicating to use UDFContext.getJobConf() 
 instead) to be backward compatible - we can remove it from pig in a future 
 release.
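
The compatibility pattern described above can be sketched roughly as follows. None of these classes are Pig's: Conf, ContextSketch and InputFormatSketch are hypothetical stand-ins for the job Configuration, UDFContext and PigInputFormat, used only to show a deprecated static field being kept in sync with the preferred accessor.

```java
public class JobConfAccessSketch {
    /** Hypothetical stand-in for a job configuration object. */
    static class Conf {
        final String jobName;

        Conf(String jobName) {
            this.jobName = jobName;
        }
    }

    /** Hypothetical stand-in for the preferred accessor. */
    static class ContextSketch {
        private static Conf jobConf;

        static void setJobConf(Conf conf) {
            jobConf = conf;
        }

        static Conf getJobConf() {
            return jobConf;
        }
    }

    /** Hypothetical stand-in for the input format that keeps the legacy field. */
    static class InputFormatSketch {
        /** @deprecated kept for backward compatibility; use ContextSketch.getJobConf(). */
        @Deprecated
        static Conf sJob;

        static void initialize(Conf conf) {
            ContextSketch.setJobConf(conf); // preferred path
            sJob = conf;                    // legacy field still populated for old callers
        }
    }

    public static void main(String[] args) {
        InputFormatSketch.initialize(new Conf("example-job"));
        // New code reads through the accessor; old code can still read the field.
        System.out.println(ContextSketch.getJobConf().jobName);
        System.out.println(InputFormatSketch.sJob.jobName);
    }
}
```

Because both paths return the same object, existing callers of the static field keep working until it is removed in a later release.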





[jira] Commented: (PIG-1363) Unnecessary loadFunc instantiations

2010-04-14 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857139#action_12857139
 ] 

Pradeep Kamath commented on PIG-1363:
-

+1

 Unnecessary loadFunc instantiations
 ---

 Key: PIG-1363
 URL: https://issues.apache.org/jira/browse/PIG-1363
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: pig-1363.patch


 In MRCompiler loadfuncs are instantiated at multiple locations in different 
 visit methods. This is inconsistent and confusing. LoadFunc should be 
 instantiated at only one place, ideally in LogToPhyTranslationVisitor#visit(LOLoad). 
 A getter should be added to POLoad to retrieve this instantiated loadFunc 
 wherever it is needed in later stages of compilation. 
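
A rough sketch of the "instantiate once, expose a getter" idea follows. The classes are hypothetical stand-ins (LoaderSketch for a LoadFunc, LoadOperatorSketch for POLoad, translate for the translation step), and the counter exists only to make the single instantiation visible. This is not the patch itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SingleLoadFuncSketch {
    /** Counts constructor calls so the single instantiation can be checked. */
    static final AtomicInteger INSTANTIATIONS = new AtomicInteger();

    /** Hypothetical stand-in for a LoadFunc. */
    static class LoaderSketch {
        final String spec;

        LoaderSketch(String spec) {
            this.spec = spec;
            INSTANTIATIONS.incrementAndGet();
        }
    }

    /** Hypothetical stand-in for POLoad: holds the loader and exposes a getter. */
    static class LoadOperatorSketch {
        private final LoaderSketch loader;

        LoadOperatorSketch(LoaderSketch loader) {
            this.loader = loader;
        }

        LoaderSketch getLoadFunc() {
            return loader;
        }
    }

    /** Translation step: the only place where the loader is instantiated. */
    static LoadOperatorSketch translate(String spec) {
        return new LoadOperatorSketch(new LoaderSketch(spec));
    }

    public static void main(String[] args) {
        LoadOperatorSketch load = translate("PigStorage");
        // Later compilation stages reuse the instance through the getter.
        LoaderSketch first = load.getLoadFunc();
        LoaderSketch second = load.getLoadFunc();
        System.out.println("same instance: " + (first == second));
        System.out.println("instantiations: " + INSTANTIATIONS.get());
    }
}
```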





[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility

2010-04-14 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1372:


Attachment: PIG-1372-2.patch

Regenerated patch against latest trunk (same changes).

Here are the results of running test-patch ant target:
 [exec] -1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec]                     Please justify why no tests are needed for this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 



 Restore PigInputFormat.sJob for backward compatibility
 --

 Key: PIG-1372
 URL: https://issues.apache.org/jira/browse/PIG-1372
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1372-2.patch, PIG-1372.patch


 The preferred method to get the job's Configuration object would be to use 
 UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob  (but we 
 will be marking it deprecated and indicating to use UDFContext.getJobConf() 
 instead) to be backward compatible - we can remove it from pig in a future 
 release.





[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility

2010-04-14 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1372:


Status: Patch Available  (was: Open)





[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility

2010-04-14 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1372:


Status: Open  (was: Patch Available)





[jira] Commented: (PIG-1353) Map-side joins

2010-04-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857149#action_12857149
 ] 

Ashutosh Chauhan commented on PIG-1353:
---

Hudson.. Oh Hudson.. when y'll get better ! Ran the full test suite. All of 
them passed. Ran test-patch:
{noformat}
 [exec] +1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 12 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

{noformat}

Patch is ready for review.

 Map-side joins
 --

 Key: PIG-1353
 URL: https://issues.apache.org/jira/browse/PIG-1353
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: pig-1353.patch, pig-1353.patch


 Pig already has a couple of map-side join implementations: Merge Join and 
 Fragmented-Replicate Join. But both of them are fairly restrictive. Merge 
 Join can only join two tables, and only as an inner join. FR Join can join 
 multiple relations, but it can only do inner and left outer joins, and it 
 restricts the sizes of the side relations. It would be nice if we could do 
 map-side joins on multiple tables and also do inner, left outer, right outer 
 and full outer joins. 
 A lot of the groundwork for this has already been done in PIG-1309. The 
 remaining work will be tracked in this jira.   





[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package

2010-04-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1370:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Changed SortInfo and SortColInfo to private as indicated in Pradeep's comments. 
 Marked constructor of ResourceSchema that uses SortInfo as private as well.

Patch checked in.

 Marking Pig interfaces for org.apache.pig package
 -

 Key: PIG-1370
 URL: https://issues.apache.org/jira/browse/PIG-1370
 Project: Pig
  Issue Type: Sub-task
  Components: documentation
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.8.0

 Attachments: PIG-1370.patch, PIG-1370_2.patch


 Done as a separate JIRA from PIG-1311 since this alone contains quite a lot 
 of changes.





[jira] Updated: (PIG-1363) Unnecessary loadFunc instantiations

2010-04-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1363:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch checked-in.





[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857154#action_12857154
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

As per the http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg02257.html 
thread, I am wondering whether it would be safe and possible to ensure that any 
job using this storage has speculative execution turned off. Otherwise, with 
speculative execution (S.E.) turned on, there are too many scenarios we would 
have to handle. What do you think?
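
As a rough illustration of what turning speculative execution off could look like, the sketch below sets the two Hadoop 0.20-era property names, mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution. To stay self-contained it uses java.util.Properties as a stand-in for the job configuration, and the helper withSpeculationDisabled is hypothetical. How the storage function would actually enforce this in Pig is an open question in this thread.

```java
import java.util.Properties;

public class SpeculationOffSketch {
    static final String MAP_SPECULATION = "mapred.map.tasks.speculative.execution";
    static final String REDUCE_SPECULATION = "mapred.reduce.tasks.speculative.execution";

    /**
     * Hypothetical helper: returns a copy of the job properties with
     * speculative execution disabled for both map and reduce tasks.
     */
    static Properties withSpeculationDisabled(Properties jobProps) {
        Properties result = new Properties();
        result.putAll(jobProps);
        result.setProperty(MAP_SPECULATION, "false");
        result.setProperty(REDUCE_SPECULATION, "false");
        return result;
    }

    public static void main(String[] args) {
        Properties job = new Properties();
        job.setProperty(MAP_SPECULATION, "true"); // simulate a cluster default of "on"
        Properties safe = withSpeculationDisabled(job);
        System.out.println(MAP_SPECULATION + "=" + safe.getProperty(MAP_SPECULATION));
        System.out.println(REDUCE_SPECULATION + "=" + safe.getProperty(REDUCE_SPECULATION));
    }
}
```

With speculation off, each task attempt normally runs once, so a store function writing to a database would not have to cope with duplicate attempts writing the same rows; task retries after failures would still need handling.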

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB





[jira] Commented: (PIG-518) LOBinCond exception in LogicalPlanValidationExecutor when providing default values for bag

2010-04-14 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857157#action_12857157
 ] 

Viraj Bhat commented on PIG-518:


The above script generates the following error in Pig 0.7

2010-04-14 17:10:49,807 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1048: Two inputs of BinCond must have compatible schemas. left hand side: b: bag({colb2: bytearray,colb3: bytearray}) right hand side: bag({(chararray,chararray)})


A type cast to the right type solves the problem.

{code}
-- Declaring the columns as chararray makes both BinCond branches schema-compatible.
a = load 'sports_views.txt' as (col1:chararray, col2:chararray, col3:chararray);
b = load 'queries.txt' as (colb1:chararray, colb2:chararray, colb3:chararray);
mycogroup = cogroup a by col1 inner, b by colb1;
mynewalias = foreach mycogroup generate flatten(a), flatten((COUNT(b) > 0L ? b.(colb2,colb3) : {('','')}));
dump mynewalias; 
{code}

(alice,lakers,3,ipod,3)
(alice,warriors,7,ipod,3)
(peter,sun,7,sun,4)
(peter,nets,7,sun,4)

Closing bug as Pig yields the correct error message which the user can use to 
recode his script



 LOBinCond  exception in LogicalPlanValidationExecutor when providing default 
 values for bag
 ---

 Key: PIG-518
 URL: https://issues.apache.org/jira/browse/PIG-518
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
 Attachments: queries.txt, sports_views.txt


 The following piece of Pig script, which provides default values for bags 
 {('','')}  when the COUNT returns 0 fails with the following error. (Note: 
 Files used in this script are enclosed on this Jira.)
 
 a = load 'sports_views.txt' as (col1, col2, col3);
 b = load 'queries.txt' as (colb1,colb2,colb3);
 mycogroup = cogroup a by col1 inner, b by colb1;
 mynewalias = foreach mycogroup generate flatten(a), flatten((COUNT(b) > 0L ? b.(colb2,colb3) : {('','')}));
 dump mynewalias;
 
 java.io.IOException: Unable to open iterator for alias: mynewalias [Unable to 
 store for alias: mynewalias [Can't overwrite cause]]
  at java.lang.Throwable.initCause(Throwable.java:320)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1494)
  at org.apache.pig.impl.logicalLayer.LOBinCond.visit(LOBinCond.java:85)
  at org.apache.pig.impl.logicalLayer.LOBinCond.visit(LOBinCond.java:28)
  at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
  at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.checkInnerPlan(TypeCheckingVisitor.java:2345)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2252)
  at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
  at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
  at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
  at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
  at 
 org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
  at 
 org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
  at org.apache.pig.PigServer.compileLp(PigServer.java:684)
  at org.apache.pig.PigServer.compileLp(PigServer.java:655)
  at org.apache.pig.PigServer.store(PigServer.java:433)
  at org.apache.pig.PigServer.store(PigServer.java:421)
  at org.apache.pig.PigServer.openIterator(PigServer.java:384)
  at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
  at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
  at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
  at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Unable to store for alias: mynewalias [Can't 
 overwrite cause]
  ... 26 more
 Caused by: java.lang.IllegalStateException: Can't overwrite cause
  ... 26 more
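
The innermost frame and the final "Caused by" line point at general Java behaviour rather than anything specific to Pig: Throwable.initCause throws IllegalStateException when the throwable already has a cause, whether that cause was set by the constructor or by an earlier initCause call. Read that way, the trace suggests the type checker failed while attaching a cause to an exception, so the underlying type-checking error may be hidden. That reading is an inference from the trace, not something the report states. A minimal Java sketch of the behaviour, with a hypothetical class name and hypothetical messages:

```java
import java.io.IOException;

class InitCauseDemo {
    // Returns the message of the exception raised by a second attempt
    // to set a cause on the same throwable.
    static String overwriteCause() {
        // The cause is fixed at construction time here.
        IOException wrapped = new IOException("type check failed",
                new RuntimeException("original problem"));
        try {
            // initCause is only legal while no cause has been set yet.
            wrapped.initCause(new RuntimeException("another cause"));
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteCause());
    }
}
```

Running main prints a message that starts with "Can't overwrite cause"; the text after that prefix varies by JDK version.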
 


[jira] Resolved: (PIG-518) LOBinCond exception in LogicalPlanValidationExecutor when providing default values for bag

2010-04-14 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat resolved PIG-518.


Fix Version/s: 0.7.0
   Resolution: Fixed

 LOBinCond exception in LogicalPlanValidationExecutor when providing default 
 values for bag
 ---

 Key: PIG-518
 URL: https://issues.apache.org/jira/browse/PIG-518
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
 Fix For: 0.7.0

 Attachments: queries.txt, sports_views.txt


 The following Pig script, which supplies the default bag {('','')} when 
 COUNT returns 0, fails with the error shown below. (Note: the files used in 
 this script are attached to this Jira.)
 
 a = load 'sports_views.txt' as (col1, col2, col3);
 b = load 'queries.txt' as (colb1,colb2,colb3);
 mycogroup = cogroup a by col1 inner, b by colb1;
 mynewalias = foreach mycogroup generate flatten(a), flatten((COUNT(b) > 0L ? 
 b.(colb2,colb3) : {('','')}));
 dump mynewalias;
 
 java.io.IOException: Unable to open iterator for alias: mynewalias [Unable to 
 store for alias: mynewalias [Can't overwrite cause]]
  at java.lang.Throwable.initCause(Throwable.java:320)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1494)
  at org.apache.pig.impl.logicalLayer.LOBinCond.visit(LOBinCond.java:85)
  at org.apache.pig.impl.logicalLayer.LOBinCond.visit(LOBinCond.java:28)
  at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
  at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.checkInnerPlan(TypeCheckingVisitor.java:2345)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2252)
  at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
  at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
  at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
  at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
  at 
 org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
  at 
 org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
  at org.apache.pig.PigServer.compileLp(PigServer.java:684)
  at org.apache.pig.PigServer.compileLp(PigServer.java:655)
  at org.apache.pig.PigServer.store(PigServer.java:433)
  at org.apache.pig.PigServer.store(PigServer.java:421)
  at org.apache.pig.PigServer.openIterator(PigServer.java:384)
  at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
  at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
  at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
  at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Unable to store for alias: mynewalias [Can't 
 overwrite cause]
  ... 26 more
 Caused by: java.lang.IllegalStateException: Can't overwrite cause
  ... 26 more
 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Resolved: (PIG-829) DECLARE statement stop processing after special characters such as dot . , + % etc..

2010-04-14 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat resolved PIG-829.


Fix Version/s: 0.7.0
   Resolution: Fixed

Pig 0.7 yields the correct result.
{code}
x = LOAD 'something' as (a:chararray, b:chararray);
y = FILTER x BY ( a MATCHES '^.*yahoo.*$' );
STORE y INTO 'foo.bar';
{code}

 DECLARE statement stop processing after special characters such as dot . , 
 + % etc..
 --

 Key: PIG-829
 URL: https://issues.apache.org/jira/browse/PIG-829
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
 Fix For: 0.7.0


 The Pig script below does not work correctly when special characters are 
 used in the DECLARE statement.
 {code}
 %DECLARE OUT foo.bar
 x = LOAD 'something' as (a:chararray, b:chararray);
 y = FILTER x BY ( a MATCHES '^.*yahoo.*$' );
 STORE y INTO '$OUT';
 {code}
 When the above script is run in dry-run mode, the value in the substituted 
 file is cut off at the special character.
 {code}
 java -cp pig.jar:/homes/viraj/hadoop-0.18.0-dev/conf -Dhod.server='' 
 org.apache.pig.Main -r declaresp.pig
 {code}
 Resulting file: declaresp.pig.substituted
 {code}
 x = LOAD 'something' as (a:chararray, b:chararray);
 y = FILTER x BY ( a MATCHES '^.*yahoo.*$' );
 STORE y INTO 'foo';
 {code}
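
One way such truncation can arise, offered purely as an illustration and not as a description of Pig's actual preprocessor: if the value of a declared parameter is read with a token pattern that accepts only identifier characters, the match stops at the first dot. The Java sketch below uses hypothetical names (DeclareValueDemo, WORD_ONLY, NON_SPACE, readValue) to contrast that with a pattern that reads up to the next whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeclareValueDemo {
    // Hypothetical narrow token: identifier characters only.
    static final Pattern WORD_ONLY = Pattern.compile("[A-Za-z0-9_]+");
    // Hypothetical wider token: everything up to the next whitespace.
    static final Pattern NON_SPACE = Pattern.compile("\\S+");

    // Returns the prefix of value matched by the token pattern,
    // or an empty string if nothing matches at the start.
    static String readValue(Pattern token, String value) {
        Matcher m = token.matcher(value);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(readValue(WORD_ONLY, "foo.bar"));
        System.out.println(readValue(NON_SPACE, "foo.bar"));
    }
}
```

With the input foo.bar, the narrow pattern returns foo, which matches the substituted file shown above, while the wider pattern returns the full foo.bar.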
