[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.

2010-05-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1401:


Status: Patch Available  (was: Open)

 explain -script script file executes grunt commands like run/dump/copy 
 etc - explain -script should not execute any grunt command and only explain 
 the query plans.
 ---

 Key: PIG-1401
 URL: https://issues.apache.org/jira/browse/PIG-1401
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1401-2.patch, PIG-1401-3.patch, PIG-1401.patch


 explain -script script file executes grunt commands like run/dump/copy 
 etc - explain -script should not execute any grunt command and only explain 
 the query plans.
 Note: an explain alias statement in the script will still cause all grunt 
 commands up to the explain to be executed. This issue only fixes the behavior 
 of explain -script script file, wherein any grunt commands like run, 
 dump, copy, fs .. present in the supplied script file will be 
 ignored.
 This should be documented in the release in which this jira is resolved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
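A minimal sketch of the intended behavior, using a hypothetical script (all file and alias names invented here for illustration):

```
-- myscript.pig (hypothetical)
a = load 'input' as (x:int, name:chararray);
fs -ls /user/someone;   -- grunt command: skipped by explain -script after this fix
dump a;                 -- likewise skipped
b = filter a by x > 0;
store b into 'output';
```

With the fix, `explain -script myscript.pig` would print only the query plans for the script's relations without executing fs or dump, whereas an interactive `explain b` in grunt still executes everything up to that point, as the note above says.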



need help again, what causes "Cannot cast to Unknown"?

2010-05-04 Thread hc busy
Hey guys, I managed to generate another horrendous error message (before
the plan completes). What typically causes this error to happen?

The script survives through all describes (I can describe after every
assignment to an alias), but it still produces this error.

(running Pig 0.5 on Hadoop 0.20)

2010-05-03 22:54:22,054 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1051: Cannot cast to Unknown
2010-05-03 22:54:22,054 [main] ERROR org.apache.pig.tools.grunt.Grunt -
org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected
exception caused the validation to stop
at
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104)
at
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at
org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:83)
at org.apache.pig.PigServer.compileLp(PigServer.java:818)
at org.apache.pig.PigServer.compileLp(PigServer.java:789)
at org.apache.pig.PigServer.execute(PigServer.java:758)
at org.apache.pig.PigServer.access$100(PigServer.java:89)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:947)
at org.apache.pig.PigServer.executeBatch(PigServer.java:249)
at
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:320)
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException:
ERROR 1060: Cannot resolve Join output schema
at
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2360)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:201)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
at
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
... 14 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException:
ERROR 1051: Cannot cast to Unknown
at
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertAtomicCastForJoinInnerPlan(TypeCheckingVisitor.java:2544)
at
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2348)
... 19 more
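A hedged guess at the kind of script shape that triggers this (aliases invented; the real cause depends on the actual script): the trace fails while resolving the join output schema, which can happen when a join key's type cannot be inferred, e.g. a map lookup used directly as a join key:

```
-- hypothetical sketch, not the poster's script
a = load 'left' as (m:map[]);
b = load 'right' as (id:int, v:chararray);
j = join a by m#'id', b by id;   -- m#'id' has no inferable type, so the type
                                 -- checker cannot insert a cast to match b.id
dump j;
```

If this is the shape, an explicit cast on the untyped side (e.g. `join a by (int)m#'id'`) usually lets validation proceed; describe succeeds because the failure only surfaces during plan validation at execution time.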


[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-05-04 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863750#action_12863750
 ] 

Jeff Hammerbacher commented on PIG-1331:


Hey,

Does this issue make PIG-823 a duplicate?

Thanks,
Jeff

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
Assignee: Ajay Kidave
 Fix For: 0.8.0

 Attachments: anttestoutput.tgz, build.log, ivy_version.patch, 
 owl.contrib.3.tgz, owl.contrib.4.tar.gz


 This JIRA is a proposal to create a Hadoop table management service: Owl. 
 Today, MapReduce and Pig applications interact directly with HDFS 
 directories and files and must deal with low-level data management issues 
 such as storage format, serialization/compression schemes, data layout, and 
 efficient data access, often with different solutions. Owl aims to 
 provide a standard way to address these issues and abstracts away the 
 complexity of reading/writing huge amounts of data from/to HDFS.
 Owl has a data access API that is modeled after the traditional Hadoop 
 InputFormat and a management API to manipulate Owl objects.  This JIRA is 
 related to PIG-823 (Hadoop Metadata Service) as Owl has an internal metadata 
 store.  Owl integrates with different storage modules like Zebra via a 
 pluggable architecture.
  Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
 time, it makes sense to move it to a Hadoop subproject.




[jira] Commented: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.

2010-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863760#action_12863760
 ] 

Hadoop QA commented on PIG-1401:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443548/PIG-1401-3.patch
  against trunk revision 940601.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 536 release audit warnings 
(more than the trunk's current 535 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/313/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/313/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/313/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/313/console

This message is automatically generated.





[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-05-04 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863835#action_12863835
 ] 

Jay Tang commented on PIG-1331:
---

Yes, Jeff.  Owl, as a table management service, has a metadata module. Please 
see http://wiki.apache.org/pig/owl for more information.





[jira] Commented: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.

2010-05-04 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863859#action_12863859
 ] 

Pradeep Kamath commented on PIG-1401:
-

The release audit warning is due to the new test script file added in the patch 
and can be ignored - the patch is ready for review.





Prepare to release Pig 0.7.0

2010-05-04 Thread Daniel Dai

Pig Developers,

It has been a few weeks since we branched for 0.7. We have fixed a couple of 
bugs since then, and we now believe the 0.7 branch is stabilized and ready to 
release.


I will work on rolling up the release. Please let me know if you have 
any objections.


Thanks
Daniel


[jira] Commented: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.

2010-05-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863921#action_12863921
 ] 

Olga Natkovich commented on PIG-1401:
-

+1





[jira] Updated: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted

2010-05-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1391:
---

   Status: Patch Available  (was: Open)
Affects Version/s: 0.8.0
Fix Version/s: 0.8.0
   0.6.0

 pig unit tests leave behind files in temp directory because MiniCluster files 
 don't get deleted
 ---

 Key: PIG-1391
 URL: https://issues.apache.org/jira/browse/PIG-1391
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.7.0, 0.8.0, 0.6.0

 Attachments: minicluster.patch, PIG-1391.06.2.patch, 
 PIG-1391.06.patch, PIG-1391.07.patch, PIG-1391.trunk.patch


 Pig unit test runs leave behind files in the temp dir (/tmp), and over time 
 there are too many files in the directory.
 Most of the files are left behind by MiniCluster: it shuts down the 
 MiniDFSCluster, MiniMRCluster, and the FileSystem created in its constructor 
 only in finalize(), and Java does not guarantee that finalize() will ever be 
 called. 
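The finalize() pitfall described above is commonly avoided by registering a JVM shutdown hook, which does run on normal JVM exit. A minimal, Hadoop-free sketch of the pattern (class and method names invented for illustration; this is not the actual patch):

```java
import java.io.File;

// Sketch: clean up a temp dir via a shutdown hook instead of finalize(),
// since the JVM never guarantees that finalize() runs.
public class TempCleanupSketch {
    static File createTempDir() throws Exception {
        final File dir = File.createTempFile("minicluster", "");
        dir.delete();   // replace the temp *file* with a directory
        dir.mkdir();
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                // runs on normal JVM exit, unlike finalize()
                for (File f : dir.listFiles()) {
                    f.delete();
                }
                dir.delete();
            }
        });
        return dir;
    }

    public static void main(String[] args) throws Exception {
        File dir = createTempDir();
        System.out.println(dir.isDirectory());
    }
}
```

In MiniCluster's case, the hook body would call the MiniDFSCluster/MiniMRCluster shutdown methods rather than deleting files directly.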




[jira] Updated: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted

2010-05-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1391:
---

Attachment: PIG-1391.trunk.patch





[jira] Updated: (PIG-1398) Marking Pig interfaces for org.apache.pig.data package

2010-05-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1398:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch checked in.

 Marking Pig interfaces for org.apache.pig.data package
 --

 Key: PIG-1398
 URL: https://issues.apache.org/jira/browse/PIG-1398
 Project: Pig
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 0.8.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1398.patch


 Marking Pig interfaces for stability and audience, as well as javadoc 
 cleanup, for the data package.




[jira] Commented: (PIG-1378) har url not usable in Pig scripts

2010-05-04 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863997#action_12863997
 ] 

Pradeep Kamath commented on PIG-1378:
-

Spoke with a developer on the Hadoop team to confirm that this is an issue with 
Hadoop code, fixed on the Hadoop trunk ( 
https://issues.apache.org/jira/browse/MAPREDUCE-1522). Until that fix goes into 
a Hadoop release used by Pig, this will remain an issue - not sure if we should 
keep this jira open until that point - I am fine if we do.

 har url not usable in Pig scripts
 -

 Key: PIG-1378
 URL: https://issues.apache.org/jira/browse/PIG-1378
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, 
 PIG-1378.patch


 I am trying to use har (Hadoop Archives) in my Pig script.
 I can use them through the HDFS shell
 {noformat}
 $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
 Found 1 items
 -rw---   5 viraj users1537234 2010-04-14 09:49 
 user/viraj/project/subproject/files/size/data/part-1
 {noformat}
 Using similar URLs in grunt yields
 {noformat}
 grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2998: Unhandled internal error. 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible 
 file URI scheme: har : hdfs
 2010-04-14 22:08:48,814 [main] WARN  org.apache.pig.tools.grunt.Grunt - There 
 is no log file to write to.
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
 at org.apache.pig.Main.main(Main.java:357)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
 at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
 ... 13 more
 {noformat}
 According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the 
 following as stated in the original description
 {noformat}
 grunt> a = load 
 'har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: 
 Unable to create input splits for: 
 har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 ... 8 more
 Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
 at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
 ...
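For readers hitting the same "No FileSystem for scheme" error: in a har URI the authority encodes the underlying filesystem scheme plus host, so "namenode-location" is not meant to be taken literally. Illustrative forms (paths are the reporter's; host and port are invented here):

{noformat}
grunt> a = load 'har:///user/viraj/project/subproject/files/size/data';
-- or, spelling out the underlying hdfs namenode:
grunt> a = load 'har://hdfs-namenode.example.com:8020/user/viraj/project/subproject/files/size/data';
{noformat}

The `hdfs-` prefix before the host is the har convention for naming the underlying scheme; using a bare hostname makes Hadoop look for a filesystem with that name as the scheme, which is exactly the exception above.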

[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-05-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1211:


Status: Patch Available  (was: Open)

 Pig script runs half way after which it reports syntax error
 

 Key: PIG-1211
 URL: https://issues.apache.org/jira/browse/PIG-1211
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.8.0

 Attachments: PIG-1211.patch


 I have a Pig script which is structured in the following way
 {code}
 register cp.jar
 dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, 
 col3, col4, col5);
 filtered_dataset = filter dataset by (col1 == 1);
 proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
 rmf $output1;
 store proj_filtered_dataset into '$output1' using PigStorage();
 second_stream = foreach filtered_dataset  generate col2, col4, col5;
 group_second_stream = group second_stream by col4;
 output2 = foreach group_second_stream {
  a =  second_stream.col2
  b =   distinct second_stream.col5;
  c = order b by $0;
  generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
 }
 rmf  $output2;
 --syntax error here
 store output2 to '$output2' using PigStorage();
 {code}
 I run this script using the multi-query option; it runs successfully till the 
 first store but later fails with a syntax error. 
 The usage of the HDFS command rmf causes the first store to execute. 
 The only option I have is to run an explain before running the script 
 (grunt> explain -script myscript.pig -out explain.out) 
 or to move the rmf statements to the top of the script.
 Here are some questions:
 a) Can we have an option to do something like checkscript instead of 
 explain to get the same syntax error? In this way I can ensure that I do not 
 run for 3-4 hours before encountering a syntax error.
 b) Can Pig not figure out a way to re-order the rmf statements, since all the 
 store directories are variables?
 Thanks
 Viraj




[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-05-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1211:


Attachment: PIG-1211.patch

The attached patch addresses the issue by adding support for a check-script 
option. For this purpose, the -c command line option is reused, thus fixing 
https://issues.apache.org/jira/browse/PIG-1382 (Command line option -c doesn't 
work - currently this option is not used).

The implementation of this check option piggybacks on explain -script and 
just modifies the GruntParser code to not output the explain output. 
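Assuming the patch as described, the command-line usage would look like this (script name hypothetical):

{noformat}
$ pig -c myscript.pig                           # syntax/validation check only; nothing is executed
grunt> explain -script myscript.pig -out e.out  # same validation path, but also prints the plans
{noformat}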





[jira] Resolved: (PIG-1382) Command line option -c doesn't work

2010-05-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath resolved PIG-1382.
-

Hadoop Flags: [Incompatible change]
Release Note: -c (-cluster) was earlier documented as the option to provide 
cluster information, but it was not actually used in the Pig code. With 
PIG-1211, -c is reused as the option to check the syntax of a Pig script.
Assignee: Pradeep Kamath
  Resolution: Fixed

Fixed through 
https://issues.apache.org/jira/browse/PIG-1211?focusedCommentId=12864002page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12864002

 Command line option -c doesn't work
 ---

 Key: PIG-1382
 URL: https://issues.apache.org/jira/browse/PIG-1382
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Richard Ding
Assignee: Pradeep Kamath
 Fix For: 0.8.0


 Currently this option is not used, but it is documented:
 -c, -cluster clustername, kryptonite is default
 We should either remove it from the documentation or find some way to use it.




[jira] Updated: (PIG-824) SQL interface for Pig

2010-05-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-824:
--

Attachment: pigsql.patch
pig_sql_beta.pdf
java-cup-11a-runtime.jar

 SQL patch (pigsql.patch) based on the version of Owl in svn, and documentation 
(pig_sql_beta.pdf). The patch is against trunk revision 941018.


 SQL interface for Pig
 -

 Key: PIG-824
 URL: https://issues.apache.org/jira/browse/PIG-824
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
 Attachments: java-cup-11a-runtime.jar, PIG-824.1.patch, 
 PIG-824.binfiles.tar.gz, pig_sql_beta.pdf, pigsql.patch, SQL_IN_PIG.html


 In the last 18 months Pig Latin has gained significant popularity within the 
 open source community. Many users like its data flow model, its rich type 
 system, and its ability to work with any data available on HDFS or outside. We 
 have also heard from many users that having Pig speak SQL would bring many 
 more users. Having a single system that exports multiple interfaces is a big 
 advantage, as it guarantees consistent semantics, enables custom code reuse, 
 and reduces the amount of maintenance. This is especially relevant for 
 projects where different interfaces suit different parts of the system. 
 For instance, in a 
 data warehousing system you would have an ETL component that brings data into 
 the warehouse and a component that analyzes the data and produces reports. 
 Pig Latin is uniquely suited for ETL processing, while SQL might be a better 
 fit for report generation.
 To start, it would make sense to implement a subset of the SQL92 standard and 
 to be as standard-compliant as possible. This would include all the 
 standard constructs: select, from, where, group by + having, order by, limit, 
 join (inner + outer). Several extensions, such as support for Pig's UDFs and 
 possibly streaming, multiquery, and Pig's complex types, would be 
 helpful.
 This work is dependent on metadata support outlined in 
 https://issues.apache.org/jira/browse/PIG-823
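To make the proposed SQL92 subset concrete, a hypothetical query using the listed constructs, with a rough Pig Latin equivalent (table and column names invented here):

{code}
-- SQL (proposed subset: select / from / where / group by + having / order by / limit)
SELECT dept, COUNT(*) AS n
FROM employees
WHERE salary > 1000
GROUP BY dept
HAVING COUNT(*) > 5
ORDER BY n DESC
LIMIT 10;
{code}

{code}
-- rough Pig Latin equivalent
e = load 'employees' as (name:chararray, dept:chararray, salary:int);
f = filter e by salary > 1000;
g = group f by dept;
c = foreach g generate group as dept, COUNT(f) as n;
h = filter c by n > 5;
o = order h by n desc;
lim = limit o 10;
{code}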




[jira] Assigned: (PIG-824) SQL interface for Pig

2010-05-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned PIG-824:
-

Assignee: Thejas M Nair


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-824) SQL interface for Pig

2010-05-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-824:
--

Attachment: java-cup-11a.jar
students2.bin
students_attr.bin

Copy the attached jar files to the lib/ dir to build the patch.

Copy the bin storage format test files to the following dirs -
students2.bin - test/org/apache/pig/test/data/SQL/students2.bin and
contrib/owl/contrib/pig/test/java/org/apache/hadoop/owl/pig/data/SQL/students2.bin
students_attr.bin - test/org/apache/pig/test/data/SQL/students_attr.bin and
contrib/owl/contrib/pig/test/java/org/apache/hadoop/owl/pig/data/SQL/students_attr.bin



 SQL interface for Pig
 -

 Key: PIG-824
 URL: https://issues.apache.org/jira/browse/PIG-824
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Thejas M Nair
 Attachments: java-cup-11a-runtime.jar, java-cup-11a.jar, 
 PIG-824.1.patch, PIG-824.binfiles.tar.gz, pig_sql_beta.pdf, pigsql.patch, 
 SQL_IN_PIG.html, students2.bin, students_attr.bin



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-824) SQL interface for Pig

2010-05-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-824:
--

Attachment: pigsql_tutorial.txt

Attaching SQL tutorial (pigsql_tutorial.txt) -
This Pig SQL tutorial shows how to run SQL scripts in local mode and
mapreduce mode.
The metadata is stored using Owl. In this tutorial a Jetty/Derby-based Owl
setup is used so that only minimal setup is needed to get started.


 SQL interface for Pig
 -

 Key: PIG-824
 URL: https://issues.apache.org/jira/browse/PIG-824
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Thejas M Nair
 Attachments: java-cup-11a-runtime.jar, java-cup-11a.jar, 
 PIG-824.1.patch, PIG-824.binfiles.tar.gz, pig_sql_beta.pdf, pigsql.patch, 
 pigsql_tutorial.txt, SQL_IN_PIG.html, students2.bin, students_attr.bin



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted

2010-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864070#action_12864070
 ] 

Hadoop QA commented on PIG-1391:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443607/PIG-1391.trunk.patch
  against trunk revision 940601.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 251 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/314/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/314/console

This message is automatically generated.

 pig unit tests leave behind files in temp directory because MiniCluster files 
 don't get deleted
 ---

 Key: PIG-1391
 URL: https://issues.apache.org/jira/browse/PIG-1391
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.6.0, 0.7.0, 0.8.0

 Attachments: minicluster.patch, PIG-1391.06.2.patch, 
 PIG-1391.06.patch, PIG-1391.07.patch, PIG-1391.trunk.patch


 Pig unit test runs leave behind files in the temp dir (/tmp), and over time
 there are too many files in the directory.
 Most of the files are left behind by MiniCluster. It shuts down the
 MiniDFSCluster, MiniMRCluster, and the FileSystem created in its constructor
 only in finalize(), and Java does not guarantee that finalize() will ever be
 called.
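
A minimal sketch of why this leaks and the usual alternative: register a JVM
shutdown hook when the temp directory is created instead of waiting for a
finalizer that may never run. The class and method names below are invented
for illustration; this is not Pig's actual MiniCluster API.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempDirCleanup {
    // Create a temp dir and register cleanup at JVM exit instead of
    // relying on finalize(), which the JVM may never invoke.
    public static File createTrackedTempDir(String prefix) throws IOException {
        File tempDir = Files.createTempDirectory(prefix).toFile();
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            File[] children = tempDir.listFiles();
            if (children != null) {
                for (File f : children) {
                    f.delete();   // demo dir holds no nested subdirectories
                }
            }
            tempDir.delete();
        }));
        return tempDir;
    }

    public static void main(String[] args) throws IOException {
        File dir = createTrackedTempDir("minicluster-");
        System.out.println(dir.exists()); // directory removed at JVM exit
    }
}
```

An explicit shutdown() method called from test teardown would be even more
deterministic; the shutdown hook at least covers the normal-exit case that
finalize() does not guarantee.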

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1288) EvalFunc returnType is wrong for generic subclasses

2010-05-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1288:


Status: Open  (was: Patch Available)

 EvalFunc returnType is wrong for generic subclasses
 ---

 Key: PIG-1288
 URL: https://issues.apache.org/jira/browse/PIG-1288
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1288-1.patch, PIG-1288-2.patch, PIG-1288-3.patch


 From Garrett "Buster" Kaminaga:
 The EvalFunc constructor has code to determine the return type of the
 function. This walks up the object hierarchy until it encounters EvalFunc,
 then calls getActualTypeArguments and extracts type param 0.
 However, if the user class is itself a generic extension of EvalFunc, then
 the returned object is not the correct type, but a TypeVariable.
 Example:
   class MyAbstractEvalFunc<T> extends EvalFunc<T> ...
   class MyEvalFunc extends MyAbstractEvalFunc<String> ...
 When MyEvalFunc() is constructed, inside the EvalFunc constructor the return
 type is set to a TypeVariable rather than String.class.
 The workaround we've implemented is for MyAbstractEvalFunc<T> to determine
 *its* type parameters using code similar to that in the EvalFunc constructor,
 and then reset the protected data member returnType manually in the
 MyAbstractEvalFunc constructor. (Though this has the same drawback of not
 working if someone then extends MyAbstractEvalFunc.)
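
The reflection walk described above can be reproduced in a few lines. The
class names come from the example; naiveReturnType mimics (it does not copy)
the EvalFunc constructor logic. Running it shows the extracted type argument
is the TypeVariable T, not String.class - exactly the reported bug:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

abstract class EvalFunc<T> {}
abstract class MyAbstractEvalFunc<T> extends EvalFunc<T> {}
class MyEvalFunc extends MyAbstractEvalFunc<String> {}

public class ReturnTypeDemo {
    // Walk up to the class whose direct superclass is EvalFunc, then read
    // the first type argument of that generic superclass.
    static Type naiveReturnType(Class<?> cls) {
        while (cls.getSuperclass() != EvalFunc.class) {
            cls = cls.getSuperclass();
        }
        ParameterizedType pt = (ParameterizedType) cls.getGenericSuperclass();
        return pt.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        Type t = naiveReturnType(MyEvalFunc.class);
        // The walk stops at MyAbstractEvalFunc, whose generic superclass is
        // EvalFunc<T>, so the "actual" type argument is the variable T.
        System.out.println(t instanceof TypeVariable); // true
        System.out.println(t);                         // T
    }
}
```

Resolving T correctly would require mapping type variables through each
level of the hierarchy (MyEvalFunc binds T to String), which is what the
workaround does by hand in the MyAbstractEvalFunc constructor.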

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1288) EvalFunc returnType is wrong for generic subclasses

2010-05-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1288:


Attachment: PIG-1288-3.patch

Fix unit test failures

 EvalFunc returnType is wrong for generic subclasses
 ---

 Key: PIG-1288
 URL: https://issues.apache.org/jira/browse/PIG-1288
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1288-1.patch, PIG-1288-2.patch, PIG-1288-3.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1288) EvalFunc returnType is wrong for generic subclasses

2010-05-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1288:


Status: Patch Available  (was: Open)

 EvalFunc returnType is wrong for generic subclasses
 ---

 Key: PIG-1288
 URL: https://issues.apache.org/jira/browse/PIG-1288
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1288-1.patch, PIG-1288-2.patch, PIG-1288-3.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-05-04 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864088#action_12864088
 ] 

Jeff Hammerbacher commented on PIG-1331:


Okay, seems like PIG-823 should be closed then. I don't have the ability to do 
that.

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
Assignee: Ajay Kidave
 Fix For: 0.8.0

 Attachments: anttestoutput.tgz, build.log, ivy_version.patch, 
 owl.contrib.3.tgz, owl.contrib.4.tar.gz


 This JIRA is a proposal to create a Hadoop table management service: Owl.
 Today, MapReduce and Pig applications interact directly with HDFS directories
 and files and must deal with low-level data management issues such as storage
 format, serialization/compression schemes, data layout, and efficient data
 access, often with different solutions. Owl aims to provide a standard way to
 address these issues and abstracts away the complexities of reading/writing
 huge amounts of data from/to HDFS.
 Owl has a data access API that is modeled after the traditional Hadoop
 InputFormat and a management API to manipulate Owl objects. This JIRA is
 related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata
 store. Owl integrates with different storage modules, such as Zebra, through
 a pluggable architecture.
 Initially, the proposal is to submit Owl as a Pig contrib project. Over time,
 it makes sense to move it to a Hadoop subproject.
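
A hedged illustration of what an InputFormat-style access API abstracts:
callers enumerate splits and iterate records without knowing the on-disk
format or layout. The interface and class names below are invented for this
sketch and are not Owl's actual API.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the two halves of an InputFormat-style contract:
// partitioning the input, and format-specific record reading.
interface TableInput<T> {
    List<String> getSplits(String table);   // partition the input
    Iterator<T> openSplit(String split);    // format-specific record reader
}

// One pluggable storage module; a Zebra-backed module would implement the
// same interface over a different on-disk format.
class SingleRowTableInput implements TableInput<String[]> {
    public List<String> getSplits(String table) {
        return Collections.singletonList(table + "/part-0");
    }
    public Iterator<String[]> openSplit(String split) {
        return Collections.singletonList(new String[]{"1", "alice"}).iterator();
    }
}

public class OwlSketch {
    public static void main(String[] args) {
        // The caller never touches files or serialization details.
        TableInput<String[]> input = new SingleRowTableInput();
        for (String split : input.getSplits("students")) {
            Iterator<String[]> rows = input.openSplit(split);
            while (rows.hasNext()) {
                System.out.println(String.join(",", rows.next())); // 1,alice
            }
        }
    }
}
```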

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864120#action_12864120
 ] 

Hadoop QA commented on PIG-1211:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443635/PIG-1211.patch
  against trunk revision 941005.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 530 release audit warnings 
(more than the trunk's current 529 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/308/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/308/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/308/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/308/console

This message is automatically generated.

 Pig script runs half way after which it reports syntax error
 

 Key: PIG-1211
 URL: https://issues.apache.org/jira/browse/PIG-1211
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.8.0

 Attachments: PIG-1211.patch


 I have a Pig script which is structured in the following way
 {code}
 register cp.jar
 dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, 
 col3, col4, col5);
 filtered_dataset = filter dataset by (col1 == 1);
 proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
 rmf $output1;
 store proj_filtered_dataset into '$output1' using PigStorage();
 second_stream = foreach filtered_dataset  generate col2, col4, col5;
 group_second_stream = group second_stream by col4;
 output2 = foreach group_second_stream {
  a =  second_stream.col2
  b =   distinct second_stream.col5;
  c = order b by $0;
  generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
 }
 rmf  $output2;
 --syntax error here
 store output2 to '$output2' using PigStorage();
 {code}
 I run this script using the multi-query option; it runs successfully till the
 first store but later fails with a syntax error.
 The use of the HDFS command rmf causes the first store to execute.
 The only option I have is to run an explain before running the script:
 grunt> explain -script myscript.pig -out explain.out
 or to move the rmf statements to the top of the script.
 Here are some questions:
 a) Can we have an option to do something like "checkscript" instead of
 explain to get the same syntax error? That way I can ensure that I do not
 run for 3-4 hours before encountering a syntax error.
 b) Can Pig not figure out a way to re-order the rmf statements, since all the
 store directories are variables?
 Thanks
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.