RE: A small question about Pig

2010-04-19 Thread Olga Natkovich
Pig 0.6.0 does not work with Hadoop 0.19. You need a Hadoop 0.20 cluster.

Olga

-Original Message-
From: azuryy_yu [mailto:azuryy...@126.com] 
Sent: Saturday, April 17, 2010 8:59 AM
To: pig-u...@hadoop.apache.org; pig-dev@hadoop.apache.org
Subject: A small question about Pig

I am new to Pig.
 
I installed Pig 0.6.0, but my cluster's Hadoop version is 0.19.2. Will that
work for me? Thanks


[jira] Created: (PIG-1381) Need a way for Pig to take an alternative property file

2010-04-19 Thread Daniel Dai (JIRA)
Need a way for Pig to take an alternative property file
---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
 Fix For: 0.8.0


Currently, Pig reads the first pig.properties it finds on the CLASSPATH. Pig ships 
with a default pig.properties, and if the user has a different pig.properties there 
will be a conflict, since only one can be read. There are a couple of ways to solve 
this:

1. Add a command line option that lets the user pass an additional property file 
(see the sketch below).
2. Rename the default pig.properties to pig-default.properties, so that a 
user-supplied pig.properties overrides it.
3. Going further, we could consider using pig-default.xml/pig-site.xml, which seems 
more natural for the Hadoop community. If so, we should provide backward 
compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml. 
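As a rough sketch of option 1 (the -propertyFile flag name below is only an 
illustration, not a committed interface):
{noformat}
# hypothetical: pass an extra property file on top of the bundled pig.properties
pig -propertyFile /home/me/cluster.properties myscript.pig
{noformat}
Properties supplied this way would be loaded after the default pig.properties, so 
the user's values would win on conflicts.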

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig

2010-04-19 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1375:
---

Attachment: PIG-1375.patch

Thanks, Xuefu, for the feedback.

Updated the patch to incorporate comments 2 and 4.
For comment 1): the indentation change is only incidental, bringing some files 
(impacted by this feature) in line with Zebra's tab policy of two-space indentation.
For comment 3): the flag idea needs to be justified by further performance 
profiling; the check here should be trivial compared with other operations 
such as generateKey() and insert().


 

 [Zebra] To support writing multiple Zebra tables through Pig
 

 Key: PIG-1375
 URL: https://issues.apache.org/jira/browse/PIG-1375
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1375.patch, PIG-1375.patch, PIG-1375.patch


 In Zebra, we already support multiple outputs for map/reduce, but we do 
 not support this feature when users access Zebra through Pig.
 This jira is to address that. We plan to support writing to multiple 
 output tables through Pig as well.
 We propose to support the following Pig store statements with multiple 
 outputs (a concrete example follows below):
 store relation into 'loc1,loc2,loc3' using 
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class', 'some arguments to partition 
 class'); /* if partition class arguments are needed */
 store relation into 'loc1,loc2,loc3' using 
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class'); /* if no partition class 
 arguments are needed */
 Note that users need to specify up to three arguments: the storage hint string, 
 the complete name of the partition class, and the partition class arguments string.
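 For illustration, here is what such a statement might look like with concrete 
 values filled in (the output paths, storage hint, and partition class name 
 below are placeholders for this sketch, not part of the proposal):
 {noformat}
 store relation into '/user/alice/out1,/user/alice/out2,/user/alice/out3' using
 org.apache.hadoop.zebra.pig.TableStorer('[c1, c2]; [c3, c4]',
 'com.example.MyZebraPartitioner', 'region=us,eu');
 {noformat}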

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1378) har url not usable in Pig scripts

2010-04-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858709#action_12858709
 ] 

Ashutosh Chauhan commented on PIG-1378:
---

{noformat}
grunt> a = load 
'har://namenode-location/user/viraj/project/subproject/files/size/data'; 
grunt> dump a;
{noformat}

 This is incorrect. You need to do the following:
{noformat}
grunt> a = load 
'har://hdfs-namenode.foo.com:8020/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}

Note that the underlying scheme is hdfs, then a -(dash), followed by the namenode 
host, then a colon, followed by the port number (8020), and then the location of 
your har archive. 
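The same fully-qualified form should also work from the HDFS shell (the hostname 
and port below are placeholders for your own namenode):
{noformat}
$ hadoop fs -ls 'har://hdfs-namenode.foo.com:8020/user/viraj/project/subproject/files/size/data'
{noformat}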


 har url not usable in Pig scripts
 -

 Key: PIG-1378
 URL: https://issues.apache.org/jira/browse/PIG-1378
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
 Fix For: 0.8.0


 I am trying to use har (Hadoop Archives) in my Pig script.
 I can use them through the HDFS shell
 {noformat}
 $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
 Found 1 items
 -rw---   5 viraj users    1537234 2010-04-14 09:49 
 user/viraj/project/subproject/files/size/data/part-1
 {noformat}
 Using similar URL's in grunt yields
 {noformat}
 grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2998: Unhandled internal error. 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible 
 file URI scheme: har : hdfs
 2010-04-14 22:08:48,814 [main] WARN  org.apache.pig.tools.grunt.Grunt - There 
 is no log file to write to.
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
 at org.apache.pig.Main.main(Main.java:357)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
 at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
 ... 13 more
 {noformat}
 According to Jira http://issues.apache.org/jira/browse/PIG-1234, I tried the 
 following, as stated in the original description:
 {noformat}
 grunt> a = load 
 'har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: 
 Unable to create input splits for: 
 har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 ... 8 more
 Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
 at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
 at