RE: A small question about Pig
Hadoop 0.19 does not work with Pig 0.6.0. You need a Hadoop 0.20 cluster.

Olga

-----Original Message-----
From: azuryy_yu [mailto:azuryy...@126.com]
Sent: Saturday, April 17, 2010 8:59 AM
To: pig-u...@hadoop.apache.org; pig-dev@hadoop.apache.org
Subject: A small question about Pig

I am new to Pig. I installed Pig 0.6.0, but my cluster's Hadoop version is 0.19.2. Will that work for me? Thanks
[jira] Created: (PIG-1381) Need a way for Pig to take an alternative property file
Need a way for Pig to take an alternative property file
---
Key: PIG-1381
URL: https://issues.apache.org/jira/browse/PIG-1381
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Fix For: 0.8.0

Currently, Pig reads the first pig.properties it finds on the CLASSPATH. Pig ships with a default pig.properties, so if the user has a different pig.properties there will be a conflict, since we can only read one. There are a couple of ways to solve this:
1. Give a command line option for the user to pass an additional property file.
2. Rename the default pig.properties to pig-default.properties, so the user can supply a pig.properties to override it.
3. Going further, we could consider using pig-default.xml/pig-site.xml, which seems more natural for the Hadoop community. If so, we should provide backward compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
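The override semantics of option 2 can be sketched in plain Java. This is only an illustration of the intended behavior (user values win over shipped defaults on conflicting keys), not Pig's actual configuration loader, and the property names used below are hypothetical examples.

```java
import java.util.Properties;

// Sketch of option 2's semantics: "defaults" stands in for the shipped
// pig-default.properties, "user" for a user-supplied pig.properties.
// On conflicting keys, the user's value overrides the default.
// Illustration only, not Pig's actual implementation.
public class LayeredPigProperties {
    public static Properties merge(Properties defaults, Properties user) {
        Properties merged = new Properties();
        merged.putAll(defaults); // shipped defaults go in first
        merged.putAll(user);     // user-supplied values override on conflict
        return merged;
    }
}
```

Keys present only in the defaults survive the merge untouched, so users need to list only the settings they actually want to change.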
[jira] Updated: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig
[ https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Wang updated PIG-1375:
---
Attachment: PIG-1375.patch

Thanks, Xuefu, for the feedback. I updated the patch to incorporate comments 2 and 4.
For comment 1) The indentation change is only incidental, to make some files (impacted by this feature) follow Zebra's tab policy: spaces of width two.
For comment 3) The flag idea needs to be justified by further performance profiling work. The check here should be trivial compared with other operations such as generateKey() and insert().

[Zebra] To support writing multiple Zebra tables through Pig
Key: PIG-1375
URL: https://issues.apache.org/jira/browse/PIG-1375
Project: Pig
Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
Fix For: 0.8.0
Attachments: PIG-1375.patch, PIG-1375.patch, PIG-1375.patch

In Zebra, we already have multiple-output support for map/reduce, but we do not support this feature when users use Zebra through Pig. This jira is to address that issue: we plan to support writing to multiple output tables through Pig as well. We propose to support the following Pig store statements with multiple outputs:

store relation into 'loc1,loc2,loc3' using org.apache.hadoop.zebra.pig.TableStorer('storagehint_string', 'complete name of your custom partition class', 'some arguments to partition class'); /* if partition class arguments are needed */

store relation into 'loc1,loc2,loc3' using org.apache.hadoop.zebra.pig.TableStorer('storagehint_string', 'complete name of your custom partition class'); /* if no partition class arguments are needed */

Note that users need to specify up to three arguments: the storage hint string, the complete name of the partition class, and the partition class arguments string.
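The partition class named in the store statement decides which of the listed table locations each record is written to. Zebra's actual partition-class interface is not shown in this thread, so the class below is only a hypothetical sketch of the routing idea: stable, key-based routing of each record to one of the N tables ('loc1,loc2,loc3' mapping to indices 0..2).

```java
// Hypothetical sketch of the routing a custom partition class performs.
// Zebra's real interface may differ; only the routing idea is illustrated.
public class TableRouter {
    private final int numTables; // number of locations in the store statement

    public TableRouter(int numTables) {
        this.numTables = numTables;
    }

    // Stable hash routing: the same record key always lands in the same
    // table, and the result is always a valid index in [0, numTables).
    public int tableIndexFor(String recordKey) {
        return Math.floorMod(recordKey.hashCode(), numTables);
    }
}
```

Determinism matters here: if the same key could map to different tables on different calls, reads that expect a key's records in one table would silently miss data.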
[jira] Commented: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12858709#action_12858709 ]

Ashutosh Chauhan commented on PIG-1378:
---
{noformat}
grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}
This is incorrect. You need to do the following:
{noformat}
grunt> a = load 'har://hdfs-namenode.foo.com:8020/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}
Note that the underlying scheme is hdfs, followed by a - (dash), then the namenode host, then a colon, then the port number (8020), and then the location of your har archive.

har url not usable in Pig scripts
-
Key: PIG-1378
URL: https://issues.apache.org/jira/browse/PIG-1378
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Fix For: 0.8.0

I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell:
{noformat}
$hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
Found 1 items
-rw---   5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1
{noformat}
Using similar URLs in grunt yields:
{noformat}
grunt> a = load 'har:///user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}
{noformat}
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
	at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
	at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
	at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
	at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
	...
13 more
{noformat}
According to Jira http://issues.apache.org/jira/browse/PIG-1234, I tried the following, as stated in the original description:
{noformat}
grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}
{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
	at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
	at
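Ashutosh's comment spells out the har URL layout: the har scheme, then the underlying filesystem scheme joined to the namenode host with a dash, then a colon and the port, then the archive path. A minimal sketch that assembles such a URL, using the placeholder host, port, and path from this thread (not real endpoints):

```java
import java.net.URI;

// Sketch: build a har URL of the shape described in the comment,
// har://<underlying-scheme>-<namenode-host>:<port><path-to-archive>.
// Illustration only; values used are the thread's placeholders.
public class HarUrlBuilder {
    public static URI forHdfs(String namenodeHost, int port, String archivePath) {
        // "hdfs" is the underlying scheme, joined to the host with a dash
        return URI.create("har://hdfs-" + namenodeHost + ":" + port + archivePath);
    }
}
```

The bare `har:///path` form fails in grunt (as the stack traces above show) because without the embedded `hdfs-host:port` segment there is no underlying filesystem for the har layer to resolve against.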