Hi,

>> Invalid field projection. Projected field [tracetype] does not exist.

The error indicates that "tracetype" doesn't exist in the Pig schema of the relation "avro". AvroStorage automatically converts the Avro schema to a Pig schema during the load. Although you have "tracetype" in your Avro schema, it doesn't exist in the generated Pig schema for whatever reason. Can you please try "describe avro"? You can replace the group and dump commands with describe in your Pig script. This will show you what the Pig schema of "avro" is. If "tracetype" indeed doesn't exist, you have to find out why. It could be because the .avro files don't all have the same schema, or because there is a bug in AvroStorage, etc.

>> Maybe globbing with [] doesnt work, but wildcard works?

You're right. AvroStorage internally uses Hadoop path globbing, and Hadoop path globbing doesn't support '[ ]'. But the error above (Projected field [tracetype] does not exist) is not caused by this. What you get because of '[ ]' is a URISyntaxException.

Thanks,
Cheolsoo

On Sun, Nov 25, 2012 at 10:25 AM, Bart Verwilst <[email protected]> wrote:

> Just tried this:
>
> ----------------------------------------------------
> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
> REGISTER 'hdfs:///lib/piggybank.jar';
>
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> avro = load '/data/2012/trace_ejb3/2012-01-0*.avro' USING AvroStorage();
>
> groups = group avro by tracetype;
>
> dump groups;
> ----------------------------------------------------
>
> gave me:
>
> <file avro-test.pig, line 10, column 23> Invalid field projection.
> Projected field [tracetype] does not exist.
>
> Pig Stack Trace
> ---------------
> ERROR 1025:
> <file avro-test.pig, line 10, column 23> Invalid field projection.
> Projected field [tracetype] does not exist.
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias groups
>     at org.apache.pig.PigServer.openIterator(PigServer.java:862)
>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>     at org.apache.pig.Main.run(Main.java:555)
>     at org.apache.pig.Main.main(Main.java:111)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias groups
>     at org.apache.pig.PigServer.storeEx(PigServer.java:961)
>     at org.apache.pig.PigServer.store(PigServer.java:924)
>     at org.apache.pig.PigServer.openIterator(PigServer.java:837)
>     ... 12 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 1025:
> <file avro-test.pig, line 10, column 23> Invalid field projection.
> Projected field [tracetype] does not exist.
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.findColNum(ProjectExpression.java:183)
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.setColumnNumberFromAlias(ProjectExpression.java:166)
>     at org.apache.pig.newplan.logical.visitor.ColumnAliasConversionVisitor$1.visit(ColumnAliasConversionVisitor.java:53)
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:101)
>     at org.apache.pig.newplan.logical.relational.LOCogroup.accept(LOCogroup.java:235)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1621)
>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1616)
>     at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1339)
>     at org.apache.pig.PigServer.storeEx(PigServer.java:956)
>     ... 14 more
> ================================================================================
>
>
> Maybe globbing with [] doesn't work, but wildcard works? No idea why I get
> the error above, though..
>
>
> Kind regards,
>
> Bart
>
> Cheolsoo Park wrote on 25.11.2012 15:33:
>
>> Hi Bart,
>>
>> avro = load '/data/2012/trace_ejb3/2012-01-*.avro' USING AvroStorage();
>> gives me:
>> Schema for avro unknown.
>>
>> This should work. The error that you're getting is not from AvroStorage
>> but from PigServer.
>>
>> grep -r "Schema for .* unknown" *
>> src/org/apache/pig/PigServer.java:
>> System.out.println("Schema for " + alias + " unknown.");
>> ...
>>
>> It looks like you have an error in your Pig script. Can you please
>> provide your Pig script and the schema of your Avro files that reproduce
>> the error?
>>
>> Thanks,
>> Cheolsoo
>>
>>
>> On Sun, Nov 25, 2012 at 1:02 AM, Bart Verwilst <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I've tried loading a CSV with PigStorage(), getting this:
>>>
>>> txt = load '/import.mysql/trace_ejb3_2011/part-m-00000' USING PigStorage(',');
>>> describe txt;
>>>
>>> Schema for txt unknown.
>>>
>>> Maybe this is because it's a CSV, so a schema is hard to figure out..
>>>
>>> Any other suggestions? Our whole Hadoop setup is built around being able
>>> to selectively load Avro files to run our jobs on; if this doesn't work,
>>> then we're pretty much screwed.. :)
>>>
>>> Thanks in advance!
>>>
>>> Bart
>>>
>>> Russell Jurney wrote on 24.11.2012 20:23:
>>>
>>>> I suspect the problem is AvroStorage, not globbing. Try this with
>>>> PigStorage.
>>>>
>>>> Russell Jurney twitter.com/rjurney
>>>>
>>>>
>>>> On Nov 24, 2012, at 5:15 AM, Bart Verwilst <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Thanks for your suggestion!
>>>>> I switched my avro variable to avro = load '$INPUT' USING AvroStorage();
>>>>>
>>>>> However, I get the same results this way:
>>>>>
>>>>> $ pig -p INPUT=/data/2012/trace_ejb3/2012-01-02.avro avro-test.pig
>>>>> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
>>>>> <snip>
>>>>> avro: {id: long,timestamp: long,latitude: int,longitude: int,speed: int,heading: int,terminalid: int,customerid: chararray,mileage: int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id: long,value: chararray,pkey: chararray)}}
>>>>>
>>>>>
>>>>> $ pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" avro-test.pig
>>>>> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
>>>>> <snip>
>>>>> 2012-11-24 14:11:17,309 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
>>>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>>>>>
>>>>>
>>>>> $ pig -p INPUT='/data/2012/trace_ejb3/2012-01-0[12].avro' avro-test.pig
>>>>> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
>>>>> <snip>
>>>>> 2012-11-24 14:12:05,085 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
>>>>> Details at logfile: /var/lib/hadoop-hdfs/pig_1353762722742.log
>>>>>
>>>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>>>>>
>>>>>
>>>>> Deepak Tiwari wrote on 24.11.2012 00:41:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I don't have a system to test it right now, but I have been passing it
>>>>>> as a parameter with -p and it works.
>>>>>>
>>>>>> Change the line to accept parameters, like avro = load '$INPUT' USING AvroStorage();
>>>>>>
>>>>>> bin/pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" <scriptName>
>>>>>>
>>>>>> I think that if you don't use double quotes, the expansion is done by the OS.
>>>>>>
>>>>>> Please let us know if it doesn't work...
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 23, 2012 at 12:45 PM, Bart Verwilst <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have the following files on HDFS:
>>>>>>>
>>>>>>> -rw-r--r-- 3 hdfs supergroup 22989179 2012-11-22 11:17 /data/2012/trace_ejb3/2012-01-01.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 240551819 2012-11-22 14:27 /data/2012/trace_ejb3/2012-01-02.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 324464635 2012-11-22 18:28 /data/2012/trace_ejb3/2012-01-03.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 345526418 2012-11-22 21:30 /data/2012/trace_ejb3/2012-01-04.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 351322916 2012-11-23 00:28 /data/2012/trace_ejb3/2012-01-05.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 325953043 2012-11-23 04:32 /data/2012/trace_ejb3/2012-01-06.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 107019156 2012-11-23 05:58 /data/2012/trace_ejb3/2012-01-07.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 46392850 2012-11-23 06:37 /data/2012/trace_ejb3/2012-01-08.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 361970930 2012-11-23 10:06 /data/2012/trace_ejb3/2012-01-09.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 398462505 2012-11-23 13:44 /data/2012/trace_ejb3/2012-01-10.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 400785976 2012-11-23 17:17 /data/2012/trace_ejb3/2012-01-11.avro
>>>>>>> -rw-r--r-- 3 hdfs supergroup 400027565 2012-11-23 20:43 /data/2012/trace_ejb3/2012-01-12.avro
>>>>>>>
>>>>>>>
>>>>>>> Using Pig 0.10.0-cdh4.1.2, I try to load those files and describe them.
>>>>>>>
>>>>>>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
>>>>>>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
>>>>>>> REGISTER 'hdfs:///lib/piggybank.jar';
>>>>>>>
>>>>>>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>>>>>
>>>>>>> avro = load '/data/2012/trace_ejb3/2012-01-01.avro' USING AvroStorage();
>>>>>>>
>>>>>>> describe avro;
>>>>>>>
>>>>>>>
>>>>>>> This works, and so does 2012-01-02.avro.
>>>>>>>
>>>>>>> However, as soon as I want to include multiple files, no dice.
>>>>>>>
>>>>>>> avro = load '/data/2012/trace_ejb3/2012-01-{01,02}.avro' USING AvroStorage();
>>>>>>> gives me:
>>>>>>> 2012-11-23 21:41:07,475 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
>>>>>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 30: /data/2012/trace_ejb3/2012-01-{01,02}.avro
>>>>>>>
>>>>>>> avro = load '/data/2012/trace_ejb3/2012-01-*.avro' USING AvroStorage();
>>>>>>> gives me:
>>>>>>> Schema for avro unknown.
>>>>>>>
>>>>>>> avro = load '/data/2012/trace_ejb3/2012-01-0[12].avro' USING AvroStorage();
>>>>>>> also gives me:
>>>>>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>>>>>>>
>>>>>>>
>>>>>>> What am I doing wrong here? According to
>>>>>>> http://hadoop.apache.org/docs/r0.21.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus%28org.apache.hadoop.fs.Path%29,
>>>>>>> shouldn't all of these be acceptable input?
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Bart
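The URISyntaxException discussed in this thread can be reproduced outside Pig. The Java sketch below rests on one assumption: that the load path ends up being parsed by `java.net.URI`. The identical error messages suggest this, but the thread does not show the exact call site in AvroStorage.

The single-argument `URI` constructor rejects `[` and `{` in a path, while `*` is a legal URI path character. For contrast, the JDK's own glob matcher (`java.nio.file.PathMatcher`) accepts all three patterns. Its syntax is similar to, but not the same as, Hadoop's `globStatus`. The class and method names (`GlobUriCheck`, `uriErrorIndex`, `globMatches`) are hypothetical, for illustration only.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper (not part of Pig, Hadoop, or AvroStorage). It only shows
// how java.net.URI and the JDK glob matcher treat the patterns from this thread.
public class GlobUriCheck {

    // Returns -1 if the string parses as a java.net.URI; otherwise returns the
    // character index reported by URISyntaxException.getIndex().
    public static int uriErrorIndex(String path) {
        try {
            new URI(path);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    // Matches a bare file name against a pattern using the JDK "glob:" syntax,
    // which supports *, ?, [...] and {a,b}. This is an analogy for Hadoop's
    // globStatus, not Hadoop's actual implementation.
    public static boolean globMatches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String dir = "/data/2012/trace_ejb3/";
        String[] patterns = {"2012-01-0*.avro", "2012-01-0[12].avro", "2012-01-{01,02}.avro"};
        for (String pattern : patterns) {
            String full = dir + pattern;
            int idx = uriErrorIndex(full);
            System.out.println(full + " -> "
                    + (idx < 0 ? "valid URI" : "URISyntaxException at index " + idx));
            System.out.println("  glob matches 2012-01-01.avro: "
                    + globMatches(pattern, "2012-01-01.avro")
                    + ", 2012-01-03.avro: " + globMatches(pattern, "2012-01-03.avro"));
        }
    }
}
```

Running `main` reports index 31 for the bracket pattern and index 30 for the brace pattern, the same indices as in the logs above. The `*` pattern parses as a valid URI. This is consistent with Cheolsoo's point that the URISyntaxException comes from the '[ ]' (and '{ }') characters and is unrelated to the "Projected field [tracetype] does not exist" error. It also suggests that the failure happens when the path is parsed as a URI, not in glob matching itself.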
