I suspect the problem is AvroStorage, not globbing. Try the same load with PigStorage.
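One way to check this outside Pig (a sketch, not taken from the AvroStorage source): the exception in the quoted log is what `java.net.URI` raises when a path contains `{` or `[`. The class name `GlobUriCheck` and the method `firstIllegalIndex` are made up for illustration, and whether AvroStorage really builds a `URI` from the load path before expanding the glob is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GlobUriCheck {
    /**
     * Returns the index of the first character that java.net.URI rejects,
     * or -1 if the string parses as a URI.
     */
    public static int firstIllegalIndex(String path) {
        try {
            new URI(path);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // Paths taken from the thread: one plain file and the three glob forms.
        String[] paths = {
            "/data/2012/trace_ejb3/2012-01-01.avro",
            "/data/2012/trace_ejb3/2012-01-{01,02}.avro",
            "/data/2012/trace_ejb3/2012-01-0[12].avro",
            "/data/2012/trace_ejb3/2012-01-*.avro"
        };
        for (String p : paths) {
            System.out.println(p + " -> " + firstIllegalIndex(p));
        }
    }
}
```

On a standard JDK the brace and bracket patterns are rejected at indexes 30 and 31, the same positions reported in the log, while the `*` pattern parses cleanly. That would fit the `*` load failing in a different way ("Schema for avro unknown") and points at URI parsing rather than at Hadoop's globbing.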

Russell Jurney
twitter.com/rjurney

On Nov 24, 2012, at 5:15 AM, Bart Verwilst <[email protected]> wrote:

> Hello,
>
> Thanks for your suggestion!
> I switched my avro variable to:
>
> avro = load '$INPUT' USING AvroStorage();
>
> However, I get the same results this way:
>
> $ pig -p INPUT=/data/2012/trace_ejb3/2012-01-02.avro avro-test.pig
> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
> <snip>
> avro: {id: long,timestamp: long,latitude: int,longitude: int,speed: int,heading: int,terminalid: int,customerid: chararray,mileage: int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id: long,value: chararray,pkey: chararray)}}
>
> $ pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" avro-test.pig
> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
> <snip>
> 2012-11-24 14:11:17,309 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>
> $ pig -p INPUT='/data/2012/trace_ejb3/2012-01-0[12].avro' avro-test.pig
> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
> <snip>
> 2012-11-24 14:12:05,085 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
> Details at logfile: /var/lib/hadoop-hdfs/pig_1353762722742.log
> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>
> Deepak Tiwari wrote on 24.11.2012 00:41:
>> Hi,
>>
>> I don't have a system to test it on right now, but I have been passing it as a parameter with -p and it works.
>>
>> Change the line to accept a parameter, like:
>>
>> avro = load '$INPUT' USING AvroStorage();
>>
>> bin/pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" <scriptName>
>>
>> I think if you don't use double quotes, the expansion is done by the OS.
>>
>> Please let us know if it doesn't work...
>>
>> On Fri, Nov 23, 2012 at 12:45 PM, Bart Verwilst <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I have the following files on HDFS:
>>>
>>> -rw-r--r-- 3 hdfs supergroup 22989179 2012-11-22 11:17 /data/2012/trace_ejb3/2012-01-01.avro
>>> -rw-r--r-- 3 hdfs supergroup 240551819 2012-11-22 14:27 /data/2012/trace_ejb3/2012-01-02.avro
>>> -rw-r--r-- 3 hdfs supergroup 324464635 2012-11-22 18:28 /data/2012/trace_ejb3/2012-01-03.avro
>>> -rw-r--r-- 3 hdfs supergroup 345526418 2012-11-22 21:30 /data/2012/trace_ejb3/2012-01-04.avro
>>> -rw-r--r-- 3 hdfs supergroup 351322916 2012-11-23 00:28 /data/2012/trace_ejb3/2012-01-05.avro
>>> -rw-r--r-- 3 hdfs supergroup 325953043 2012-11-23 04:32 /data/2012/trace_ejb3/2012-01-06.avro
>>> -rw-r--r-- 3 hdfs supergroup 107019156 2012-11-23 05:58 /data/2012/trace_ejb3/2012-01-07.avro
>>> -rw-r--r-- 3 hdfs supergroup 46392850 2012-11-23 06:37 /data/2012/trace_ejb3/2012-01-08.avro
>>> -rw-r--r-- 3 hdfs supergroup 361970930 2012-11-23 10:06 /data/2012/trace_ejb3/2012-01-09.avro
>>> -rw-r--r-- 3 hdfs supergroup 398462505 2012-11-23 13:44 /data/2012/trace_ejb3/2012-01-10.avro
>>> -rw-r--r-- 3 hdfs supergroup 400785976 2012-11-23 17:17 /data/2012/trace_ejb3/2012-01-11.avro
>>> -rw-r--r-- 3 hdfs supergroup 400027565 2012-11-23 20:43 /data/2012/trace_ejb3/2012-01-12.avro
>>>
>>> Using Pig 0.10.0-cdh4.1.2, I try to load those files and describe them.
>>>
>>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
>>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
>>> REGISTER 'hdfs:///lib/piggybank.jar';
>>>
>>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>
>>> avro = load '/data/2012/trace_ejb3/2012-01-01.avro' USING AvroStorage();
>>>
>>> describe avro;
>>>
>>> This works, same with 2012-01-02.avro.
>>>
>>> However, as soon as I want to include multiple files, no dice.
>>>
>>> avro = load '/data/2012/trace_ejb3/2012-01-{01,02}.avro' USING AvroStorage();
>>> gives me:
>>> 2012-11-23 21:41:07,475 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 30: /data/2012/trace_ejb3/2012-01-{01,02}.avro
>>>
>>> avro = load '/data/2012/trace_ejb3/2012-01-*.avro' USING AvroStorage();
>>> gives me:
>>> Schema for avro unknown.
>>>
>>> avro = load '/data/2012/trace_ejb3/2012-01-0[12].avro' USING AvroStorage();
>>> also gives me:
>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>>>
>>> What am I doing wrong here? According to
>>> http://hadoop.apache.org/docs/r0.21.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus%28org.apache.hadoop.fs.Path%29,
>>> this should all be acceptable input?
>>>
>>> Thanks in advance!
>>>
>>> Kind regards,
>>>
>>> Bart
>>>
