I suspect the problem is AvroStorage, not globbing. Try the same load with PigStorage.
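
The quoted runs below are consistent with that: in both the single- and
double-quoted cases the error message still contains the literal pattern,
so the shell passed it through to Pig unexpanded. A minimal sketch of that
quoting behaviour, assuming a POSIX shell and throwaway local files in a
temporary directory (the file names only mimic the HDFS ones):

```shell
# Assumption: local scratch files stand in for the HDFS files in the thread.
tmp=$(mktemp -d)
cd "$tmp"
touch 2012-01-01.avro 2012-01-02.avro

# Unquoted: the shell expands the pattern against the current directory.
unquoted=$(echo 2012-01-0[12].avro)

# Double- and single-quoted: the pattern is passed through literally.
double=$(echo "2012-01-0[12].avro")
single=$(echo '2012-01-0[12].avro')

printf 'unquoted: %s\n' "$unquoted"
printf 'double:   %s\n' "$double"
printf 'single:   %s\n' "$single"
```

Either quoting style should therefore hand Pig the same string, which
would leave the loader as the remaining suspect.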

Russell Jurney twitter.com/rjurney


On Nov 24, 2012, at 5:15 AM, Bart Verwilst <[email protected]> wrote:

> Hello,
>
> Thanks for your suggestion!
> I switched my avro variable to avro = load '$INPUT' USING AvroStorage();
>
> However, I get the same results this way:
>
> $ pig -p INPUT=/data/2012/trace_ejb3/2012-01-02.avro avro-test.pig
> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
> <snip>
> avro: {id: long,timestamp: long,latitude: int,longitude: int,speed: int,heading: int,terminalid: int,customerid: chararray,mileage: int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id: long,value: chararray,pkey: chararray)}}
>
>
> $ pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" avro-test.pig
> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
> <snip>
> 2012-11-24 14:11:17,309 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>
>
> $ pig -p INPUT='/data/2012/trace_ejb3/2012-01-0[12].avro' avro-test.pig
> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
> <snip>
> 2012-11-24 14:12:05,085 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
> Details at logfile: /var/lib/hadoop-hdfs/pig_1353762722742.log
> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>
>
> Deepak Tiwari wrote on 24.11.2012 00:41:
>> Hi,
>>
>> I don't have a system to test it on right now, but I have been passing the
>> path as a parameter with -p and it works.
>>
>> Change the load line to accept a parameter, like:
>>
>>   avro = load '$INPUT' USING AvroStorage();
>>
>> bin/pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" <scriptName>
>>
>> I think if you don't use double quotes, the expansion is done by the OS.
>>
>> Please let us know if it doesn't work...
>>
>>
>>
>> On Fri, Nov 23, 2012 at 12:45 PM, Bart Verwilst <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I have the following files on HDFS:
>>>
>>> -rw-r--r--   3 hdfs supergroup   22989179 2012-11-22 11:17 /data/2012/trace_ejb3/2012-01-01.avro
>>> -rw-r--r--   3 hdfs supergroup  240551819 2012-11-22 14:27 /data/2012/trace_ejb3/2012-01-02.avro
>>> -rw-r--r--   3 hdfs supergroup  324464635 2012-11-22 18:28 /data/2012/trace_ejb3/2012-01-03.avro
>>> -rw-r--r--   3 hdfs supergroup  345526418 2012-11-22 21:30 /data/2012/trace_ejb3/2012-01-04.avro
>>> -rw-r--r--   3 hdfs supergroup  351322916 2012-11-23 00:28 /data/2012/trace_ejb3/2012-01-05.avro
>>> -rw-r--r--   3 hdfs supergroup  325953043 2012-11-23 04:32 /data/2012/trace_ejb3/2012-01-06.avro
>>> -rw-r--r--   3 hdfs supergroup  107019156 2012-11-23 05:58 /data/2012/trace_ejb3/2012-01-07.avro
>>> -rw-r--r--   3 hdfs supergroup   46392850 2012-11-23 06:37 /data/2012/trace_ejb3/2012-01-08.avro
>>> -rw-r--r--   3 hdfs supergroup  361970930 2012-11-23 10:06 /data/2012/trace_ejb3/2012-01-09.avro
>>> -rw-r--r--   3 hdfs supergroup  398462505 2012-11-23 13:44 /data/2012/trace_ejb3/2012-01-10.avro
>>> -rw-r--r--   3 hdfs supergroup  400785976 2012-11-23 17:17 /data/2012/trace_ejb3/2012-01-11.avro
>>> -rw-r--r--   3 hdfs supergroup  400027565 2012-11-23 20:43 /data/2012/trace_ejb3/2012-01-12.avro
>>>
>>> Using Pig 0.10.0-cdh4.1.2, I try to load those files and describe them.
>>>
>>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
>>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
>>> REGISTER 'hdfs:///lib/piggybank.jar';
>>>
>>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>
>>> avro = load '/data/2012/trace_ejb3/2012-01-01.avro' USING AvroStorage();
>>>
>>> describe avro;
>>>
>>>
>>> This works, and so does 2012-01-02.avro.
>>>
>>> However, as soon as I try to include multiple files, no dice.
>>>
>>> avro = load '/data/2012/trace_ejb3/2012-01-{01,02}.avro' USING AvroStorage();
>>> gives me:
>>> 2012-11-23 21:41:07,475 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 30: /data/2012/trace_ejb3/2012-01-{01,02}.avro
>>>
>>> avro = load '/data/2012/trace_ejb3/2012-01-*.avro' USING AvroStorage();
>>> gives me:
>>> Schema for avro unknown.
>>>
>>> avro = load '/data/2012/trace_ejb3/2012-01-0[12].avro' USING AvroStorage();
>>> also gives me:
>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>>>
>>> What am I doing wrong here? According to
>>> http://hadoop.apache.org/docs/r0.21.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus%28org.apache.hadoop.fs.Path%29
>>> this should all be acceptable input?
>>>
>>> Thanks in advance!
>>>
>>> Kind regards,
>>>
>>> Bart
>>>
