Hi, Cheolsoo,

If we could allow a string as the index, it should stay backward compatible and also 
give us the ability to keep each store's schema separate without having to track the numbers.
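To illustrate the idea, assume the frontend string keeps the `<key>#<schema>` layout described further down in this thread. A parser that accepts any key made of letters, digits, `_`, `.` or `-` then reads the existing numeric indexes exactly as before and also accepts names such as `set1`. This is a minimal sketch, not AvroStorage's code: the class and method names are hypothetical, and it assumes no string literal inside a schema contains a comma followed by `<key>#`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse "key#schema,key#schema" where a key may be a number or a name.
public class StringKeySchemaMap {
    // An entry starts at the beginning of the string or right after a comma,
    // with a key of letters, digits, '_', '.' or '-' followed by '#'.
    // Commas inside a JSON schema are followed by '"', '{', '[' and so on,
    // so they are not taken as entry boundaries (assumption: no string
    // literal in a schema contains ",<key>#").
    private static final Pattern ENTRY_START =
            Pattern.compile("(?:^|,)([A-Za-z0-9_.-]+)#");

    public static Map<String, String> parse(String encoded) {
        Map<String, String> schemas = new LinkedHashMap<>();
        Matcher m = ENTRY_START.matcher(encoded);
        String key = null;
        int valueStart = 0;
        while (m.find()) {
            if (key != null) {
                // The previous schema ends where this entry's comma begins.
                schemas.put(key, encoded.substring(valueStart, m.start()));
            }
            key = m.group(1);
            valueStart = m.end();
        }
        if (key != null) {
            schemas.put(key, encoded.substring(valueStart));
        }
        return schemas;
    }

    public static void main(String[] args) {
        String oldStyle = "1#{\"type\":\"record\",\"name\":\"a\",\"fields\":[]},2#\"string\"";
        String newStyle = "set1#{\"type\":\"record\",\"name\":\"a\",\"fields\":[]},set2#\"string\"";
        System.out.println(parse(oldStyle).keySet()); // [1, 2]
        System.out.println(parse(newStyle).keySet()); // [set1, set2]
    }
}
```

In this sketch a numeric index is just a key that happens to be all digits, which is why scripts already passing 'index', '1' would parse the same way.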

Thanks.
Dan

-----Original Message-----
From: Cheolsoo Park [mailto:[email protected]] 
Sent: Tuesday, August 21, 2012 6:04 PM
To: [email protected]
Subject: Re: runtime exception when load and store multiple files using avro in 
pig

Hi Dan,

Glad to hear that it worked. I totally agree that AvroStorage can be improved. 
In fact, it was written for Pig 0.7, so it could be written much more cleanly now.

The only concern I have is backward compatibility. That is, if I change the syntax 
(which I badly wanted to do while working on AvroStorage recently), it will break 
backward compatibility. What I have been thinking about is rewriting AvroStorage in 
core Pig, like HBaseStorage. For backward compatibility, we could keep the old 
version in Piggybank for a while and eventually retire it.

I am wondering what other people think. Please let me know if it is not a good 
idea to move AvroStorage to core Pig from Piggybank.

Thanks,
Cheolsoo

On Tue, Aug 21, 2012 at 5:47 PM, Danfeng Li <[email protected]> wrote:

> Thanks, Cheolsoo. That solves my problem.
>
> It would be nice if Pig could do this automatically when a script has 
> multiple AvroStorage stores. Otherwise, we have to track the index 
> numbers manually.
>
> Dan
>
> -----Original Message-----
> From: Cheolsoo Park [mailto:[email protected]]
> Sent: Tuesday, August 21, 2012 5:06 PM
> To: [email protected]
> Subject: Re: runtime exception when load and store multiple files 
> using avro in pig
>
> Hi Danfeng,
>
> The "long" comes from the 1st AvroStorage store in your script. 
> AvroStorage has a rather odd syntax for multiple stores. To apply 
> different Avro schemas to multiple stores, you have to specify their 
> "index" as follows:
>
> set1 = load 'input1.txt' using PigStorage('|') as ( ... );
> store set1 into 'set1' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '1');
>
> set2 = load 'input2.txt' using PigStorage('|') as ( .. );
> store set2 into 'set2' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '2');
>
> As can be seen, I added the 'index' parameters.
>
> What AvroStorage does is to construct the following string in the frontend:
>
> "1#<1st avro schema>,2#<2nd avro schema>"
>
> and pass it to the backend via UDFContext. In the backend, tasks then parse 
> this string to get the output schema for each store.
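A minimal sketch of that frontend/backend hand-off, to make the failure mode concrete. It assumes, without taking this from AvroStorage's source, that the frontend keeps the first schema registered under a given index and ignores later ones, and that a store without an explicit 'index' falls back to one shared default key. The names `register`, `parse` and the default key "0" are hypothetical. Under those assumptions, two stores without 'index' both read back the first store's schema, which matches the "long" showing up in the second store's error.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the "index#schema,index#schema" string passed via UDFContext.
public class IndexedSchemaString {
    // Assumed default key for stores that do not pass 'index'.
    static final String DEFAULT_INDEX = "0";
    // An entry starts at the beginning or after a comma: digits, then '#'.
    private static final Pattern ENTRY_START = Pattern.compile("(?:^|,)(\\d+)#");

    // Frontend: append "index#schema" unless the index is already taken
    // (assumption: the first schema registered for an index wins).
    static String register(String encoded, String index, String schemaJson) {
        if (parse(encoded).containsKey(index)) {
            return encoded;
        }
        String entry = index + "#" + schemaJson;
        return encoded.isEmpty() ? entry : encoded + "," + entry;
    }

    // Backend: split the string back into index -> schema.
    static Map<String, String> parse(String encoded) {
        Map<String, String> schemas = new LinkedHashMap<>();
        Matcher m = ENTRY_START.matcher(encoded);
        String index = null;
        int valueStart = 0;
        while (m.find()) {
            if (index != null) {
                schemas.put(index, encoded.substring(valueStart, m.start()));
            }
            index = m.group(1);
            valueStart = m.end();
        }
        if (index != null) {
            schemas.put(index, encoded.substring(valueStart));
        }
        return schemas;
    }

    public static void main(String[] args) {
        String s1 = "{\"type\":\"record\",\"name\":\"set1\",\"fields\":[]}";
        String s2 = "{\"type\":\"record\",\"name\":\"set2\",\"fields\":[]}";

        // With 'index' 1 and 2, each store gets its own schema back.
        String withIndex = register(register("", "1", s1), "2", s2);
        System.out.println(parse(withIndex).get("2").equals(s2)); // true

        // Without 'index', both stores share DEFAULT_INDEX, so the second
        // store reads back the first store's schema.
        String noIndex = register(register("", DEFAULT_INDEX, s1), DEFAULT_INDEX, s2);
        System.out.println(parse(noIndex).get(DEFAULT_INDEX).equals(s1)); // true
    }
}
```

Splitting on every comma would not work here, because the JSON schemas themselves contain commas; the sketch treats only a comma followed by `<digits>#` as an entry boundary.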
>
> Thanks,
> Cheolsoo
>
> On Tue, Aug 21, 2012 at 4:38 PM, Danfeng Li <[email protected]> wrote:
>
> > I ran into a strange problem when trying to load multiple text-formatted 
> > files and convert them into Avro format using Pig. However, if I read and 
> > convert one file at a time in separate runs, everything is fine. The error 
> > message is as follows:
> >
> > 2012-08-21 19:15:32,964 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error:
> > org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum 1980-01-01 00:00:00.000 is not in union ["null","long"]
> >     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
> >     at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
> >     at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:612)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> >     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
> >     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapB
> >
> > My code is:
> > set1 = load '$input_dir/set1.txt' using PigStorage('|') as (
> >    id:long,
> >    f1:long,
> >    f2:chararray,
> >    f3:float,
> >    f4:float,
> >    f5:float,
> >    f6:float,
> >    f7:float,
> >    f8:float,
> >    f9:float,
> >    f10:float,
> >    f11:float,
> >    f12:float);
> > store set1 into '$output_dir/set1.avro'
> > using org.apache.pig.piggybank.storage.avro.AvroStorage();
> >
> > set2 = load '$input_dir/set2.txt' using PigStorage('|') as (
> >    id : int,
> >    date : chararray);
> > store set2 into '$output_dir/set2.avro'
> > using org.apache.pig.piggybank.storage.avro.AvroStorage();
> >
> > The first file converts fine, but the 2nd one fails. The error comes 
> > from the 2nd field in the 2nd file, but the strange thing is that 
> > set2's schema doesn't even contain "long", while the error message 
> > shows ["null","long"].
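The error message names the schema that was actually applied: a nullable union ["null","long"], which lines up with the second field of set1 (`f1:long`) rather than set2's `date:chararray`. A datum fits such a union only if it is null or matches the non-null branch, and the string `1980-01-01 00:00:00.000` is neither. The stdlib-only sketch below models that check; `fitsNullableUnion` and its branch names are hypothetical stand-ins, not Avro's own union resolution, and it assumes chararray maps to an Avro "string" branch.

```java
// Hypothetical, stdlib-only model of checking a value against a nullable
// union such as ["null","long"]; not Avro's own implementation.
public class NullableUnionCheck {
    static boolean fitsNullableUnion(Object datum, String nonNullBranch) {
        if (datum == null) {
            return true; // matches the "null" branch
        }
        switch (nonNullBranch) {
            case "long":   return datum instanceof Long;
            case "int":    return datum instanceof Integer;
            case "float":  return datum instanceof Float;
            case "string": return datum instanceof CharSequence;
            default:
                throw new IllegalArgumentException("branch not modelled: " + nonNullBranch);
        }
    }

    public static void main(String[] args) {
        Object date = "1980-01-01 00:00:00.000"; // set2's second field (chararray)
        // Checked against set1's second field, ["null","long"]:
        System.out.println(fitsNullableUnion(date, "long"));   // false
        // Checked against an assumed ["null","string"] for set2's own schema:
        System.out.println(fitsNullableUnion(date, "string")); // true
    }
}
```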
> >
> > I am using Pig 0.10.0 and avro-1.7.1.jar.
> >
> > I wonder whether this is a bug or whether I missed something.
> >
> > Thanks.
> > Dan
> >
> > Here's set1.txt:
> >
> > 827352|740214|Long|26|0.08731795012183759|1661335.541733333|0|0|0.00
> > 827352|740214|Long|26|10
> > 827352|740214|Long|26|57865808239878|0.001059541098077884|0.00105954
> > 827352|740214|Long|26|57865808239878|10
> > 827352|740214|Long|26|98077821|0.0514156486228232|0.0010439801817575
> > 827352|740214|Long|26|98077821|39
> >
> > 827353|740214|Short|12|-0.05967910581502997|-1135471.22271|0|0|-0.00
> > 827353|740214|Short|12|11
> > 827353|740214|Short|12|85620143839061|-0.001187497751909232|-0.00118
> > 827353|740214|Short|12|85620143839061|74
> > 827353|740214|Short|12|97751909183|-0.0747641932858414|-0.0001307449
> > 827353|740214|Short|12|97751909183|00
> > 827353|740214|Short|12|2148424
> >
> > 827354|740214|Total|38|0.02763884430680765|19026277.40819863|0|0|-0.
> > 827354|740214|Total|38|00
> > 827354|740214|Total|38|01277543355991829|-0.0001279566538313473|-0.0
> > 827354|740214|Total|38|01277543355991829|00
> > 827354|740214|Total|38|1279566538313626|-0.02334854466301821|0.00091
> > 827354|740214|Total|38|1279566538313626|32
> > 827354|740214|Total|38|352815426966
> >
> > 827193|739576|Long|26|0.08731795012183759|1661335.541733333|0|0|0.00
> > 827193|739576|Long|26|10
> > 827193|739576|Long|26|57865808239878|0.001059541098077884|0.00105954
> > 827193|739576|Long|26|57865808239878|10
> > 827193|739576|Long|26|98077821|0.0514156486228232|0.0010439801817575
> > 827193|739576|Long|26|98077821|39
> >
> > 827194|739576|Short|12|-0.05967910581502997|-1135471.22271|0|0|-0.00
> > 827194|739576|Short|12|11
> > 827194|739576|Short|12|85620143839061|-0.001187497751909232|-0.00118
> > 827194|739576|Short|12|85620143839061|74
> > 827194|739576|Short|12|97751909183|-0.0747641932858414|-0.0001307449
> > 827194|739576|Short|12|97751909183|00
> > 827194|739576|Short|12|2148424
> >
> > 827195|739576|Total|38|0.02763884430680765|19026277.40819863|0|0|-0.
> > 827195|739576|Total|38|00
> > 827195|739576|Total|38|01277543355991829|-0.0001279566538313473|-0.0
> > 827195|739576|Total|38|01277543355991829|00
> > 827195|739576|Total|38|1279566538313626|-0.02334854466301821|0.00091
> > 827195|739576|Total|38|1279566538313626|32
> > 827195|739576|Total|38|352815426966
> >
> > 827355|740215|Long|51|1.776868012839072|113652088.7063555|0|0|0.0195
> > 827355|740215|Long|51|25
> > 827355|740215|Long|51|47658695701|0.0195703176808393|0.0195703176808
> > 827355|740215|Long|51|47658695701|39
> > 827355|740215|Long|51|28|1.164818333642054|0
> >
> > 827356|740215|Short|34|-2.360589090333165|-150988074.9471841|0|0|-0.
> > 827356|740215|Short|34|00
> > 827356|740215|Short|34|868330219442376|-0.008616238065508337|-0.0086
> > 827356|740215|Short|34|868330219442376|16
> > 827356|740215|Short|34|238065508375|-0.5943698959308671|-0.026906792
> > 827356|740215|Short|34|238065508375|30
> > 827356|740215|Short|34|502523
> >
> > 827357|740215|Total|85|-0.5837210774940929|63962032.00527128|0|0|0.0
> > 827357|740215|Total|85|10
> > 827357|740215|Total|85|84217439253325|0.01095407961533095|0.01095407
> > 827357|740215|Total|85|84217439253325|96
> > 827357|740215|Total|85|153309|0.5704484377111866|-0.0269067923050252
> > 827357|740215|Total|85|153309|3
> >
> > 827202|739590|Long|53|1.777568428360522|113696888.7063555|0|0|0.0195
> > 827202|739590|Long|53|25
> > 827202|739590|Long|53|47658695701|0.0195703176808393|0.0195703176808
> > 827202|739590|Long|53|47658695701|39
> > 827202|739590|Long|53|28|1.156653489849146|0
> >
> > Here's set2.txt:
> > 1|1980-01-01 00:00:00.000
> > 2|1980-01-02 00:00:00.000
> > 3|1980-01-03 00:00:00.000
> > 4|1980-01-04 00:00:00.000
> > 5|1980-01-07 00:00:00.000
> > 6|1980-01-08 00:00:00.000
> > 7|1980-01-09 00:00:00.000
> > 8|1980-01-10 00:00:00.000
> > 9|1980-01-11 00:00:00.000
> > 10|1980-01-14 00:00:00.000
> >
> >
>
