Try using BinStorage instead of the text-based PigStorage.
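
A minimal sketch of what that might look like for the t3 relation in the script quoted below. The output path 'output_bin' and the aliases t3_loaded, grp, and inner_grp are made up for illustration, and whether the AS clause is needed (and how it interacts with casts) may depend on your Pig version, so treat this as a starting point rather than a tested recipe:

```pig
-- Store the nested relation in Pig's binary format instead of delimited text.
-- 'output_bin' is a hypothetical output directory.
STORE t3 INTO 'output_bin' USING BinStorage();

-- In a later script, load it back with BinStorage as well.
-- The AS clause mirrors the DESCRIBE output; the fields are renamed to
-- grp / inner_grp (hypothetical names) to avoid reusing the keyword "group".
t3_loaded = LOAD 'output_bin' USING BinStorage()
    AS (grp:int, t2:{t:(inner_grp:int, t1:{r:(n1:int, n2:int)})});

-- Check that the nested schema came back as expected.
DESCRIBE t3_loaded;
```

Because the bags and tuples are written in serialized form, nothing has to parse the braces and parentheses that the text output contains.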

D

On Tue, Dec 28, 2010 at 2:08 PM, Jonathan Coveney <[email protected]> wrote:

> So, I made a dumb little Python script that parses a Pig script, sees what
> stores there are, uses Pig's DESCRIBE to get the schema of the object
> being stored, and then uses that info to make a new file that has the
> proper loader/schema. I felt this was useful because I found myself
> making intermediate stores, and it was then pretty difficult to write the
> proper loader if there were a lot of columns (especially remembering the
> types).
>
> However, it seems that the result from DESCRIBE is not adequate to do a
> load. For example, I have test.txt, which is literally just random pairs of
> numbers, i.e.:
>
> 1 2
> 1 3
> 1 4
> 2 5
> 2 6
> 3 7
> 3 8
> 4 9
> 5 10
> 6 11
> 7 12
> 8 13
> 8 14
> 8 15
>
> and so on.
>
> I do this:
>
> t1 = LOAD 'test.txt' AS (n1:int, n2:int);
> t2 = GROUP t1 BY n1;
> t3 = GROUP t2 BY group;
>
> DESCRIBE t3;
> STORE t3 INTO 'output.txt';
>
> The query runs without a hitch; however, there is an issue.
>
> This is what DESCRIBE gives:
>
> t3: {group: int,t2: {group: int,t1: {n1: int,n2: int}}}
>
> However, this won't let you load the file...
>
> The output has the form
> x{(y,{(a,b)}
>
> And I'm not really sure how to go about even creating a loader that would
> properly load it. Suffice it to say, it seems pretty complicated to store
> and then load anything that isn't a flat file. Is this by design? Is there
> an easier way to go from the schema reported by DESCRIBE to the schema
> you'd use to load it?
>
> I'm curious what people do in practice. I could probably extend the script
> I made to go from the DESCRIBE schema to a loading schema (if the Pig
> loader can load things that have brackets and all that?), but I want to
> know what the limitations are.
>
> As always, I apologize if there is an easy answer to this. Thanks.
>
