Thanks. Is there any particular downside to this once you get to millions or
hundreds of millions of rows, or is it just that the output is not easy to use
with non-Pig systems?

-----Original Message-----
From: Dmitriy Ryaboy <[email protected]>
Date: Tue, 28 Dec 2010 15:08:15 
To: <[email protected]>
Reply-To: [email protected]
Subject: Re: Possible deficiency in describe?

Try using BinStorage instead of the text-based PigStorage.
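
A minimal sketch of what that might look like, reusing the t1/t2/t3 relations
from the script quoted below. The path 'output_bin' and the alias t3_reloaded
are hypothetical names chosen for illustration, and the note about field names
is my understanding of BinStorage rather than something verified in this thread.

```
-- Sketch only: 'output_bin' and t3_reloaded are hypothetical names.
t1 = LOAD 'test.txt' AS (n1:int, n2:int);
t2 = GROUP t1 BY n1;
t3 = GROUP t2 BY group;

-- Write the nested relation in Pig's binary format instead of text.
STORE t3 INTO 'output_bin' USING BinStorage();

-- In a later script, read it back with the same storage function.
-- Assumption: field names are not kept in the file, so without an
-- AS clause the fields are addressed by position ($0, $1, ...).
t3_reloaded = LOAD 'output_bin' USING BinStorage();
keys = FOREACH t3_reloaded GENERATE $0;
```

The point of the sketch is that the nested bags and tuples are written and read
by Pig itself, so no text parsing of braces and parentheses is needed.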

D

On Tue, Dec 28, 2010 at 2:08 PM, Jonathan Coveney <[email protected]> wrote:

> So, I made a dumb little Python script that parses a Pig script, sees what
> stores there are, uses Pig's DESCRIBE to get the schema of each object being
> stored, and then uses that info to make a new file that has the proper
> loader/schema. I felt this was useful because I found myself making
> intermediate stores and then having a hard time writing the proper loader
> when there were a lot of columns (especially remembering the types).
>
> However, it seems that the result from DESCRIBE is not adequate to do a
> load. For example, I have test.txt, which is literally just random pairs of
> numbers, i.e.
>
> 1 2
> 1 3
> 1 4
> 2 5
> 2 6
> 3 7
> 3 8
> 4 9
> 5 10
> 6 11
> 7 12
> 8 13
> 8 14
> 8 15
>
> and so on.
>
> I do this:
>
> t1 = LOAD 'test.txt' AS (n1:int, n2:int);
> t2 = GROUP t1 BY n1;
> t3 = GROUP t2 BY group;
>
> DESCRIBE t3;
> STORE t3 INTO 'output.txt';
>
> The query runs without a hitch; however, there is an issue.
>
> This is what DESCRIBE gives:
>
> t3: {group: int,t2: {group: int,t1: {n1: int,n2: int}}}
>
> However, using this schema as-is won't let you load the file.
>
> The output has the form
> x{(y,{(a,b)}
>
> And I'm not really sure how to go about even creating a loader that would
> properly load it. Suffice it to say, it seems pretty complicated to store
> and then load anything that isn't a flat file. Is this by design? Is there
> an easier way to go from the schema, as reported by DESCRIBE, to the schema
> you'd use to load it?
>
> I'm curious what people do in practice. I could probably extend the script
> I made to go from the DESCRIBE schema to a loading schema (if the Pig loader
> can load things that have brackets and all that?), but I want to know what
> the limitations are.
>
> As always, I apologize if there is an easy answer to this. Thanks.
>
