Try using BinStorage instead of the text-based PigStorage.

D

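A minimal sketch of what that suggestion might look like for the script quoted below. The output path 'output.bin' and the alias t3_reloaded are made up for illustration and are not from the original thread. BinStorage writes Pig's own binary format, so the nested bags should come back with their structure intact instead of as delimited text. The load has no AS clause, so fields are referenced by position here.

```pig
-- Same pipeline as in the quoted message; only the storage function changes.
t1 = LOAD 'test.txt' AS (n1:int, n2:int);
t2 = GROUP t1 BY n1;
t3 = GROUP t2 BY group;

-- 'output.bin' is a hypothetical path chosen for this sketch.
STORE t3 INTO 'output.bin' USING BinStorage();

-- In a later script: read the nested data back with the same storage function.
-- No schema is declared, so $0 is the group key and $1 is the nested bag.
t3_reloaded = LOAD 'output.bin' USING BinStorage();
keys = FOREACH t3_reloaded GENERATE $0;
DUMP keys;
```

This keeps intermediate stores in a form Pig can read back without a hand-written loader for the bracketed text output.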
On Tue, Dec 28, 2010 at 2:08 PM, Jonathan Coveney <[email protected]> wrote:

> So, I made a dumb little python script that parses a pig script, sees what
> stores there are, and then uses pig's describe function to get the schema
> of the object being stored and then uses that info to make a new file that
> has the proper loader/schema. I felt this was useful because I found myself
> making intermediate stores, and then it being pretty difficult to make the
> proper loader if there were a lot of columns (especially remembering the
> type).
>
> However, it seems that the result from DESCRIBE is not adequate to do a
> load. For example, I have test.txt which is literally just random pairs of
> numbers
>
> ie
>
> 1 2
> 1 3
> 1 4
> 2 5
> 2 6
> 3 7
> 3 8
> 4 9
> 5 10
> 6 11
> 7 12
> 8 13
> 8 14
> 8 15
>
> and so on.
>
> I do this:
>
> t1 = LOAD 'test.txt' AS (n1:int, n2:int);
> t2 = GROUP t1 BY n1;
> t3 = GROUP t2 BY group;
>
> DESCRIBE t3;
> STORE t3 INTO 'output.txt';
>
> The query runs without a hitch, however, there is an issue
>
> This is what describe gives:
>
> t3: {group: int,t2: {group: int,t1: {n1: int,n2: int}}}
>
> However, this won't let you load the file...
>
> the output has form
> x{(y,{(a,b)}
>
> And I'm not really sure how to go about even creating a loader that would
> properly load it. Suffice it to say, it seems pretty complicated to store
> and then load anything that isn't a flat file...is this by design? Is there
> an easier way to go from the schema, as per describe, to the schema you'd
> use to load it?
>
> I'm curious what people do in practice. I could probably extend the script
> I made to go from describe schema -> loading schema (if the pig loader can
> load things that have brackets and all that?), but I want to know what the
> limitations are.
>
> As always, I apologize if there is an easy answer to this. Thanks.
