Thanks to all of you.
Best Regards,
Jumping Qu

------
Don't tell me how many enemies we have, but where they are!
(ADV:Perl -- It's like Java, only it lets you deliver on time and under budget.)


On Thu, Mar 4, 2010 at 3:12 AM, zaki rahaman <[email protected]> wrote:

> In this case, why wouldn't you simply use globbing in your load statements?
> Something like:
>
> baidu = LOAD 'input/path/*baidu*' AS (schema);
> google = LOAD 'input/path/*google*' AS (schema);
>
> Store baidu INTO 'output/path/baidu_all';
> Store google INTO 'output/path/google_all';
>
> On Wed, Mar 3, 2010 at 1:21 PM, Romain Rigaux <[email protected]> wrote:
>
> > Actually, I was using another loader; I just tried with PigStorage
> > (Pig 0.6) and it seems to work too.
> >
> > If your input file has two columns, this will have the expected schema
> > and data:
> >
> > A = load 'file' USING MyLoader() AS (f1:chararray, f2:chararray,
> > fileName:chararray);
> >
> > A: {f1: chararray,f2: chararray,filename: chararray}
> >
> > If you do "tuple.set(tuple.getLength() - 1, fileName)", your third column
> > will be null.
> >
> > So in practice the loader loads the data "independently" and then "casts"
> > it to the schema you provided. That said, I'm not claiming it is a very
> > clean solution.
> >
> > Thanks,
> >
> > Romain
> >
> > 2010/3/2 Mridul Muralidharan <[email protected]>
> >
> > >
> > > I am not sure this will work as you expect.
> > > Depending on which implementation of PigStorage you end up using, it
> > > might exhibit different behavior.
> > >
> > > If I am not wrong, currently, if you specify something like:
> > >
> > > A = load 'file' USING MyLoader() AS (f1:chararray, f2:chararray,
> > > fileName:chararray);
> > >
> > > your code will end up generating a tuple of 4 fields: fileName is
> > > always null, and the actual file name you inserted through MyLoader
> > > ends up as the 4th field (and so is not "seen" by Pig; I'm not sure
> > > what happens if you do a join, etc. with this tuple, though!).
> > > Essentially, the runtime is not consistent with the script schema.
> > >
> > > Note: this is implementation-specific behavior, which could probably
> > > be worked around with an implementation-specific hack,
> > > "tuple.set(tuple.getLength() - 1, fileName)" [if you know fileName is
> > > the last field expected].
> > >
> > > As expected, it is brittle code.
> > >
> > > From a while back, I remember facing issues with Pig's implicit
> > > conversion to/from bytearray, the implicit project that was
> > > introduced, the insertion of nulls to extend tuples to the specified
> > > schema (the above behavior), etc. So you would become dependent on
> > > implementation changes.
> > >
> > > I don't think BinStorage and PigStorage were written with inheritance
> > > in mind...
> > >
> > > Regards,
> > > Mridul
> > >
> > > On Wednesday 03 March 2010 12:28 AM, Romain Rigaux wrote:
> > > > Hi,
> > > >
> > > > In Pig 0.6 you can extend PigStorage and grab the name of the file
> > > > with something like this:
> > > >
> > > > import java.io.IOException;
> > > >
> > > > import org.apache.pig.builtin.PigStorage;
> > > > import org.apache.pig.data.Tuple;
> > > > import org.apache.pig.impl.io.BufferedPositionedInputStream;
> > > >
> > > > // Pig 0.6 loader: PigStorage plus the source file name as an extra field
> > > > public class MyLoader extends PigStorage {
> > > >     private String fileName;
> > > >
> > > >     @Override
> > > >     public void bindTo(String fileName, BufferedPositionedInputStream is,
> > > >             long offset, long end) throws IOException {
> > > >         super.bindTo(fileName, is, offset, end);
> > > >
> > > >         // In your case, match with a regexp and keep only the group
> > > >         // with the name (e.g. google, baidu)
> > > >         this.fileName = fileName;
> > > >     }
> > > >
> > > >     @Override
> > > >     public Tuple getNext() throws IOException {
> > > >         Tuple next = super.getNext();
> > > >
> > > >         // Append the file name to every record read from this file
> > > >         if (next != null) {
> > > >             next.append(fileName);
> > > >         }
> > > >
> > > >         return next;
> > > >     }
> > > > }
> > > >
> > > > Then you can group on the name and split on it.
> > > >
> > > > Thanks,
> > > >
> > > > Romain
> > > >
> > > > On Mon, Mar 1, 2010 at 3:09 AM, Jumping <[email protected]> wrote:
> > > >
> > > >> Hi,
> > > >> Can Pig recognize the names of the files it is importing? If so,
> > > >> how? I want to combine them according to file name.
> > > >>
> > > >> Example:
> > > >> google_2009_12_21.csv, google_2010_01_21.csv, google_2010_02_21.csv,
> > > >> baidu_2009_11_22.csv, baidu_2010_01_01.csv, baidu_2010_02_03.csv, ....
> > > >>
> > > >> Sort and combine them by name, then output two files,
> > > >> google_all.csv and baidu_all.csv, in a Pig script.
> > > >>
> > > >>
> > > >> Best Regards,
> > > >> Jumping Qu
> > > >>
> > > >
> > >
> >
>
> --
> Zaki Rahaman
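Romain's loader leaves the "match with a regexp" step as a comment. Below is a minimal, stdlib-only sketch of that step plus the per-name grouping the original question asks for. It assumes every file follows the `<name>_YYYY_MM_DD.csv` pattern of the example list; the class `SourceNameGrouping`, its methods, and the pattern are hypothetical illustrations, not part of Pig's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceNameGrouping {
    // Assumed naming scheme, taken from the example list: <name>_YYYY_MM_DD.csv
    private static final Pattern SOURCE =
            Pattern.compile("([A-Za-z]+)_\\d{4}_\\d{2}_\\d{2}\\.csv");

    /** Returns the name prefix (e.g. "google"), or null if the file name does not match. */
    static String sourceName(String path) {
        // Keep only the last path component, in case a full path is passed in (assumption).
        String base = path.substring(path.lastIndexOf('/') + 1);
        Matcher m = SOURCE.matcher(base);
        return m.matches() ? m.group(1) : null;
    }

    /** Groups file names by source name; the keys and each group are sorted. */
    static Map<String, List<String>> groupBySource(List<String> paths) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String p : paths) {
            String name = sourceName(p);
            if (name != null) {
                groups.computeIfAbsent(name, k -> new ArrayList<>()).add(p);
            }
        }
        for (List<String> group : groups.values()) {
            Collections.sort(group); // date-stamped names sort chronologically
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "google_2009_12_21.csv", "google_2010_01_21.csv", "google_2010_02_21.csv",
                "baidu_2009_11_22.csv", "baidu_2010_01_01.csv", "baidu_2010_02_03.csv");
        // One output name per source, mirroring google_all.csv / baidu_all.csv
        for (Map.Entry<String, List<String>> e : groupBySource(files).entrySet()) {
            System.out.println(e.getKey() + "_all.csv <- " + e.getValue());
        }
    }
}
```

Inside a loader, `sourceName(fileName)` is what one would store in the `fileName` field, so the appended column holds "google" or "baidu" rather than the full path, ready for GROUP or SPLIT in the script.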

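Zaki's suggestion avoids a custom loader entirely: each LOAD path carries a glob, so each relation only reads one source's files. Pig hands such paths to Hadoop's glob expansion; the sketch below uses `java.nio.file.PathMatcher` only as a local stand-in (an assumption: the two glob syntaxes agree for plain `*` wildcards, but this is not Hadoop's implementation) to preview which of the example files each pattern would pick up. `GlobPreview` and `matching` are hypothetical names.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GlobPreview {
    /** Returns the file names matched by a simple glob such as "*baidu*", in input order. */
    static List<String> matching(String glob, List<String> fileNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            // '*' matches any run of characters within a single path component
            if (matcher.matches(Paths.get(name))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "google_2009_12_21.csv", "google_2010_01_21.csv", "google_2010_02_21.csv",
                "baidu_2009_11_22.csv", "baidu_2010_01_01.csv", "baidu_2010_02_03.csv");
        System.out.println("*baidu*  -> " + matching("*baidu*", files));
        System.out.println("*google* -> " + matching("*google*", files));
    }
}
```

Because neither "google" nor "baidu" occurs inside the other name, the two patterns select disjoint sets of files, which is what lets the two LOAD/STORE pairs replace grouping on a file-name column.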