Yes, my mistake. I meant to FLATTEN it and then reference it directly. I'll look at FILTER. What I really need is a way to filter rows that have a UUID followed by either only \t (delimiters) or \n.
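A minimal plain-Java sketch of the row test described above, assuming the UUID is the first field in its canonical 8-4-4-4-12 hexadecimal form and the other fields are tab-delimited. A row counts as "UUID only" when the UUID is followed by nothing but tabs and an optional line terminator. The class and method names are hypothetical and this is not a Pig API; Java 9+ is assumed for `List.of`.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class UuidOnlyLines {
    // Assumption: canonical 8-4-4-4-12 hex UUID as the first field.
    // It may be followed only by zero or more tab delimiters and an
    // optional \r\n or \n; matches() requires the whole line to fit.
    private static final Pattern UUID_ONLY = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\\t*\\r?\\n?");

    /** True if the line holds a UUID whose remaining fields are all empty. */
    static boolean isUuidOnly(String line) {
        return UUID_ONLY.matcher(line).matches();
    }

    /** Drops the UUID-only lines and keeps the rows that carry real data. */
    static List<String> dropUuidOnly(List<String> lines) {
        return lines.stream()
                .filter(line -> !isUuidOnly(line))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "123e4567-e89b-12d3-a456-426614174000\t\t\n",    // tabs only: dropped
                "123e4567-e89b-12d3-a456-426614174000\n",        // newline only: dropped
                "123e4567-e89b-12d3-a456-426614174000\tfoo\t\n"); // has data: kept
        for (String line : dropUuidOnly(lines)) {
            System.out.print(line);
        }
    }
}
```

Negate the predicate to keep those rows instead of dropping them. If the raw lines were loaded into Pig as a single chararray (for example with TextLoader), Pig's `MATCHES` operator, which takes a Java regular expression, could probably express a similar test inside a `FILTER`; that has not been tried here.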
On Thu, Mar 7, 2013 at 12:11 PM, Harsha <[email protected]> wrote:
> Mohit,
> A = LOAD '/user/apuser/test/data1' AS b:bag{
> You are naming your data bag b.
> If you want to refer to values inside the data bag, try b.a or b.b.
> The sample data I gave you is something random; if you are trying to skip
> over nulls, you can do so by using FILTER.
> Take a look at http://pig.apache.org/docs/r0.11.0/
> -Harsha
>
> --
> Harsha
>
> On Thursday, March 7, 2013 at 11:58 AM, Mohit Anchlia wrote:
> > So I did this. I took your example, put it in a file and ran some Pig
> > commands through grunt, but I am getting the same results from the bag and
> > from generating from the tuple. I might be doing something wrong here.
> >
> > grunt> A = LOAD '/user/apuser/test/data1' AS b:bag{t:tuple(a:chararray, b:chararray)};
> > grunt> dump A;
> > 2013-03-07 14:55:25,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> > ({(1,)})
> > ({(3,)})
> > ({(5,10)})
> > ({(7,)})
> >
> > grunt> b = foreach A generate b;
> > grunt> dump b;
> > 2013-03-07 14:57:59,509 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> > ({(1,)})
> > ({(3,)})
> > ({(5,10)})
> > ({(7,)})
> > grunt>
> >
> > I get the same output again.
> >
> > On Thu, Mar 7, 2013 at 11:40 AM, Mohit Anchlia <[email protected]> wrote:
> > > Good suggestion. Let me try that.
> > >
> > > On Thu, Mar 7, 2013 at 11:27 AM, Harsha <[email protected]> wrote:
> > > > It will be easier if you have some sample data and run it through the
> > > > grunt shell.
> > > > Let's say you have a dataset like this:
> > > > ({(1,)})
> > > > ({(3,)})
> > > > ({(5,10)})
> > > > ({(7,)})
> > > >
> > > > Some rows have nulls in your "b" and some rows have values for "b".
> > > > If you do a "generate" on the above, it will run through each row and
> > > > try to fetch the value for b; if there is none, it will give (),
> > > > something like this:
> > > >
> > > > ({()})
> > > > ({()})
> > > > ({(10)})
> > > > ({()})
> > > >
> > > > --
> > > > Harsha
> > > >
> > > > On Thursday, March 7, 2013 at 11:15 AM, Mohit Anchlia wrote:
> > > > > Sorry, yes, my question was about accessing b, not $1. What's the effect
> > > > > of writing an empty () to a file? Say I did store b into temp: should I
> > > > > expect a line, or does nothing get written to the file at all?
> > > > >
> > > > > On Thu, Mar 7, 2013 at 10:53 AM, Harsha <[email protected]> wrote:
> > > > > > From your schema b:bag{t:tuple(a:chararray, b:chararray)},
> > > > > > your tuple is inside a bag, so if on the next line you try to access
> > > > > > it through $1, Pig will throw an error saying non-existent column.
> > > > > > But if your question is about accessing b, then it will print an
> > > > > > empty () if there is no value present (as you are setting it to null).
> > > > > >
> > > > > > --
> > > > > > Harsha
> > > > > >
> > > > > > On Thursday, March 7, 2013 at 10:35 AM, Mohit Anchlia wrote:
> > > > > > > Thanks! Does "generate" skip over that?
> > > > > > > If I did b = foreach B generate $1, what should be the expected
> > > > > > > outcome of alias "b"?
> > > > > > >
> > > > > > > On Thu, Mar 7, 2013 at 10:31 AM, Harsha <[email protected]> wrote:
> > > > > > > > Hi Mohit,
> > > > > > > > it won't convert into the string literal 'NULL' since it's a tuple;
> > > > > > > > you'll see results like
> > > > > > > > ('Hello',)
> > > > > > > >
> > > > > > > > --
> > > > > > > > Harsha
> > > > > > > >
> > > > > > > > On Thursday, March 7, 2013 at 10:10 AM, Mohit Anchlia wrote:
> > > > > > > > > Any help would be appreciated. I'll also write something shortly
> > > > > > > > > and see what happens.
> > > > > > > > >
> > > > > > > > > On Wed, Mar 6, 2013 at 4:58 PM, Mohit Anchlia <[email protected]> wrote:
> > > > > > > > > > If I define and set a tuple like this:
> > > > > > > > > >
> > > > > > > > > > Tuple t1 = mTupleFactory.newTuple(2);
> > > > > > > > > > t1.set(0, "Hello");
> > > > > > > > > > t1.set(1, null);
> > > > > > > > > >
> > > > > > > > > > and have a schema like:
> > > > > > > > > >
> > > > > > > > > > b:bag{t:tuple(a:chararray, b:chararray)}
> > > > > > > > > >
> > > > > > > > > > and then in the Pig script I do:
> > > > > > > > > >
> > > > > > > > > > page = foreach B generate b;
> > > > > > > > > >
> > > > > > > > > > What should be the expected outcome? Would "generate" convert
> > > > > > > > > > the null into the literal 'NULL' as a string? Or does it skip
> > > > > > > > > > over that null?
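Harsha's suggestion in the thread (flatten the bag, then use FILTER to skip the nulls) and the original question (does a null field turn into the string 'NULL'?) can be modeled in plain Java on the thread's own sample data. This is a sketch under stated assumptions, not Pig's implementation: the class, its methods and the list-of-lists representation are hypothetical, and Java 9+ is assumed for `List.of`. In Pig Latin the corresponding steps would roughly be a `FOREACH ... GENERATE FLATTEN(b)` followed by a `FILTER ... BY ... IS NOT NULL`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenFilterModel {
    // Hypothetical model, not the Pig API: an input row holds one bag, a bag is a
    // list of tuples, a tuple is a list of fields, and Java null stands for a Pig null.

    /** Flattens each bag into one output row per tuple, then drops tuples whose
     *  second field (b) is null. Assumes every tuple has at least two fields. */
    static List<List<String>> flattenAndDropNullB(List<List<List<String>>> rows) {
        List<List<String>> out = new ArrayList<>();
        for (List<List<String>> bag : rows) {
            for (List<String> tuple : bag) {   // "FLATTEN": one output row per tuple
                if (tuple.get(1) != null) {    // "FILTER ... IS NOT NULL" on field b
                    out.add(tuple);
                }
            }
        }
        return out;
    }

    /** Renders a tuple the way the thread describes dump output: a null field
     *  leaves an empty slot instead of the string "NULL". */
    static String render(List<String> tuple) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String field = tuple.get(i);
            if (field != null) {
                sb.append(field);
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // The sample rows from the thread: ({(1,)}) ({(3,)}) ({(5,10)}) ({(7,)})
        // Arrays.asList is used for tuples because List.of rejects null elements.
        List<List<List<String>>> rows = List.of(
                List.of(Arrays.asList("1", null)),
                List.of(Arrays.asList("3", null)),
                List.of(Arrays.asList("5", "10")),
                List.of(Arrays.asList("7", null)));

        System.out.println(flattenAndDropNullB(rows));            // [[5, 10]]
        System.out.println(render(Arrays.asList("Hello", null)));  // (Hello,)
    }
}
```

Only the (5,10) tuple survives the filter, and the ("Hello", null) tuple renders with an empty second slot rather than the text NULL, in line with Harsha's description.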
