double-escape the slash. grunt> data1 = FOREACH data GENERATE id, name, (gender=='\\N'?'U':gender) AS gender; grunt> dump data1 (7001L,Test0,U) (7002L,Test1,U) (7003L,Test3,U) (7004L,Test4,U)
On Mon, May 17, 2010 at 1:44 PM, Syed Wasti <[email protected]> wrote: > Attached is the data file, just in case, below is the data and the script, > this should give you all you want and to your last question, I am using > Mac. > > 7001 Test0 0 1 \N > 7002 Test1 0 1 \N > 7003 Test3 1 2 \N > 7004 Test4 2 2 \N > > grunt> data = LOAD 'data' USING PigStorage('\t') AS (id: long, name: > chararray, active: int, has_city: int, gender: chararray); > grunt> dump data; > (7001L,Test0,0,1,\N) > (7002L,Test1,0,1,\N) > (7003L,Test3,1,2,\N) > (7004L,Test4,2,2,\N) > > grunt> data1 = FOREACH data GENERATE id, name, (gender is null?'U':gender) > AS gender; > grunt> dump data1; > (7001L,Test0,\N) > (7002L,Test1,\N) > (7003L,Test3,\N) > (7004L,Test4,\N) > > grunt> data1 = FOREACH data GENERATE id, name, (gender=='\N'?'U':gender) AS > gender; > grunt> dump data1; > (7001L,Test0,\N) > (7002L,Test1,\N) > (7003L,Test3,\N) > (7004L,Test4,\N) > > > > On 5/17/10 1:20 PM, "Dmitriy Ryaboy" <[email protected]> wrote: > > > There must be some noise in your input that is getting interpreted > > differently by Hive and Pig. Loading a bunch of newlines does generate > > nulls, so I am not sure what's happening there. Are you loading using > > PigStorage? Default delimiters? Can you upload a sample file and script > that > > reproduces the problem somewhere? Are you running this on Windows with > its > > weird newline delimiters? > > > > grunt> cat tmp/nulltest > > 1 > > 2 > > > > 3 > > grunt> data = load 'tmp/nulltest' using PigStorage() as (num); > > grunt> processed = foreach data generate (num is null OR num == 3 ? 'XXX' > : > > num) as num; > > grunt> dump processed; > > (1) > > (2) > > (XXX) > > (XXX) > > > > -Dmitriy > > > > On Mon, May 17, 2010 at 12:44 PM, Syed Wasti <[email protected]> > wrote: > > > >> Have tried both ways foo is null OR foo == '\n', doesn't work in pig. > >> Why would null values be saved as \N in a file ? Is there a reason, is > this > >> hive or hadoop way which pig cant understand ? > >> > >> > >> On 5/17/10 11:53 AM, "Dmitriy Ryaboy" <[email protected]> wrote: > >> > >>> Arguably, that's a Hive bug. What does hive do if you *want* to have a > \n > >> as > >>> a value? > >>> > >>> For your case, I think it's as simple as foreach rel generate ( foo is > >> null > >>> OR foo == '\n' ? 'U' : foo); > >>> > >>> -D > >>> > >>> On Mon, May 17, 2010 at 11:42 AM, Syed Wasti <[email protected]> > >> wrote: > >>> > >>>> Well Dmitriy, my bad, I was looking at the data through a hive query > and > >> it > >>>> shows as NULL, but when I looked into the flat file all the NULL > values > >> are > >>>> are seen as \N. > >>>> Hive is able to understand \N as NULL but pig is not... How can I > >> resolve > >>>> this ? > >>>> > >>>> On 5/16/10 4:33 PM, "Dmitriy Ryaboy" <[email protected]> wrote: > >>>> > >>>>> In that case, maybe it's the data, and what you think is null is > >> actually > >>>>> '\n' ? > >>>>> > >>>>> -D > >>>>> > >>>>> On Sun, May 16, 2010 at 4:07 PM, Syed Wasti <[email protected]> > >> wrote: > >>>>> > >>>>>> Doing absolutely the same thing and I am using pig 6 too. > >>>>>> Tried with the fake data on both local and mapreduce modes, works > >> fine. > >>>>>> But on my script against actual data in mapreduce mode, it fails to > do > >>>> the > >>>>>> same thing, places \N instead of U. > >>>>>> > >>>>>> grunt> rel1 = LOAD '/user/swasti/data' USING PigStorage('\t') as > >> (num); > >>>>>> grunt> dump rel1; > >>>>>> (1) > >>>>>> (2) > >>>>>> (3) > >>>>>> () > >>>>>> (5) > >>>>>> grunt> find_null = FOREACH rel1 GENERATE (num is null?'U':num); > >>>>>> grunt> dump find_null; > >>>>>> (1) > >>>>>> (2) > >>>>>> (3) > >>>>>> (U) > >>>>>> (5) > >>>>>> > >>>>>> > >>>>>> On 5/16/10 2:23 PM, "Dmitriy Ryaboy" <[email protected]> wrote: > >>>>>> > >>>>>>> So what I am saying is, check that you are not inserting some weird > >>>>>>> non-ascii quotes in your actual script. > >>>>>>> I just ran this on Pig 6, it worked: > >>>>>>> > >>>>>>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num); > >>>>>>> grunt> dump data; > >>>>>>> (1) > >>>>>>> (2) > >>>>>>> () > >>>>>>> (3) > >>>>>>> grunt> find_nulls = foreach data generate ( num is null ? 'U' : num > >> ); > >>>>>>> grunt> dump find_nulls; > >>>>>>> (1) > >>>>>>> (2) > >>>>>>> (U) > >>>>>>> (3) > >>>>>>> > >>>>>>> I double-checked just in case, and it works in both local and > >> mapreduce > >>>>>>> modes. > >>>>>>> > >>>>>>> -Dmitriy > >>>>>>> > >>>>>>> On Sun, May 16, 2010 at 1:49 PM, Syed Wasti <[email protected]> > >>>> wrote: > >>>>>>> > >>>>>>>> Hmm not sure why, I used quotes in this mail, let me rewrite, > >>>>>>>> SQL(U is within single quotes): NVL(city,U) city > >>>>>>>> Pig(U is within single quotes): (city is null?U:city) AS city > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 5/16/10 1:36 PM, "Dmitriy Ryaboy" <[email protected]> wrote: > >>>>>>>> > >>>>>>>>> Syed, > >>>>>>>>> The samples you pasted include all kinds of extraneous > characters. > >>>> Are > >>>>>>>> you > >>>>>>>>> sure your script is properly encoded? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Sun, May 16, 2010 at 1:16 PM, Syed Wasti <[email protected] > > > >>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> I am trying the SQL ³NVL(city, ŒU¹) city² in pig I am using the > >>>>>> bincond > >>>>>>>>>> operator, ³(city is null?'U': city) AS city², which is of > >> chararray > >>>>>>>> type, > >>>>>>>>>> the result file shows Œ\N¹ instead of U. Any ideas ? > >>>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>> > >>>> > >>>> > >> > >> > >> > >
