Nice, it works, thanks.
On 5/17/10 1:49 PM, "Dmitriy Ryaboy" <[email protected]> wrote: > double-escape the slash. > > grunt> data1 = FOREACH data GENERATE id, name, (gender=='\\N'?'U':gender) AS > gender; > grunt> dump data1 > (7001L,Test0,U) > (7002L,Test1,U) > (7003L,Test3,U) > (7004L,Test4,U) > > > On Mon, May 17, 2010 at 1:44 PM, Syed Wasti <[email protected]> wrote: > >> Attached is the data file, just in case, below is the data and the script, >> this should give you all you want and to your last question, I am using >> Mac. >> >> 7001 Test0 0 1 \N >> 7002 Test1 0 1 \N >> 7003 Test3 1 2 \N >> 7004 Test4 2 2 \N >> >> grunt> data = LOAD 'data' USING PigStorage('\t') AS (id: long, name: >> chararray, active: int, has_city: int, gender: chararray); >> grunt> dump data; >> (7001L,Test0,0,1,\N) >> (7002L,Test1,0,1,\N) >> (7003L,Test3,1,2,\N) >> (7004L,Test4,2,2,\N) >> >> grunt> data1 = FOREACH data GENERATE id, name, (gender is null?'U':gender) >> AS gender; >> grunt> dump data1; >> (7001L,Test0,\N) >> (7002L,Test1,\N) >> (7003L,Test3,\N) >> (7004L,Test4,\N) >> >> grunt> data1 = FOREACH data GENERATE id, name, (gender=='\N'?'U':gender) AS >> gender; >> grunt> dump data1; >> (7001L,Test0,\N) >> (7002L,Test1,\N) >> (7003L,Test3,\N) >> (7004L,Test4,\N) >> >> >> >> On 5/17/10 1:20 PM, "Dmitriy Ryaboy" <[email protected]> wrote: >> >>> There must be some noise in your input that is getting interpreted >>> differently by Hive and Pig. Loading a bunch of newlines does generate >>> nulls, so I am not sure what's happening there. Are you loading using >>> PigStorage? Default delimiters? Can you upload a sample file and script >> that >>> reproduces the problem somewhere? Are you running this on Windows with >> its >>> weird newline delimiters? >>> >>> grunt> cat tmp/nulltest >>> 1 >>> 2 >>> >>> 3 >>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num); >>> grunt> processed = foreach data generate (num is null OR num == 3 ? 'XXX' >> : >>> num) as num; >>> grunt> dump processed; >>> (1) >>> (2) >>> (XXX) >>> (XXX) >>> >>> -Dmitriy >>> >>> On Mon, May 17, 2010 at 12:44 PM, Syed Wasti <[email protected]> >> wrote: >>> >>>> Have tried both ways foo is null OR foo == '\n', doesn't work in pig. >>>> Why would null values be saved as \N in a file ? Is there a reason, is >> this >>>> hive or hadoop way which pig cant understand ? >>>> >>>> >>>> On 5/17/10 11:53 AM, "Dmitriy Ryaboy" <[email protected]> wrote: >>>> >>>>> Arguably, that's a Hive bug. What does hive do if you *want* to have a >> \n >>>> as >>>>> a value? >>>>> >>>>> For your case, I think it's as simple as foreach rel generate ( foo is >>>> null >>>>> OR foo == '\n' ? 'U' : foo); >>>>> >>>>> -D >>>>> >>>>> On Mon, May 17, 2010 at 11:42 AM, Syed Wasti <[email protected]> >>>> wrote: >>>>> >>>>>> Well Dmitriy, my bad, I was looking at the data through a hive query >> and >>>> it >>>>>> shows as NULL, but when I looked into the flat file all the NULL >> values >>>> are >>>>>> are seen as \N. >>>>>> Hive is able to understand \N as NULL but pig is not... How can I >>>> resolve >>>>>> this ? >>>>>> >>>>>> On 5/16/10 4:33 PM, "Dmitriy Ryaboy" <[email protected]> wrote: >>>>>> >>>>>>> In that case, maybe it's the data, and what you think is null is >>>> actually >>>>>>> '\n' ? >>>>>>> >>>>>>> -D >>>>>>> >>>>>>> On Sun, May 16, 2010 at 4:07 PM, Syed Wasti <[email protected]> >>>> wrote: >>>>>>> >>>>>>>> Doing absolutely the same thing and I am using pig 6 too. >>>>>>>> Tried with the fake data on both local and mapreduce modes, works >>>> fine. >>>>>>>> But on my script against actual data in mapreduce mode, it fails to >> do >>>>>> the >>>>>>>> same thing, places \N instead of U. >>>>>>>> >>>>>>>> grunt> rel1 = LOAD '/user/swasti/data' USING PigStorage('\t') as >>>> (num); >>>>>>>> grunt> dump rel1; >>>>>>>> (1) >>>>>>>> (2) >>>>>>>> (3) >>>>>>>> () >>>>>>>> (5) >>>>>>>> grunt> find_null = FOREACH rel1 GENERATE (num is null?'U':num); >>>>>>>> grunt> dump find_null; >>>>>>>> (1) >>>>>>>> (2) >>>>>>>> (3) >>>>>>>> (U) >>>>>>>> (5) >>>>>>>> >>>>>>>> >>>>>>>> On 5/16/10 2:23 PM, "Dmitriy Ryaboy" <[email protected]> wrote: >>>>>>>> >>>>>>>>> So what I am saying is, check that you are not inserting some weird >>>>>>>>> non-ascii quotes in your actual script. >>>>>>>>> I just ran this on Pig 6, it worked: >>>>>>>>> >>>>>>>>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num); >>>>>>>>> grunt> dump data; >>>>>>>>> (1) >>>>>>>>> (2) >>>>>>>>> () >>>>>>>>> (3) >>>>>>>>> grunt> find_nulls = foreach data generate ( num is null ? 'U' : num >>>> ); >>>>>>>>> grunt> dump find_nulls; >>>>>>>>> (1) >>>>>>>>> (2) >>>>>>>>> (U) >>>>>>>>> (3) >>>>>>>>> >>>>>>>>> I double-checked just in case, and it works in both local and >>>> mapreduce >>>>>>>>> modes. >>>>>>>>> >>>>>>>>> -Dmitriy >>>>>>>>> >>>>>>>>> On Sun, May 16, 2010 at 1:49 PM, Syed Wasti <[email protected]> >>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hmm not sure why, I used quotes in this mail, let me rewrite, >>>>>>>>>> SQL(U is within single quotes): NVL(city,U) city >>>>>>>>>> Pig(U is within single quotes): (city is null?U:city) AS city >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 5/16/10 1:36 PM, "Dmitriy Ryaboy" <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> Syed, >>>>>>>>>>> The samples you pasted include all kinds of extraneous >> characters. >>>>>> Are >>>>>>>>>> you >>>>>>>>>>> sure your script is properly encoded? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, May 16, 2010 at 1:16 PM, Syed Wasti <[email protected] >>> >>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> I am trying the SQL ³NVL(city, ŒU¹) city² in pig I am using the >>>>>>>> bincond >>>>>>>>>>>> operator, ³(city is null?'U': city) AS city², which is of >>>> chararray >>>>>>>>>> type, >>>>>>>>>>>> the result file shows Œ\N¹ instead of U. Any ideas ? >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>>>> >>>> >>>> >>>> >> >>
