Have tried both ways foo is null OR foo == '\n', doesn't work in pig. Why would null values be saved as \N in a file ? Is there a reason, is this hive or hadoop way which pig cant understand ?
On 5/17/10 11:53 AM, "Dmitriy Ryaboy" <[email protected]> wrote: > Arguably, that's a Hive bug. What does hive do if you *want* to have a \n as > a value? > > For your case, I think it's as simple as foreach rel generate ( foo is null > OR foo == '\n' ? 'U' : foo); > > -D > > On Mon, May 17, 2010 at 11:42 AM, Syed Wasti <[email protected]> wrote: > >> Well Dmitriy, my bad, I was looking at the data through a hive query and it >> shows as NULL, but when I looked into the flat file all the NULL values are >> are seen as \N. >> Hive is able to understand \N as NULL but pig is not... How can I resolve >> this ? >> >> On 5/16/10 4:33 PM, "Dmitriy Ryaboy" <[email protected]> wrote: >> >>> In that case, maybe it's the data, and what you think is null is actually >>> '\n' ? >>> >>> -D >>> >>> On Sun, May 16, 2010 at 4:07 PM, Syed Wasti <[email protected]> wrote: >>> >>>> Doing absolutely the same thing and I am using pig 6 too. >>>> Tried with the fake data on both local and mapreduce modes, works fine. >>>> But on my script against actual data in mapreduce mode, it fails to do >> the >>>> same thing, places \N instead of U. >>>> >>>> grunt> rel1 = LOAD '/user/swasti/data' USING PigStorage('\t') as (num); >>>> grunt> dump rel1; >>>> (1) >>>> (2) >>>> (3) >>>> () >>>> (5) >>>> grunt> find_null = FOREACH rel1 GENERATE (num is null?'U':num); >>>> grunt> dump find_null; >>>> (1) >>>> (2) >>>> (3) >>>> (U) >>>> (5) >>>> >>>> >>>> On 5/16/10 2:23 PM, "Dmitriy Ryaboy" <[email protected]> wrote: >>>> >>>>> So what I am saying is, check that you are not inserting some weird >>>>> non-ascii quotes in your actual script. >>>>> I just ran this on Pig 6, it worked: >>>>> >>>>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num); >>>>> grunt> dump data; >>>>> (1) >>>>> (2) >>>>> () >>>>> (3) >>>>> grunt> find_nulls = foreach data generate ( num is null ? 'U' : num ); >>>>> grunt> dump find_nulls; >>>>> (1) >>>>> (2) >>>>> (U) >>>>> (3) >>>>> >>>>> I double-checked just in case, and it works in both local and mapreduce >>>>> modes. >>>>> >>>>> -Dmitriy >>>>> >>>>> On Sun, May 16, 2010 at 1:49 PM, Syed Wasti <[email protected]> >> wrote: >>>>> >>>>>> Hmm not sure why, I used quotes in this mail, let me rewrite, >>>>>> SQL(U is within single quotes): NVL(city,U) city >>>>>> Pig(U is within single quotes): (city is null?U:city) AS city >>>>>> >>>>>> >>>>>> >>>>>> On 5/16/10 1:36 PM, "Dmitriy Ryaboy" <[email protected]> wrote: >>>>>> >>>>>>> Syed, >>>>>>> The samples you pasted include all kinds of extraneous characters. >> Are >>>>>> you >>>>>>> sure your script is properly encoded? >>>>>>> >>>>>>> >>>>>>> On Sun, May 16, 2010 at 1:16 PM, Syed Wasti <[email protected]> >>>> wrote: >>>>>>> >>>>>>>> I am trying the SQL ³NVL(city, ŒU¹) city² in pig I am using the >>>> bincond >>>>>>>> operator, ³(city is null?'U': city) AS city², which is of chararray >>>>>> type, >>>>>>>> the result file shows Œ\N¹ instead of U. Any ideas ? >>>>>>>> >>>>>> >>>>>> >>>>>> >>>> >>>> >>>> >> >> >>
