double-escape the slash.

grunt> data1 = FOREACH data GENERATE id, name, (gender=='\\N'?'U':gender) AS
gender;
grunt> dump data1
(7001L,Test0,U)
(7002L,Test1,U)
(7003L,Test3,U)
(7004L,Test4,U)


On Mon, May 17, 2010 at 1:44 PM, Syed Wasti <[email protected]> wrote:

> Attached is the data file, just in case, below is the data and the script,
> this should give you all you want and to your last question, I am using
> Mac.
>
> 7001    Test0    0    1    \N
> 7002    Test1    0    1    \N
> 7003    Test3    1    2    \N
> 7004    Test4    2    2    \N
>
> grunt> data = LOAD 'data' USING PigStorage('\t') AS (id: long, name:
> chararray, active: int, has_city: int, gender: chararray);
> grunt> dump data;
> (7001L,Test0,0,1,\N)
> (7002L,Test1,0,1,\N)
> (7003L,Test3,1,2,\N)
> (7004L,Test4,2,2,\N)
>
> grunt> data1 = FOREACH data GENERATE id, name, (gender is null?'U':gender)
> AS gender;
> grunt> dump data1;
> (7001L,Test0,\N)
> (7002L,Test1,\N)
> (7003L,Test3,\N)
> (7004L,Test4,\N)
>
> grunt> data1 = FOREACH data GENERATE id, name, (gender=='\N'?'U':gender) AS
> gender;
> grunt> dump data1;
> (7001L,Test0,\N)
> (7002L,Test1,\N)
> (7003L,Test3,\N)
> (7004L,Test4,\N)
>
>
>
> On 5/17/10 1:20 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
>
> > There must be some noise in your input that is getting interpreted
> > differently by Hive and Pig. Loading a bunch of newlines does generate
> > nulls, so I am not sure what's happening there. Are you loading using
> > PigStorage? Default delimiters? Can you upload a sample file and script
> that
> > reproduces the problem somewhere? Are you running this on Windows with
> its
> > weird newline delimiters?
> >
> > grunt> cat tmp/nulltest
> > 1
> > 2
> >
> > 3
> > grunt> data = load 'tmp/nulltest' using PigStorage() as (num);
> > grunt> processed = foreach data generate (num is null OR num == 3 ? 'XXX'
> :
> > num) as num;
> > grunt> dump processed;
> > (1)
> > (2)
> > (XXX)
> > (XXX)
> >
> > -Dmitriy
> >
> > On Mon, May 17, 2010 at 12:44 PM, Syed Wasti <[email protected]>
> wrote:
> >
> >> Have tried both ways foo is null OR foo == '\n', doesn't work in pig.
> >> Why would null values be saved as \N in a file ? Is there a reason, is
> this
> >> hive or hadoop way which pig cant understand ?
> >>
> >>
> >> On 5/17/10 11:53 AM, "Dmitriy Ryaboy" <[email protected]> wrote:
> >>
> >>> Arguably, that's a Hive bug. What does hive do if you *want* to have a
> \n
> >> as
> >>> a value?
> >>>
> >>> For your case, I think it's as simple as foreach rel generate ( foo is
> >> null
> >>> OR foo == '\n' ? 'U' : foo);
> >>>
> >>> -D
> >>>
> >>> On Mon, May 17, 2010 at 11:42 AM, Syed Wasti <[email protected]>
> >> wrote:
> >>>
> >>>> Well Dmitriy, my bad, I was looking at the data through a hive query
> and
> >> it
> >>>> shows as NULL, but when I looked into the flat file all the NULL
> values
> >> are
> >>>> are seen as \N.
> >>>> Hive is able to understand \N as NULL but pig is not... How can I
> >> resolve
> >>>> this ?
> >>>>
> >>>> On 5/16/10 4:33 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
> >>>>
> >>>>> In that case, maybe it's the data, and what you think is null is
> >> actually
> >>>>> '\n' ?
> >>>>>
> >>>>> -D
> >>>>>
> >>>>> On Sun, May 16, 2010 at 4:07 PM, Syed Wasti <[email protected]>
> >> wrote:
> >>>>>
> >>>>>> Doing absolutely the same thing and I am using pig 6 too.
> >>>>>> Tried with the fake data on both local and mapreduce modes, works
> >> fine.
> >>>>>> But on my script against actual data in mapreduce mode, it fails to
> do
> >>>> the
> >>>>>> same thing, places \N instead of U.
> >>>>>>
> >>>>>> grunt> rel1 = LOAD '/user/swasti/data' USING PigStorage('\t') as
> >> (num);
> >>>>>> grunt> dump rel1;
> >>>>>> (1)
> >>>>>> (2)
> >>>>>> (3)
> >>>>>> ()
> >>>>>> (5)
> >>>>>> grunt> find_null = FOREACH rel1 GENERATE (num is null?'U':num);
> >>>>>> grunt> dump find_null;
> >>>>>> (1)
> >>>>>> (2)
> >>>>>> (3)
> >>>>>> (U)
> >>>>>> (5)
> >>>>>>
> >>>>>>
> >>>>>> On 5/16/10 2:23 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
> >>>>>>
> >>>>>>> So what I am saying is, check that you are not inserting some weird
> >>>>>>> non-ascii quotes in your actual script.
> >>>>>>> I just ran this on Pig 6, it worked:
> >>>>>>>
> >>>>>>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num);
> >>>>>>> grunt> dump data;
> >>>>>>> (1)
> >>>>>>> (2)
> >>>>>>> ()
> >>>>>>> (3)
> >>>>>>> grunt> find_nulls = foreach data generate ( num is null ? 'U' : num
> >> );
> >>>>>>> grunt> dump find_nulls;
> >>>>>>> (1)
> >>>>>>> (2)
> >>>>>>> (U)
> >>>>>>> (3)
> >>>>>>>
> >>>>>>> I double-checked just in case, and it works in both local and
> >> mapreduce
> >>>>>>> modes.
> >>>>>>>
> >>>>>>> -Dmitriy
> >>>>>>>
> >>>>>>> On Sun, May 16, 2010 at 1:49 PM, Syed Wasti <[email protected]>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hmm not sure why, I used quotes in this mail, let me rewrite,
> >>>>>>>> SQL(U is within single quotes): NVL(city,U) city
> >>>>>>>> Pig(U is within single quotes): (city is null?U:city) AS city
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 5/16/10 1:36 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Syed,
> >>>>>>>>> The samples you pasted include all kinds of extraneous
> characters.
> >>>> Are
> >>>>>>>> you
> >>>>>>>>> sure your script is properly encoded?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, May 16, 2010 at 1:16 PM, Syed Wasti <[email protected]
> >
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> I am trying the SQL ³NVL(city, ŒU¹) city² in pig I am using the
> >>>>>> bincond
> >>>>>>>>>> operator, ³(city is null?'U': city) AS city², which is of
> >> chararray
> >>>>>>>> type,
> >>>>>>>>>> the result file shows Œ\N¹ instead of U.  Any ideas ?
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
> >>
>
>

Reply via email to