Nice, it works, thanks.

On 5/17/10 1:49 PM, "Dmitriy Ryaboy" <[email protected]> wrote:

> double-escape the slash.
> 
> grunt> data1 = FOREACH data GENERATE id, name, (gender=='\\N'?'U':gender) AS
> gender;
> grunt> dump data1
> (7001L,Test0,U)
> (7002L,Test1,U)
> (7003L,Test3,U)
> (7004L,Test4,U)
> 
> 
> On Mon, May 17, 2010 at 1:44 PM, Syed Wasti <[email protected]> wrote:
> 
>> Attached is the data file, just in case, below is the data and the script,
>> this should give you all you want and to your last question, I am using
>> Mac.
>> 
>> 7001    Test0    0    1    \N
>> 7002    Test1    0    1    \N
>> 7003    Test3    1    2    \N
>> 7004    Test4    2    2    \N
>> 
>> grunt> data = LOAD 'data' USING PigStorage('\t') AS (id: long, name:
>> chararray, active: int, has_city: int, gender: chararray);
>> grunt> dump data;
>> (7001L,Test0,0,1,\N)
>> (7002L,Test1,0,1,\N)
>> (7003L,Test3,1,2,\N)
>> (7004L,Test4,2,2,\N)
>> 
>> grunt> data1 = FOREACH data GENERATE id, name, (gender is null?'U':gender)
>> AS gender;
>> grunt> dump data1;
>> (7001L,Test0,\N)
>> (7002L,Test1,\N)
>> (7003L,Test3,\N)
>> (7004L,Test4,\N)
>> 
>> grunt> data1 = FOREACH data GENERATE id, name, (gender=='\N'?'U':gender) AS
>> gender;
>> grunt> dump data1;
>> (7001L,Test0,\N)
>> (7002L,Test1,\N)
>> (7003L,Test3,\N)
>> (7004L,Test4,\N)
>> 
>> 
>> 
>> On 5/17/10 1:20 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
>> 
>>> There must be some noise in your input that is getting interpreted
>>> differently by Hive and Pig. Loading a bunch of newlines does generate
>>> nulls, so I am not sure what's happening there. Are you loading using
>>> PigStorage? Default delimiters? Can you upload a sample file and script
>> that
>>> reproduces the problem somewhere? Are you running this on Windows with
>> its
>>> weird newline delimiters?
>>> 
>>> grunt> cat tmp/nulltest
>>> 1
>>> 2
>>> 
>>> 3
>>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num);
>>> grunt> processed = foreach data generate (num is null OR num == 3 ? 'XXX'
>> :
>>> num) as num;
>>> grunt> dump processed;
>>> (1)
>>> (2)
>>> (XXX)
>>> (XXX)
>>> 
>>> -Dmitriy
>>> 
>>> On Mon, May 17, 2010 at 12:44 PM, Syed Wasti <[email protected]>
>> wrote:
>>> 
>>>> Have tried both ways foo is null OR foo == '\n', doesn't work in pig.
>>>> Why would null values be saved as \N in a file ? Is there a reason, is
>> this
>>>> hive or hadoop way which pig cant understand ?
>>>> 
>>>> 
>>>> On 5/17/10 11:53 AM, "Dmitriy Ryaboy" <[email protected]> wrote:
>>>> 
>>>>> Arguably, that's a Hive bug. What does hive do if you *want* to have a
>> \n
>>>> as
>>>>> a value?
>>>>> 
>>>>> For your case, I think it's as simple as foreach rel generate ( foo is
>>>> null
>>>>> OR foo == '\n' ? 'U' : foo);
>>>>> 
>>>>> -D
>>>>> 
>>>>> On Mon, May 17, 2010 at 11:42 AM, Syed Wasti <[email protected]>
>>>> wrote:
>>>>> 
>>>>>> Well Dmitriy, my bad, I was looking at the data through a hive query
>> and
>>>> it
>>>>>> shows as NULL, but when I looked into the flat file all the NULL
>> values
>>>> are
>>>>>> are seen as \N.
>>>>>> Hive is able to understand \N as NULL but pig is not... How can I
>>>> resolve
>>>>>> this ?
>>>>>> 
>>>>>> On 5/16/10 4:33 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
>>>>>> 
>>>>>>> In that case, maybe it's the data, and what you think is null is
>>>> actually
>>>>>>> '\n' ?
>>>>>>> 
>>>>>>> -D
>>>>>>> 
>>>>>>> On Sun, May 16, 2010 at 4:07 PM, Syed Wasti <[email protected]>
>>>> wrote:
>>>>>>> 
>>>>>>>> Doing absolutely the same thing and I am using pig 6 too.
>>>>>>>> Tried with the fake data on both local and mapreduce modes, works
>>>> fine.
>>>>>>>> But on my script against actual data in mapreduce mode, it fails to
>> do
>>>>>> the
>>>>>>>> same thing, places \N instead of U.
>>>>>>>> 
>>>>>>>> grunt> rel1 = LOAD '/user/swasti/data' USING PigStorage('\t') as
>>>> (num);
>>>>>>>> grunt> dump rel1;
>>>>>>>> (1)
>>>>>>>> (2)
>>>>>>>> (3)
>>>>>>>> ()
>>>>>>>> (5)
>>>>>>>> grunt> find_null = FOREACH rel1 GENERATE (num is null?'U':num);
>>>>>>>> grunt> dump find_null;
>>>>>>>> (1)
>>>>>>>> (2)
>>>>>>>> (3)
>>>>>>>> (U)
>>>>>>>> (5)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 5/16/10 2:23 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> So what I am saying is, check that you are not inserting some weird
>>>>>>>>> non-ascii quotes in your actual script.
>>>>>>>>> I just ran this on Pig 6, it worked:
>>>>>>>>> 
>>>>>>>>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num);
>>>>>>>>> grunt> dump data;
>>>>>>>>> (1)
>>>>>>>>> (2)
>>>>>>>>> ()
>>>>>>>>> (3)
>>>>>>>>> grunt> find_nulls = foreach data generate ( num is null ? 'U' : num
>>>> );
>>>>>>>>> grunt> dump find_nulls;
>>>>>>>>> (1)
>>>>>>>>> (2)
>>>>>>>>> (U)
>>>>>>>>> (3)
>>>>>>>>> 
>>>>>>>>> I double-checked just in case, and it works in both local and
>>>> mapreduce
>>>>>>>>> modes.
>>>>>>>>> 
>>>>>>>>> -Dmitriy
>>>>>>>>> 
>>>>>>>>> On Sun, May 16, 2010 at 1:49 PM, Syed Wasti <[email protected]>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hmm not sure why, I used quotes in this mail, let me rewrite,
>>>>>>>>>> SQL(U is within single quotes): NVL(city,U) city
>>>>>>>>>> Pig(U is within single quotes): (city is null?U:city) AS city
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 5/16/10 1:36 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Syed,
>>>>>>>>>>> The samples you pasted include all kinds of extraneous
>> characters.
>>>>>> Are
>>>>>>>>>> you
>>>>>>>>>>> sure your script is properly encoded?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Sun, May 16, 2010 at 1:16 PM, Syed Wasti <[email protected]
>>> 
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I am trying the SQL ³NVL(city, ŒU¹) city² in pig I am using the
>>>>>>>> bincond
>>>>>>>>>>>> operator, ³(city is null?'U': city) AS city², which is of
>>>> chararray
>>>>>>>>>> type,
>>>>>>>>>>>> the result file shows Œ\N¹ instead of U.  Any ideas ?
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 


Reply via email to