Did you try escaping the slash?
Dano

On Thu, May 17, 2012 at 11:57 AM, Nerius Landys <[email protected]> wrote:
> I'm having problems using Pig's STRSPLIT (on Amazon's cloud computing
> environment).
> I also noticed that STRSPLIT isn't documented in the Pig Latin
> Reference Manual, so I found out about it through other sources of
> information.
>
> My problem is that in certain cases STRSPLIT returns null. I have no
> idea why. Here is an actual session I ran to demonstrate the problem:
>
> grunt> CAT s3://otg-nlandys/pig-tut/bin-proto-4;
> Meta 1234567890 foo 34
> Movement 1234567890 Rambetter 1/1 2/3
> Movement 1234567890 Freddyman 10/1 10/2
>
> grunt> A = LOAD 's3://otg-nlandys/pig-tut/bin-proto-4';
> grunt> DUMP A;
> (Meta,1234567890,foo,34)
> (Movement,1234567890,Rambetter,1/1,2/3)
> (Movement,1234567890,Freddyman,10/1,10/2)
>
> grunt> MOVEMENT = FILTER A BY (chararray) $0 == 'Movement';
> grunt> DUMP MOVEMENT;
> (Movement,1234567890,Rambetter,1/1,2/3)
> (Movement,1234567890,Freddyman,10/1,10/2)
>
> grunt> TEST = FOREACH MOVEMENT GENERATE $3 AS startpos:chararray;
> grunt> DUMP TEST;
> (1/1)
> (10/1)
>
> grunt> POSA = FOREACH TEST GENERATE STRSPLIT(startpos,'/');
> grunt> DUMP POSA;
> ()
> ()
>
> _________________________________________________________________
>
> grunt> CAT s3://otg-nlandys/pig-tut/bin-proto-5;
> 1/1
> 10/1
>
> grunt> B = LOAD 's3://otg-nlandys/pig-tut/bin-proto-5' AS
> startpos:chararray;
> grunt> DUMP B;
> (1/1)
> (10/1)
>
> grunt> POSB = FOREACH B GENERATE STRSPLIT(startpos,'/');
> grunt> DUMP POSB;
> ((1,1))
> ((10,1))
>
> _________________________________________________________________
>
> My question is why POSA is empty rows and POSB isn't empty rows, when
> it seems that they should be identical.
>
> I'm kind of new to Pig and realize that the problem might be a
> shortcoming of UDF's and how Pig works with data of varying column
> count, but would like an explanation. Thanks.
>
> Also one other minor bug with STRSPLIT that I noticed. If your first
> argument to STRSPLIT is bytearray instead of chararray, it will return
> null. So you have to explicitly cast bytearray to chararray for it to
> work. Seems that this could be automated in the language, no?
>
> - Nerius
>
