I'm having problems with Pig's STRSPLIT, running on Amazon's cloud
computing environment.
STRSPLIT isn't documented in the Pig Latin Reference Manual, so I
learned about it from other sources.

My problem is that in certain cases STRSPLIT returns null, and I have
no idea why.  Here is an actual session I ran to demonstrate the problem:



grunt> CAT s3://otg-nlandys/pig-tut/bin-proto-4;
Meta    1234567890      foo     34
Movement        1234567890      Rambetter       1/1     2/3
Movement        1234567890      Freddyman       10/1    10/2

grunt> A = LOAD 's3://otg-nlandys/pig-tut/bin-proto-4';
grunt> DUMP A;
(Meta,1234567890,foo,34)
(Movement,1234567890,Rambetter,1/1,2/3)
(Movement,1234567890,Freddyman,10/1,10/2)

grunt> MOVEMENT = FILTER A BY (chararray) $0 == 'Movement';
grunt> DUMP MOVEMENT;
(Movement,1234567890,Rambetter,1/1,2/3)
(Movement,1234567890,Freddyman,10/1,10/2)

grunt> TEST = FOREACH MOVEMENT GENERATE $3 AS startpos:chararray;
grunt> DUMP TEST;
(1/1)
(10/1)

grunt> POSA = FOREACH TEST GENERATE STRSPLIT(startpos,'/');
grunt> DUMP POSA;
()
()

_________________________________________________________________


grunt> CAT s3://otg-nlandys/pig-tut/bin-proto-5;
1/1
10/1

grunt> B = LOAD 's3://otg-nlandys/pig-tut/bin-proto-5' AS startpos:chararray;
grunt> DUMP B;
(1/1)
(10/1)

grunt> POSB = FOREACH B GENERATE STRSPLIT(startpos,'/');
grunt> DUMP POSB;
((1,1))
((10,1))


_________________________________________________________________


My question is why POSA contains empty rows while POSB doesn't, when
it seems that they should be identical.

I'm fairly new to Pig and realize that the problem might be a
shortcoming of UDFs and how Pig works with data of varying column
count, but I would like an explanation.  Thanks.

I also noticed one other minor bug with STRSPLIT.  If the first
argument to STRSPLIT is a bytearray instead of a chararray, it returns
null, so you have to explicitly cast bytearray to chararray for it to
work.  It seems that this could be automated in the language, no?
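
For illustration, the explicit cast mentioned above would look
something like the sketch below, run against the same bin-proto-4 file
as the first session.  POSC is just a name made up for this example,
and no output is shown because this line was not part of the session
above.

```pig
-- Same input as the first session; with no schema given, fields load as bytearray.
A = LOAD 's3://otg-nlandys/pig-tut/bin-proto-4';
MOVEMENT = FILTER A BY (chararray) $0 == 'Movement';

-- Cast $3 explicitly rather than only declaring its type in an AS clause.
-- POSC is a hypothetical alias, not one used in the session above.
POSC = FOREACH MOVEMENT GENERATE STRSPLIT((chararray) $3, '/');
DUMP POSC;
```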

- Nerius
