I am not sure if there is a JIRA for this, but essentially Pig has some 'reserved' delimiters which it uses for representing complex types (tuple, bag), and these sometimes clash with user data or the user's delimiter. I don't think these reserved delimiters are customizable, and as far as I recall Pig does not escape them. Both PigStorage and BinStorage have this issue, IIRC, though BinStorage does not break as often because of the binary markers it uses.

Regards,
Mridul

On Monday 01 March 2010 10:06 AM, prasenjit mukherjee wrote:

yep, that works!!! thanks. What is the plan? To have separate delimiters for fields and bags? Basically the content of my file should now be:

    a b c {(15,good),(24,total),(9,bad)}
    a b d {(2,bad),(6,good),(8,total)}

-Prasen

On Mon, Mar 1, 2010 at 2:23 AM, Mridul Muralidharan wrote:

Your schema is essentially (string, string, string, bag), with the bag containing tuples of schema (number, string). Based on this, the schema should be the second one you described, namely:

    r1 = load '/tmp/prasen/foo1.txt' using PigStorage(',')
         AS (f1:chararray, f2:chararray, f3:chararray, B:{T1:(i1:int,s1:chararray)});

What is the error you get?

One possible suspicion for the error (not validated!): because your delimiter is ',' and Pig internally delimits bag fields with ',' too, you are hitting errors there. Is this right?

Would it be possible for you to do something like:

    ( export IFS=','; cat /tmp/prasen/foo1.txt | while read -r f1 f2 f3 bag; do
        echo -e "$f1\t$f2\t$f3\t$bag"
      done > /tmp/prasen/foo1.txt_new )

and try with /tmp/prasen/foo1.txt_new to see if it works with the schema above? If it does work, then this is a bug with PigStorage trying to use ',' as delimiter.

Regards,
Mridul

On Sunday 28 February 2010 05:40 PM, prasenjit mukherjee wrote:

Here is my data file:

    a,b,c,{(15,good),(24,total),(9,bad)}
    a,b,d,{(2,bad),(6,good),(8,total)}

I tried the following combinations but neither of them works:

    r1 = load '/tmp/prasen/foo1.txt' using PigStorage(',')
         AS (f1:chararray, f2:chararray, f3:chararray,
             B:{T1:(i1:int,s1:chararray), T2:(i2:int,s2:chararray), T3:(i3:int,s3:chararray)});

    r1 = load '/tmp/prasen/foo1.txt' using PigStorage(',')
         AS (f1:chararray, f2:chararray, f3:chararray, B:{T1:(i1:int,s1:chararray)});

Any help is greatly appreciated.
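
For anyone hitting the same clash, here is a minimal sketch of the workaround discussed in this thread: rewrite only the first three commas of each record as tabs, so the commas inside the bag literal stay as they are. The function name `commas_to_tabs` is made up for illustration, and the sketch assumes the three leading fields never contain a comma themselves.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Pig): reads comma-delimited records on
# stdin and writes them with the first three fields tab-delimited.
# Assumption: f1, f2 and f3 contain no commas. The last variable passed to
# `read` receives the rest of the line, so the bag keeps its internal commas.
commas_to_tabs() {
  while IFS=',' read -r f1 f2 f3 bag; do
    # printf avoids the backslash interpretation that `echo -e` applies to data
    printf '%s\t%s\t%s\t%s\n' "$f1" "$f2" "$f3" "$bag"
  done
}

# Sample records taken from the thread
printf '%s\n' \
  'a,b,c,{(15,good),(24,total),(9,bad)}' \
  'a,b,d,{(2,bad),(6,good),(8,total)}' | commas_to_tabs
```

The converted file can then be loaded with the single-tuple bag schema quoted above, using PigStorage with a tab delimiter (tab is also PigStorage's default) instead of ','.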
