That is a problem with using "," as the field delimiter.
PigStorage splits the whole record on the delimiter, so the commas
inside the second field (the bag) split that field as well.
If you use some other delimiter for your data (e.g., tab or ^A), it
should work fine.
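A minimal sketch of that change, assuming testdata.txt is regenerated so
that a tab (rather than a comma) separates the key from the bag. The
file name and schema are taken from the original message; only the
delimiter differs:

```pig
-- Assumption: the key and the bag are now separated by a tab in the
-- input file; the commas inside the bag are left exactly as they were.
A = LOAD 'testdata.txt' USING PigStorage('\t')
    AS (key:chararray,
        columns:bag{column:tuple(name:chararray, value:chararray)});
DUMP A;
```

With a delimiter that never occurs inside the bag, the whole bag text
should reach the schema conversion as a single field.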
Thanks,
Thejas
On 1/26/12 7:31 AM, Sandopolus wrote:
Hi there
I am trying to load some data using PigStorage with a schema, but I
can't seem to get the schema right and was hoping someone could point out
my mistake.
Here is the data being loaded in:
2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,{(customer,27651a7d-0871-49a6-8df4-90305f7e840b),(customerClient,b57f9d15-6de7-486b-9761-46246be4abfe),(clientBuild,7376807c-7448-4785-8e2c-49814f6ce2f9),(country,FR)}
Commands used:
A = LOAD 'testdata.txt' USING PigStorage(',') as (key:chararray,
columns:bag {column:tuple (name:chararray, value:chararray)});
DUMP A;
This results in the following warning and output:
2012-01-26 15:27:51,860 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1 time(s).
(2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,)
From the output it doesn't seem to be picking up the bag structure, but
if I remove the schema it dumps the data out correctly.
Any help would be much appreciated.
Ta
Sandy