hi all,

I'm new to Hadoop. I found Pig very quick and easy to learn, wrote my own
simple scripts and my first UDF.
It's a loader UDF based on the piggybank samples found in the 0.5.0 folders;
it basically loads data matching a fixed pattern, specified as a regex.
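For context, my script uses the loader roughly like this (the jar, the class
name, and the regex here are just placeholders, not my real ones):

    REGISTER myudfs.jar;
    -- the regex capture groups become the fields of each tuple
    raw = LOAD 'input/logs'
          USING myudfs.MyPatternLoader('^(\\S+)\\s+(\\d+)$')
          AS (name:chararray, value:int);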
Everything is fine when run in -x local mode: I can manipulate the input and
generate output as well, and the output is a single part-00000 file with the
data in the format I desired.
When I try to run it on my 5-node Hadoop cluster with the -x mapred option
(Hadoop 0.20.1, same with 0.18.3 and the old 0.4.0 Pig), the output folder
behaves strangely: there is more than one part-* file, some of them are
empty, and the others contain the same data I got from the local run, but
split across different files.

First question: why? Is it normal for the output to be split into multiple
part-* files when a script runs on the cluster?

If there's nothing to be done about it, I can always reassemble them into a
single file with a Perl or bash script after the copyToLocal operation, but
that doesn't seem very nice to me :-(
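Something like this is what I mean (the output path is just an example, not
my real one):

    # pull the whole output dir down from HDFS, then glue the pieces together
    hadoop fs -copyToLocal /user/matteo/output ./output
    cat ./output/part-* > merged-output.txt

(hadoop fs -getmerge would even do both steps at once, but it still feels
like a workaround.)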

Thanks in advance for any suggestions.

Matteo
