Hi all, I'm new to Hadoop. I found Pig very quick and easy to learn, wrote a few simple scripts of my own, and then my first UDF. It's a loader UDF based on the piggybank samples found in the 0.5.0 folders; it basically loads data matching a fixed pattern, specified as a regex. Everything is fine when I run with -x local mode: I can manipulate the input and generate output as expected, and the output is a single part-00000 file containing the data in the format I wanted. When I try to run it on my 5-node Hadoop cluster with the -x mapred option (Hadoop 0.20.1; same behaviour with 0.18.3 and the old Pig 0.4.0), the output folder looks strange: there is more than one part-* file, some of them are empty, and the others contain the same data I saw in the local run, but split across different files.
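
To give an idea, the script is essentially the sketch below (the jar, package, loader name and regex are just placeholders, my real ones are different):

  REGISTER myudfs.jar;
  -- MyRegexLoader is my piggybank-style loader; it takes the pattern as a constructor argument
  A = LOAD 'input/data.log' USING mypackage.MyRegexLoader('^(\\S+)\\s+(\\S+)$');
  STORE A INTO 'output';
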
First question: why? Is it normal for the output to be split across multiple part-* files when a script runs on the cluster? If there's nothing to be done about it, I can always reassemble them into a single file with a Perl or bash script after the copyToLocal operation, but that doesn't seem very nice to me :-( Thanks in advance for any suggestion. Matteo
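
P.S. The merge I have in mind is nothing fancy, just something along these lines (paths are made up for the example):

  # copy the whole output dir from HDFS, then glue the pieces together;
  # the empty part-* files concatenate harmlessly
  hadoop fs -copyToLocal output /tmp/pig-output
  cat /tmp/pig-output/part-* > /tmp/result.txt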
