I think my problem might be related to this bug: https://issues.apache.org/jira/browse/PIG-2462, since the loader code uses getWrappedSplit().
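
For reference, a minimal sketch of the setup discussed in this thread. It assumes the FAQ-style loader is compiled as a class named PigStorageWithInputPath and packaged in a jar called myloader.jar; both names, the input directory, and the field names are hypothetical and only for illustration. The only setting taken from the thread itself is "pig.splitCombination".

```pig
-- Turn off split combination so that each map task reads a single file
-- (the setting Daniel asked about in the thread).
SET pig.splitCombination 'false';

-- Hypothetical jar containing the custom loader.
REGISTER myloader.jar;

-- Hypothetical loader class and input directory; the loader is assumed
-- to append the source file path as the last field of every tuple.
phrases = LOAD 'docs/' USING PigStorageWithInputPath()
          AS (phrase:chararray, source_path:chararray);

DUMP phrases;
```

If every record still shows the same source_path with this setting in place, that would be consistent with the loader reading the path from getWrappedSplit(), as suspected above.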

On Jan 9, 2012, at 2:24 PM, Yulia Tolskaya wrote:

yep!

On Jan 9, 2012, at 2:09 PM, Daniel Dai wrote:

Did you set "pig.splitCombination" to false?

On Mon, Jan 9, 2012 at 10:38 AM, Yulia Tolskaya <[email protected]> wrote:

Thank you for your response!
I am trying to use the Loader you have suggested, and I keep running into problems. For some reason I keep getting the same file name for all files in the folder. I do not understand why this is happening!

Yulia

On Jan 9, 2012, at 1:57 AM, Daniel Dai wrote:

Check https://cwiki.apache.org/confluence/display/PIG/FAQ#FAQ-Q%3AIloaddatafromadirectorywhichcontainsdifferentfile.HowdoIfindoutwherethedatacomesfrom%3F

Daniel

On Sun, Jan 8, 2012 at 10:45 PM, Yulia Tolskaya <[email protected]> wrote:

Hello,
I am wondering if there is a way for me to load multiple files into Pig while still keeping track of which record came from which file. To give some background, I have about half a million files with one phrase per line, and I need to note which document each phrase belongs to.
Thanks for your help!

Yulia
