IIRC you can do this, but MR had some issues if you passed it a file that was not yet closed (but had been sync'd) for splitting.
However, if you run into similar issues, try generating your own splits over the big file via FileInputFormat#getSplits(…), which will then work; see the sketch at the bottom of this mail.

On Thu, Nov 1, 2012 at 4:50 AM, Pankaj Gupta <[email protected]> wrote:
> Hi,
>
> Is it possible to run a MapReduce job on part of a file on HDFS? The use
> case is using a single file on HDFS as a stream to store all log events
> of a particular kind. New data can grow on top while MapReduce processes
> old data. Of course, one option would be to copy part of the data into a
> separate file and give that to MapReduce, but I was wondering if that
> extra copy can be avoided.
>
> Thanks,
> Pankaj

--
Harsh J
