Hi,

If you don't bucket the data at all, the files will be organized by the time the events arrived at the sink.
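For the use case in the question (files ordered both chronologically and by name, with new data easy to find), here is a minimal Python sketch of how a downstream job might select newly available files. It rests on hypothetical assumptions. Each file name is a fixed prefix followed by a fixed-width millisecond timestamp. Files still being written carry a `.tmp` suffix. The prefix `FlumeData.`, the `.tmp` suffix and the helper `new_files_since` are illustrative only, not confirmed Flume behavior.

```python
from typing import Iterable, List


def new_files_since(names: Iterable[str], last_seen: str,
                    prefix: str = "FlumeData.") -> List[str]:
    """Return completed files named after `last_seen`, oldest first.

    Hypothetical naming scheme: <prefix><epoch-millis>, where the millis
    field is always 13 digits wide (true for dates between 2001 and 2286),
    so lexicographic order of the names equals chronological order.
    """
    # Skip unrelated files and files assumed to be still open (".tmp").
    completed = [n for n in names
                 if n.startswith(prefix) and not n.endswith(".tmp")]
    # Fixed-width timestamps make plain string comparison a time comparison.
    return sorted(n for n in completed if n > last_seen)


if __name__ == "__main__":
    listing = [
        "FlumeData.1351890480002",
        "FlumeData.1351890480000",
        "FlumeData.1351890480001",
        "FlumeData.1351890480003.tmp",  # assumed to be in progress
    ]
    print(new_files_since(listing, "FlumeData.1351890480000"))
```

Under these assumptions the job only has to remember the last file name it processed; the next run picks up everything that sorts after it.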
Brock

On Fri, Nov 2, 2012 at 6:08 PM, Pankaj Gupta <[email protected]> wrote:
> Hi,
>
> Is it possible to organize files written to HDFS into buckets based on the
> time of writing rather than the timestamp in the header? Alternatively, is
> it possible to insert the timestamp injector just before the HDFS Sink?
>
> My use case is to organize files such that they are organized
> chronologically as well as alphabetically by name and that there is only one
> file being written to at a time. This will make it easier to look for newly
> available data so that MapReduce jobs can process them.
>
> Thanks in Advance,
> Pankaj

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
