Yep, each sink with a different prefix will work fine too. My suggestion was just meant to avoid collision - file prefixes are good enough for that.
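
For illustration, here is a minimal sketch of two HDFS sinks draining the same channel and writing to the same directory, kept apart only by their file prefixes. The sink names (hdfsSink1, hdfsSink2) and the prefix values (sink1, sink2) are made up for this example; the path and channel name are taken from Jagadish's config quoted below. hdfs.filePrefix is the HDFS sink property that sets the file name prefix.

```properties
# Hypothetical sink names, for illustration only
agent.sinks = hdfsSink1 hdfsSink2

# Sink 1: same channel, same directory, its own prefix
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = fileChannel
agent.sinks.hdfsSink1.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink1.hdfs.fileType = DataStream
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1

# Sink 2: identical except for the prefix
agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = fileChannel
agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink2.hdfs.fileType = DataStream
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
```

Each sink runs on its own thread, so the two of them can write to HDFS in parallel, and the distinct prefixes keep their file names from colliding.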
--
Hari Shreedharan

On Wednesday, December 12, 2012 at 1:13 PM, Bhaskar V. Karambelkar wrote:

> Hari,
> If each sink uses a different file prefix, what's the need to write to
> multiple HDFS directories?
> All our sinks write to the same HDFS directory and each uses a unique
> file prefix, and it seems to work fine.
> Also haven't found anything in flume code or HDFS APIs which suggests
> that two sinks can't write to the same directory.
>
> Just curious.
> thanks
>
>
> On Wed, Dec 12, 2012 at 12:53 PM, Hari Shreedharan
> <[email protected] (mailto:[email protected])> wrote:
> > Also note that having multiple sinks often improves performance - though you
> > should have each sink write to a different directory on HDFS. Since each
> > sink really uses only one thread at a time to write, having multiple sinks
> > allows multiple threads to write to HDFS. Also if you can spare additional
> > disks on your Flume agent machine for file channel data directories, that
> > will also improve performance.
> >
> >
> >
> > Hari
> >
> > --
> > Hari Shreedharan
> >
> > On Wednesday, December 12, 2012 at 7:36 AM, Brock Noland wrote:
> >
> > Hi,
> >
> > Why not try increasing the batch size on the source and sink to 10,000?
> >
> > Brock
> >
> > On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani
> > <[email protected] (mailto:[email protected])> wrote:
> >
> >
> > I am using the latest release of flume (Flume 1.3.0) and hadoop 1.0.3.
> >
> >
> > On 12/12/2012 03:35 PM, Jagadish Bihani wrote:
> >
> >
> > Hi
> >
> > I am able to write maximum 1.5 MB/sec data to HDFS (without compression)
> > using File Channel. Are there any recommendations to improve the
> > performance?
> > Has anybody achieved around 10 MB/sec with file channel? If yes please
> > share the configuration (hardware used, RAM allocated and batch sizes
> > of source, sink and channels).
> >
> > Following are the configuration details :
> > ========================
> >
> > I am using a machine with reasonable hardware configuration:
> > Quadcore 2.00 GHz processors and 4 GB RAM.
> >
> > Command line options passed to flume agent :
> > -DJAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote
> > -XX:MaxDirectMemorySize=2g"
> >
> > Agent Configuration:
> > =============
> > agent.sources = avro-collection-source spooler
> > agent.channels = fileChannel
> > agent.sinks = hdfsSink fileSink
> >
> > # For each one of the sources, the type is defined
> >
> > agent.sources.spooler.type = spooldir
> > agent.sources.spooler.spoolDir =/root/test_data
> > agent.sources.spooler.batchSize = 1000
> > agent.sources.spooler.channels = fileChannel
> >
> > # Each sink's type must be defined
> > agent.sinks.hdfsSink.type = hdfs
> > agent.sinks.hdfsSink.hdfs.path=hdfs://mltest2001/flume/release3Test
> >
> > agent.sinks.hdfsSink.hdfs.fileType =DataStream
> > agent.sinks.hdfsSink.hdfs.rollSize=0
> > agent.sinks.hdfsSink.hdfs.rollCount=0
> > agent.sinks.hdfsSink.hdfs.batchSize=1000
> > agent.sinks.hdfsSink.hdfs.rollInterval=60
> >
> > agent.sinks.hdfsSink.channel= fileChannel
> >
> > agent.channels.fileChannel.type=file
> > agent.channels.fileChannel.dataDirs=/root/flume_channel/dataDir13
> >
> > agent.channels.fileChannel.checkpointDir=/root/flume_channel/checkpointDir13
> >
> > Regards,
> > Jagadish
> >
> >
> > --
> > Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
