Are you sure you want to write directly to HDFS from the app that is generating the data? Often in production, apps such as web servers do not have direct access to HDFS. I am not sure that the HDFS sink guarantees 'either fully written successfully or failed totally without any partial file blocks written', since each transaction does not translate into a separate file. So I think some partially written transactions could be left in a file if a transaction aborts.
This level of support for all-or-none at the file level is planned for what is currently referred to as the HCatalog sink:
https://issues.apache.org/jira/browse/FLUME-1734

-roshan

On Tue, Apr 30, 2013 at 6:48 PM, Connor Woodson <[email protected]> wrote:

> If you just want to write data to HDFS then Flume might not be the best
> thing to use; however, there is a Flume Embedded Agent
> <https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst#embedded-agent>
> that will embed Flume into your application. I don't believe it works yet
> with the HDFS sink, but some tinkering can likely make it work.
>
> - Connor
>
>
> On Tue, Apr 30, 2013 at 11:00 AM, Chen Song <[email protected]> wrote:
>
>> I am looking at options for Java programs that can write files into HDFS
>> with the following requirements.
>>
>> 1) Transaction Support: Each file, when being written, is either fully
>> written successfully or fails totally without any partial file blocks
>> written.
>>
>> 2) Compression Support/File Formats: Can specify compression type or file
>> format when writing contents.
>>
>> I know how to write data into a file on HDFS by opening an
>> FSDataOutputStream, as shown here
>> <http://stackoverflow.com/questions/13457934/writing-to-a-file-in-hdfs-in-hadoop>.
>> Just wondering if there are some libraries or out-of-the-box solutions that
>> provide the support I mentioned above.
>>
>> I stumbled upon Flume, which provides an HDFS sink that can support
>> transactions, compression, file rotation, etc. But it doesn't seem to
>> provide an API to be used as a library. The features Flume provides are
>> highly coupled with the Flume architectural components, like sources,
>> channels, and sinks, and don't seem to be usable independently. All I need
>> is merely the HDFS loading part.
>>
>> Does anyone have some good suggestions?
>>
>> --
>> Chen Song
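
For requirements 1) and 2) together, one common pattern that does not involve Flume is to write the compressed data under a temporary name and rename it to the final name only after the stream has closed cleanly. Below is a minimal sketch of that idea, not something taken from this thread. It uses only java.nio and the local filesystem as a stand-in so that it runs without Hadoop on the classpath, and the class and method names (AtomicGzipWriter, writeAtomically) are hypothetical. On HDFS the same shape would presumably use FileSystem.create for the temporary path and FileSystem.rename to publish it; that part is an assumption and is not shown here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: local-filesystem stand-in for "write to temp, then rename".
class AtomicGzipWriter {

    // Writes gzip-compressed data so that `target` either appears complete
    // or does not appear at all.
    static void writeAtomically(Path target, byte[] data) throws IOException {
        // Create the temp file in the same directory so the final move is a
        // rename within one filesystem rather than a copy.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, ".tmp-", ".part");
        try {
            // Closing the stream flushes the gzip trailer; if this throws,
            // the target name is never created.
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
                out.write(data);
            }
            // Publish the finished file under its final name in one step.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; removes the partial file on failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

A caller would use it as AtomicGzipWriter.writeAtomically(Paths.get("/data/events.gz"), bytes), where the path is only an example. Readers that skip names starting with ".tmp-" never see a partially written file. This covers a single file per call; it does not give Flume-style transactions across batches of events.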
