Hi All,

I am using the “avro_event” serializer with hdfs.writeFormat = writable and the
DataStream file type to store events into HDFS.
I would now like to read those files back for further analysis. I am new to Avro
and don’t know how to develop a deserializer to read the Flume events written
to the HDFS files.

If anyone could share a sample or example, I would really appreciate it. Please
help.

Thanks & Regards,
Ashutosh Sharma

From: Bhaskar V. Karambelkar [mailto:[email protected]]
Sent: Wednesday, August 22, 2012 12:22 AM
To: [email protected]
Subject: Re: Can HDFSSink write headers as well?


On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー 
<[email protected]> wrote:
Hi David,

Currently there is no way to write headers to HDFS using the built-in Flume 
functionality.

This is not entirely true; the following combination will write headers to 
HDFS, in the avro_data file format (binary):

agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.serializer = avro_event
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable

The serializer used is part of the Flume distribution, namely:
flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java

A file written this way can be processed with the Avro MapReduce API found in 
the Avro distribution.
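For simple offline analysis you don't even need MapReduce. Below is a minimal sketch of reading such a file with Avro's plain file API, assuming the file has been copied out of HDFS to the local filesystem, and assuming the record layout that FlumeEventAvroEventSerializer uses (a "headers" map field and a "body" bytes field); check the actual schema embedded in your file if the field names differ.

```java
import java.io.File;
import java.nio.ByteBuffer;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadFlumeAvroFile {
  public static void main(String[] args) throws Exception {
    // Open the Avro container file; the schema is read from the file itself,
    // so a GenericDatumReader needs no schema up front.
    try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
        new File(args[0]), new GenericDatumReader<GenericRecord>())) {
      for (GenericRecord record : reader) {
        // Assumed field names from FlumeEventAvroEventSerializer's schema:
        // "headers" (map<string,string>) and "body" (bytes).
        Object headers = record.get("headers");
        ByteBuffer body = (ByteBuffer) record.get("body");
        byte[] bytes = new byte[body.remaining()];
        body.get(bytes);
        System.out.println(headers + " -> " + new String(bytes, "UTF-8"));
      }
    }
  }
}
```

Run it with the local copy of the file as the only argument, with the avro jar on the classpath. The same GenericRecord access pattern carries over to the Avro MapReduce input formats.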

Also note that simply using DataStream doesn't mean the output is a text file; 
the serializer and hdfs.writeFormat also determine whether the file is text or 
binary.

I've read the entire HDFS sink code and experimented with it a lot, so if you 
want more details, let me know.



If you are writing to text or binary files on HDFS (i.e. you have set 
hdfs.fileType = DataStream or CompressedStream in your config), then you can 
supply your own custom serializer, which will allow you to write headers to 
HDFS. You will need to write a serializer that implements 
org.apache.flume.serialization.EventSerializer.
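A minimal custom serializer along those lines might look like the sketch below. The class and package names are hypothetical; the interface methods are those of org.apache.flume.serialization.EventSerializer, and the Builder nested class is what you reference from the sink's serializer property.

```java
package com.example.flume;  // hypothetical package

import java.io.IOException;
import java.io.OutputStream;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.EventSerializer;

// Hypothetical serializer that writes each event's headers in front of its body,
// one event per line, so the headers end up in the HDFS file as text.
public class HeaderAndBodyTextSerializer implements EventSerializer {
  private final OutputStream out;

  private HeaderAndBodyTextSerializer(OutputStream out) {
    this.out = out;
  }

  @Override public void afterCreate() throws IOException { /* no file header */ }
  @Override public void afterReopen() throws IOException { /* nothing to re-init */ }

  @Override
  public void write(Event event) throws IOException {
    out.write(event.getHeaders().toString().getBytes("UTF-8"));
    out.write(' ');
    out.write(event.getBody());
    out.write('\n');
  }

  @Override public void flush() throws IOException { out.flush(); }
  @Override public void beforeClose() throws IOException { /* no trailer */ }
  @Override public boolean supportsReopen() { return true; }

  // The sink instantiates serializers through this builder.
  public static class Builder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      return new HeaderAndBodyTextSerializer(out);
    }
  }
}
```

You would then point the sink at the builder's fully qualified class name, e.g. agent.sinks.k1.serializer = com.example.flume.HeaderAndBodyTextSerializer$Builder (again, a hypothetical name).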

If, on the other hand, you are writing to HDFS SequenceFiles, then 
unfortunately there is no way to customize the way that events are serialized, 
so you cannot write event headers to HDFS. This is a known issue (FLUME-1100) 
and I have supplied a patch to fix it.

Chris.



On 2012/08/21 11:36, David Capwell wrote:
I was wondering: if I put arbitrary data into an event's headers, can the 
HDFSSink write it to HDFS? I know it can use the headers to split the data into 
different paths, but what about writing the header data to HDFS itself?

Thanks for taking the time to read this email.




This E-mail may contain confidential information and/or copyright material. 
This email is intended for the use of the addressee only. If you receive this 
email by mistake, please either delete it without reproducing, distributing or 
retaining copies thereof or notify the sender immediately.
