Huge THANKS, Hari.

I just did this per your recommendation/docs, and it worked! I can now see 
the body data in the HDFS file. Yay!!!

curl -H "Accept: application/json" -H "Content-type: application/json" -X POST 
-d  ['{"headers" : {"a":"b", "c":"d"},"body": "jonathan_sutaun_body"}'] 
http://localhost:8889


Question:

Is it possible to also get the header contents/values in the HDFS file, in 
addition to the body contents? Do we need to write our own custom 
interceptor for that?

Our developer says he is passing the JSON data via a Python script as dict (key, 
value) pairs, and we need both the header and body contents/data in HDFS.
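
For context, a rough sketch of the kind of POST such a script could make is below. This is illustrative only, not our actual script: it assumes the third-party requests library, the URL and sample values are placeholders matching the config further down, and the envelope follows the JSONHandler format from your link.

    # Illustrative sketch only (not our actual script): wraps a plain dict
    # into the [{"headers": ..., "body": ...}] envelope JSONHandler expects.
    # Assumes the third-party "requests" library is installed.
    import json
    import requests

    FLUME_URL = "http://localhost:8889"  # matches the HTTP source port below

    def post_event(data, headers=None):
        """POST one event to the Flume HTTP source in JSONHandler format."""
        event = {
            "headers": headers or {},   # becomes the Flume event headers
            "body": json.dumps(data),   # body must be a string, even for JSON
        }
        # JSONHandler expects a JSON *array* of events, even for one event.
        resp = requests.post(FLUME_URL,
                             data=json.dumps([event]),
                             headers={"Content-Type": "application/json"})
        resp.raise_for_status()

    post_event({"id": 100}, headers={"a": "b", "c": "d"})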

Please advise!

From: Hari Shreedharan [mailto:[email protected]]
Sent: Friday, September 04, 2015 12:43 PM
To: [email protected]
Subject: Re: Need Urgent Help (please) with HTTP Source/JSON Handling

The JSONHandler requires the data to be in a specific format: 
https://flume.apache.org/releases/content/1.5.0/apidocs/org/apache/flume/source/http/JSONHandler.html
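
In short, it expects a JSON array of events, where each event is a map with an optional "headers" entry (string keys and values) and a "body" string. For example (illustrative values):

    [
      {
        "headers": {"a": "b", "c": "d"},
        "body": "the event body, which must be a string even if it holds JSON"
      }
    ]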


Thanks,
Hari

On Fri, Sep 4, 2015 at 10:38 AM, Sutanu Das 
<[email protected]<mailto:[email protected]>> wrote:
Dear Community,

We are trying to send HTTP/JSON messages and see no errors in Flume, but all 
the files landing in HDFS are NULL (no data seen). We are passing events as 
JSON strings, yet when we look at the files in HDFS, we see no data.

Is there an HDFS “sink” parameter to show the JSON data in HDFS?

We are testing with a simple command like this: curl -H "Accept: application/json" -H "Content-Type: application/json" -X POST -d '[{"id":100}]' http://localhost:8889

We are passing the JSON as a string; the HDFS file gets created, yet no data 
is seen inside it.


Please HELP, please!




Here is our config:

ale.sources = source1
ale.channels = channel1
ale.sinks =  sink1

# Define the source
ale.sources.source1.type = http
#ale.sources.source1.handler = org.apache.flume.source.http.JSONHandler
ale.sources.source1.port = 8889
ale.sources.source1.bind = 0.0.0.0

# Define the channel 1
ale.channels.channel1.type = memory
ale.channels.channel1.capacity = 10000000
ale.channels.channel1.transactionCapacity = 10000000

# Define a logging sink
ale.sinks.sink1.type = hdfs
ale.sinks.sink1.channel = channel1
ale.sinks.sink1.hdfs.path = hdfs://ham-dal-d001.corp.wayport.net:8020/prod/hadoop/smallsite/flume_ingest_ale2_hak_dev/station/%Y/%m/%d/%H
#ale.sinks.sink1.hdfs.fileType = DataStream
#ale.sinks.sink1.hdfs.writeFormat = Text
ale.sinks.sink1.hdfs.filePrefix = Ale_2_topology_http_json_raw
ale.sinks.sink1.hdfs.useLocalTimeStamp = true
ale.sinks.sink1.hdfs.round = true
ale.sinks.sink1.hdfs.roundValue = 1
ale.sinks.sink1.hdfs.roundUnit = hour
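# Note: hdfs.callTimeout is in milliseconds (Flume's default is 10000),
# so the value below is roughly 116 days.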
ale.sinks.sink1.hdfs.callTimeout = 10000000000
