Hi, I'm just getting started with Flume and trying to understand the flow of things.
I have binary Avro data files being generated on remote nodes, and I want to use Flume (1.2.0) to stream them to my HDFS cluster at a central location. The streaming itself seems to work, but the resulting files on HDFS look corrupt. Here's what I did:
For my "master" (on the NameNode of my Hadoop cluster) I started this:
flume-ng agent -f agent.conf -Dflume.root.logger=DEBUG,console -n agent
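# where:
#   -f  path to the agent configuration file (shown below)
#   -n  name of the agent to run from that file (matches "agent" in the config)
#   -D  sets flume.root.logger so the agent logs at DEBUG level to the console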
With this config:
agent.channels = memory-channel
agent.sources = avro-source
agent.sinks = hdfs-sink
agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 1000
agent.channels.memory-channel.transactionCapacity = 100
agent.sources.avro-source.channels = memory-channel
agent.sources.avro-source.type = avro
agent.sources.avro-source.bind = 10.10.10.10
agent.sources.avro-source.port = 41414
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode1:9000/flume
On a remote node I streamed a test file like this:
flume-ng avro-client -H 10.10.10.10 -p 41414 -F /tmp/test.avro
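# where:
#   -H  host the Avro source is listening on
#   -p  port of the Avro source (41414 above)
#   -F  local file whose contents are sent to the source as events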
I can see the master writing to HDFS:
......
13/02/06 09:37:55 INFO hdfs.BucketWriter: Creating
hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
13/02/06 09:38:25 INFO hdfs.BucketWriter: Renaming
hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
to hdfs://namenode1:9000/flume/FlumeData.1360172273684
But the data doesn't look right: the original file is 4551 bytes, while the file written to HDFS is only 219 bytes.
[localhost] $ ls -l FlumeData.1360172273684 /tmp/test.avro
-rwxr-xr-x 1 amiller amiller 219 Feb 6 18:51 FlumeData.1360172273684
-rwxr-xr-x 1 amiller amiller 4551 Feb 6 12:00 /tmp/test.avro
[localhost] $ avro cat /tmp/test.avro
{"system_model": null, "nfsv4": null, "ip": null, "site": null, "nfsv3":
null, "export": null, "ifnet": [{"send_bps": 1234, "recv_bps": 5678, "name":
"eth0"}, {"send_bps": 100, "recv_bps": 200, "name": "eth1"}, {"send_bps": 0,
"recv_bps": 0, "name": "eth2"}], "disk": null, "hostname": "localhost",
"total_mem": null, "ontapi_version": null, "serial_number": null, "cifs": null,
"cpu_model": null, "volume": null, "time_stamp": 1357639723, "aggregate": null,
"num_cpu": null, "cpu_speed_mhz": null, "hostid": null, "kernel_version": null,
"qtree": null, "processor": null}
[localhost] $ hadoop fs -copyToLocal /flume/FlumeData.1360172273684 .
[localhost] $ avro cat FlumeData.1360172273684
panic: ord() expected a character, but string of length 0 found
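As a further check, the first bytes of each file can be compared against the Avro object container magic ("Obj" followed by the version byte 0x01); here's a quick Python sketch using the file names from above:

# Compare each file's leading bytes with the Avro object container
# magic ("Obj" plus the format version byte 0x01).
MAGIC = b"Obj\x01"

for path in ("/tmp/test.avro", "FlumeData.1360172273684"):
    with open(path, "rb") as f:
        header = f.read(len(MAGIC))
    if header == MAGIC:
        print("%s: valid Avro container header" % path)
    else:
        print("%s: not an Avro container header: %r" % (path, header))

If the HDFS copy no longer starts with that magic, avro cat has nothing valid to parse, which would explain the panic above.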
Alan