Here you are: http://flume.apache.org/FlumeDeveloperGuide.html#client
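In case a concrete starting point helps, here is a rough, untested sketch of the generic RpcClient usage that section describes (the host and port are just the values from the agent config quoted further down):

    import java.nio.charset.Charset;

    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class RpcClientExample {
      public static void main(String[] args) throws EventDeliveryException {
        // Connect to the AvroSource (host/port taken from the agent config in the thread below).
        RpcClient client = RpcClientFactory.getDefaultInstance("10.10.10.10", 41414);
        try {
          // Each append() becomes one Flume event on the source.
          Event event = EventBuilder.withBody("hello flume", Charset.forName("UTF-8"));
          client.append(event);
        } finally {
          client.close();
        }
      }
    }

RpcClientFactory.getDefaultInstance() gives you the Avro-based client, and each append() turns into one event on the AvroSource.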
Hari

--
Hari Shreedharan

On Wednesday, February 6, 2013 at 10:20 AM, Alan Miller wrote:
> Thanks Hari,
>
> Are there any links to examples of how to use the RpcClient?
>
> Alan
>
> From: Hari Shreedharan [mailto:[email protected]]
> Sent: Wednesday, February 06, 2013 7:16 PM
> To: [email protected]
> Subject: Re: streaming Avro to HDFS
>
> Alan,
>
> I think this is probably because the AvroClient is not really very "smart."
> It is mainly useful for testing the AvroSource. The AvroClient reads the
> file passed in and sends one line per event (in 1.2.0; in 1.3.0+ there is
> also an option to send all files in a directory). So the events are not
> really sent as Avro files, and since you are using the text serializer they
> are dumped as-is. Since events can arrive out of order, your data is likely
> to be invalid Avro. Also, the newline character used to split the events may
> actually have been part of the real Avro serialization, so removing it
> simply made the output invalid Avro.
>
> My advice would be to use the RpcClient to read the file and send the data
> in a valid format, making sure one Avro "container" ends up in one event.
>
> Hari
>
> --
> Hari Shreedharan
>
> On Wednesday, February 6, 2013 at 9:58 AM, Alan Miller wrote:
> > Hi, I'm just getting started with Flume and trying to understand the flow
> > of things.
> >
> > I have Avro binary data files being generated on remote nodes and I want
> > to use Flume (1.2.0) to stream them to my HDFS cluster at a central
> > location. It seems I can stream the data, but the resulting files on HDFS
> > seem corrupt. Here's what I did:
> >
> > For my "master" (on the NameNode of my Hadoop cluster) I started this:
> >
> > flume-ng agent -f agent.conf -Dflume.root.logger=DEBUG,console -n agent
> >
> > With this config:
> >
> > agent.channels = memory-channel
> > agent.sources = avro-source
> > agent.sinks = hdfs-sink
> >
> > agent.channels.memory-channel.type = memory
> > agent.channels.memory-channel.capacity = 1000
> > agent.channels.memory-channel.transactionCapacity = 100
> >
> > agent.sources.avro-source.channels = memory-channel
> > agent.sources.avro-source.type = avro
> > agent.sources.avro-source.bind = 10.10.10.10
> > agent.sources.avro-source.port = 41414
> >
> > agent.sinks.hdfs-sink.type = hdfs
> > agent.sinks.hdfs-sink.channel = memory-channel
> > agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode1:9000/flume
> >
> > On a remote node I streamed a test file like this:
> >
> > flume-ng avro-client -H 10.10.10.10 -p 41414 -F /tmp/test.avro
> >
> > I can see the master is writing to HDFS:
> >
> > 13/02/06 09:37:55 INFO hdfs.BucketWriter: Creating
> > hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
> > 13/02/06 09:38:25 INFO hdfs.BucketWriter: Renaming
> > hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
> > to hdfs://namenode1:9000/flume/FlumeData.1360172273684
> >
> > But the data doesn't seem right.
> > The original file is 4551 bytes; the file written to HDFS was only 219
> > bytes:
> >
> > [localhost] $ ls -l FlumeData.1360172273684 /tmp/test.avro
> > -rwxr-xr-x 1 amiller amiller  219 Feb  6 18:51 FlumeData.1360172273684
> > -rwxr-xr-x 1 amiller amiller 4551 Feb  6 12:00 /tmp/test.avro
> >
> > [localhost] $ avro cat /tmp/test.avro
> > {"system_model": null, "nfsv4": null, "ip": null, "site": null, "nfsv3":
> > null, "export": null, "ifnet": [{"send_bps": 1234, "recv_bps": 5678,
> > "name": "eth0"}, {"send_bps": 100, "recv_bps": 200, "name": "eth1"},
> > {"send_bps": 0, "recv_bps": 0, "name": "eth2"}], "disk": null, "hostname":
> > "localhost", "total_mem": null, "ontapi_version": null, "serial_number":
> > null, "cifs": null, "cpu_model": null, "volume": null, "time_stamp":
> > 1357639723, "aggregate": null, "num_cpu": null, "cpu_speed_mhz": null,
> > "hostid": null, "kernel_version": null, "qtree": null, "processor": null}
> >
> > [localhost] $ hadoop fs -copyToLocal /flume/FlumeData.1360172273684 .
> > [localhost] $ avro cat FlumeData.1360172273684
> > panic: ord() expected a character, but string of length 0 found
> >
> > Alan
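To make Hari's suggestion above more concrete: instead of letting avro-client split /tmp/test.avro on newlines, read the whole Avro container file yourself and hand it to the RpcClient as a single event. A rough, untested sketch (host, port, and file path are just the values quoted in the thread):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class AvroFileSender {
      public static void main(String[] args) throws IOException, EventDeliveryException {
        // Read the entire Avro container file unmodified, so its header and
        // sync markers stay intact inside a single event body (Java 7+).
        byte[] payload = Files.readAllBytes(Paths.get("/tmp/test.avro"));

        // Connect to the AvroSource and send the whole container as one event.
        RpcClient client = RpcClientFactory.getDefaultInstance("10.10.10.10", 41414);
        try {
          client.append(EventBuilder.withBody(payload));
        } finally {
          client.close();
        }
      }
    }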
