Does anyone know why this is happening?

-eran

On Tue, Sep 11, 2012 at 2:26 AM, Eran Kutner <[email protected]> wrote:

> Hi,
> I'm trying to compress Avro files written with the HDFS sink. Everything
> appears to work, but the files themselves are mostly empty. It appears that
> instead of writing the actual data, only some kind of header is written
> for every data row in the file. This is a hex dump of such a file:
> 0000000 0000 6100 0000 0900 0061 fe0a 0001 017e
> 0000010 0000 0000 0064 0000 6409 0a00 01fe 8a00
> 0000020 0001 0000 6400 0000 0900 0064 fe0a 0001
> 0000030 018a 0000 0000 0064 0000 6409 0a00 01fe
> 0000040 8a00 0001 0000 6400 0000 0900 0064 fe0a
> 0000050 0001 018a 0000 0000 0064 0000 6409 0a00
> 0000060 01fe 8a00 0001 0000 6400 0000 0900 0064
> 0000070 fe0a 0001 018a 0000 0000 0064 0000 6409
> 0000080 0a00 01fe 8a00 0001 0000 6400 0000 0900
> 0000090 0064 fe0a 0001 018a 0000 0000 0064 0000
> 00000a0 6409 0a00 01fe 8a00 0001 0000 6400 0000
> 00000b0 0900 0064 fe0a 0001 018a 0000 0000 0064
> 00000c0 0000 6409 0a00 01fe 8a00 0001 0000 6400
> 00000d0 0000 0900 0064 fe0a 0001 018a 0000 0000
> 00000e0 0064 0000 6409 0a00 01fe 8a00 0001 0000
> 00000f0 6400 0000 0900 0064 fe0a 0001 018a 0000
> 0000100 0000 0064 0000 6409 0a00 01fe 8a00 0001
> 0000110 0000 6400 0000 0900 0064 fe0a 0001 018a
> 0000120 0000 0000 0064 0000 6409 0a00 01fe 8a00
> 0000130 0001 0000 6400 0000 0900 0064 fe0a 0001
> 0000140 018a 0000 0000 0064 0000 6409 0a00 01fe
> 0000150 8a00 0001 0000 6400 0000 0900 0064 fe0a
> 0000160 0001 018a 0000 0000 0064 0000 6409 0a00
> 0000170 01fe 8a00 0001 0000 6400 0000 0900 0064
> 0000180 fe0a 0001 018a 0000 0000 0064 0000 6409
> 0000190 0a00 01fe 8a00 0001 0000 6400 0000 0900
> 00001a0 0064 fe0a 0001 018a 0000 0000 0064 0000
> 00001b0 6409 0a00 01fe 8a00 0001 0000 6400 0000
> 00001c0 0900 0064 fe0a 0001 018a 0000 0000 0064
> 00001d0 0000 6409 0a00 01fe 8a00 0001 0000 6400
>
> Notice the repeating pattern within the data. It looks like empty headers
> with no data.
>
> This is my sink config:
> agent.sinks.hdfsSink2.type = hdfs
> agent.sinks.hdfsSink2.channel = memoryChannel2
> agent.sinks.hdfsSink2.hdfs.path=hdfs://hadoop2-m1:8020/raw-events/%Y-%m-%d
> agent.sinks.hdfsSink2.hdfs.filePrefix=load-events.%{hostname}.avro
> agent.sinks.hdfsSink2.hdfs.rollInterval=60
> agent.sinks.hdfsSink2.hdfs.rollCount=0
> agent.sinks.hdfsSink2.hdfs.rollSize=0
> agent.sinks.hdfsSink2.hdfs.fileType=CompressedStream
> agent.sinks.hdfsSink2.hdfs.codeC=snappy
> agent.sinks.hdfsSink2.hdfs.writeFormat=Text
> agent.sinks.hdfsSink2.hdfs.batchSize=1000
> agent.sinks.hdfsSink2.serializer = avro_event
>
> Any help would be appreciated.
>
> Thanks.
>
> -eran
