Hey Chris,

Ah, you're right:
https://issues.apache.org/jira/browse/FLUME-1365
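
For the record: both of your test messages are valid RFC 3164, so a spec-conformant parser should handle them identically. A minimal sketch (hypothetical demo code, not Flume's actual SyslogUtils) extracts the same hostname from each, which confirms the misparse is on our side:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal RFC 3164 header sketch: <PRI>TIMESTAMP HOSTNAME TAG: CONTENT.
// Hypothetical demo code, not Flume's implementation.
public class Rfc3164Demo {
    private static final Pattern HEADER = Pattern.compile(
        "^<(\\d{1,3})>"                                      // PRI
        + "([A-Z][a-z]{2} [ \\d]\\d \\d\\d:\\d\\d:\\d\\d) "  // TIMESTAMP
        + "(\\S+) "                                          // HOSTNAME
        + "(.*)$");                                          // TAG + CONTENT

    static String host(String line) {
        Matcher m = HEADER.matcher(line);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // The hostname sits at a fixed position after the timestamp,
        // so a "- " in the free-form content part cannot shift it.
        System.out.println(host("<13>Jun 20 12:12:12 host foo[345]: a syslog message with"));
        System.out.println(host("<13>Jun 20 12:12:12 host foo[345]: - a syslog message with"));
    }
}
```

Both lines print "host": the "- " lives in the CONTENT part, which the RFC leaves unstructured, so it must not affect the header fields.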

cheers,
 Alex 

On Jul 11, 2012, at 5:31 PM, Christian Schroer wrote:

> Hey Alex,
> 
> I used the logger command to generate a syslog message and rsyslogd to send 
> it, to rule out any malformed messages; rsyslogd speaks RFC 3164 by 
> default. If I use rsyslogd to receive and store the message, all the 
> information is intact, so the message itself should be correct.
> 
> Running your command breaks the CDH 4.0.1 flume-ng version, and as far as I 
> can see from your pasted output, it is broken in your version too: the host 
> is filled with "a", but it should be "host".
> 
> I also tried writing to a logger sink; that doesn't break Flume, but it 
> illustrates the problem a bit more. Only writing to an HDFS sink breaks it 
> (if you use %Y and the like inside the path).
> 
> echo "<13>Jun 20 12:12:12 host foo[345]: a syslog message with" > /tmp/foo; 
> nc -v aHostname 5140 < /tmp/foo
> 2012-07-11 16:42:58,779 INFO sink.LoggerSink: Event: { 
> headers:{timestamp=1340187132000, Severity=5, host=host, Facility=8} body: 66 
> 6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }
> 
> As you see, everything is fine. Timestamp is set, host is filled correctly 
> and the HDFS sink would be able to process this message.
> 
> echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with" > /tmp/foo; 
> nc -v aHostname 5140 < /tmp/foo
> 2012-07-11 16:42:34,006 INFO sink.LoggerSink: Event: { headers:{Severity=5, 
> host=a, Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 
> syslog message w }
> 
> This one is broken: the host is "a" and there is no timestamp :)
> 
> Best regards,
> Chris
> 
> -----Original Message-----
> From: alo alt [mailto:[email protected]] 
> Sent: Wednesday, July 11, 2012 14:54
> To: [email protected]
> Subject: Re: Problems with time variables in HDFS path
> 
> Chris,
> 
> syslog is an RFC-defined protocol; we support only the RFC 5424 and 
> RFC 3164 formats. As long as you use valid syslog events, it works:
> 
> echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with -" > 
> /tmp/foo; nc -v YOUR_IP 5140 < /tmp/foo
> 
> 12/07/11 14:51:52 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a, 
> Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog 
> message w }
> 
> 
> - Alex
> 
> P.S. I didn't see your mail to me directly, but it's better to write to the 
> list ;)
> 
> 
> 
> On Jul 11, 2012, at 12:15 PM, Juhani Connolly wrote:
> 
>> The time variables depend on the existence of a header with the key 
>> "timestamp". If it isn't there, it tries to parse a non-existent header to 
>> calculate the time, and this happens. I don't believe it has anything to do 
>> with the contents of your log message.
>> 
>> For the easiest way to add the header, I would recommend trying 1.2.0 
>> as soon as it is released (or grabbing the current release candidate, 
>> or even the 1.3.0 trunk, which I'm running right now without any 
>> serious issues) and using the TimestampInterceptor there. As this is 
>> a frequent question, I've filed a JIRA to document this dependency 
>> properly: https://issues.apache.org/jira/browse/FLUME-1364
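>>
>> A minimal sketch of wiring it in (the agent/source names "a1"/"r1" are 
>> placeholders; "timestamp" is the interceptor's short alias):
>>
>> ```
>> # attach the TimestampInterceptor so every event carries a "timestamp"
>> # header, which the %Y/%m/%d escapes in the HDFS sink path need
>> a1.sources.r1.interceptors = ts
>> a1.sources.r1.interceptors.ts.type = timestamp
>> ```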
>> 
>> On 07/11/2012 06:41 PM, Christian Schroer wrote:
>>> Hi,
>>> 
>>> we are running into a strange problem using Flume-NG 1.1.0 from CDH 4.0.1.
>>> 
>>> Setup:
>>> Flume-NG opens a TCP syslog port, collects all messages and forwards them 
>>> directly into HDFS. This works fine until we try to forward MS IIS logs 
>>> in W3C format. The cause seems to be a " - " inside the log message. I 
>>> could reproduce the problem using rsyslogd to forward all syslog messages 
>>> to Flume:
>>> 
>>> logger "Hello this is a test" => Works fine :)
>>> 
>>> logger "hello - this will break" => breaks flume :(
>>> 
>>> If I remove the time variables from the HDFS path in our configuration 
>>> (attached), everything works fine...
>>> 
>>> Exception:
>>> 
>>> 2012-07-11 11:08:18,292 ERROR hdfs.HDFSEventSink: process failed
>>> java.lang.NumberFormatException: null
>>>        at java.lang.Long.parseLong(Long.java:375)
>>>        at java.lang.Long.valueOf(Long.java:525)
>>>        at 
>>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>>        at 
>>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>>>        at 
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>>>        at 
>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>        at java.lang.Thread.run(Thread.java:662)
>>> 2012-07-11 11:08:18,294 ERROR flume.SinkRunner: Unable to deliver event. 
>>> Exception follows.
>>> org.apache.flume.EventDeliveryException: java.lang.NumberFormatException: 
>>> null
>>>        at 
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:469)
>>>        at 
>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>        at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.NumberFormatException: null
>>>        at java.lang.Long.parseLong(Long.java:375)
>>>        at java.lang.Long.valueOf(Long.java:525)
>>>        at 
>>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>>        at 
>>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>>>        at 
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>>>        ... 3 more
>>> 
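>>> The NumberFormatException is just Long.parseLong being handed null; judging 
>>> by the trace, BucketPath.replaceShorthand does something like this 
>>> (simplified pseudocode, not the exact source):
>>> 
>>> ```
>>> ts = headers.get("timestamp")   // null, because the syslog parse failed
>>> time = Long.valueOf(ts)         // Long.parseLong(null) -> NumberFormatException: null
>>> ```
>>> 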
>>> I attached our configuration in case something is broken in there.
>>> 
>>> Best regards,
>>> 
>>> Christian Schroer
>>> 
>> 
>> 
> 
> 
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> 


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
