Hey Chris,
ah, you're right.
https://issues.apache.org/jira/browse/FLUME-1365
cheers,
Alex
On Jul 11, 2012, at 5:31 PM, Christian Schroer wrote:
> Hey Alex,
>
> I used the logger command to generate a syslog message and rsyslogd to send
> it. I did this to prevent any malformed messages. rsyslogd talks RFC 3164 by
> default. If I use rsyslogd to receive and store the message, all the
> information is fine, so the message itself should be correct.
>
> Running your command breaks the CDH 4.0.1 Flume-NG version, and as far as I
> can see from your pasted output, it is broken in your version too: the host
> is filled with "a", but it should be "host".
>
> I also tried writing to a logger sink. That doesn't break Flume, but it
> illustrates the problem a bit more. Only writing to an HDFS sink breaks it
> (if you use %Y and so on inside the path).
>
> echo "<13>Jun 20 12:12:12 host foo[345]: a syslog message with" > /tmp/foo;
> nc -v aHostname 5140 < /tmp/foo
> 2012-07-11 16:42:58,779 INFO sink.LoggerSink: Event: {
> headers:{timestamp=1340187132000, Severity=5, host=host, Facility=8} body: 66
> 6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }
>
> As you can see, everything is fine: the timestamp is set, the host is filled
> correctly, and the HDFS sink would be able to process this message.
>
> echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with" > /tmp/foo;
> nc -v aHostname 5140 < /tmp/foo
> 2012-07-11 16:42:34,006 INFO sink.LoggerSink: Event: { headers:{Severity=5,
> host=a, Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77
> syslog message w }
>
> This one is broken: host is "a" and there is no timestamp :)
>
> Best regards,
> Chris
>
> -----Original Message-----
> From: alo alt [mailto:[email protected]]
> Sent: Wednesday, July 11, 2012 14:54
> To: [email protected]
> Subject: Re: Problems with time variables in HDFS path
>
> Chris,
>
> syslog is an RFC-defined protocol; we support only the RFC 5424 and RFC 3164
> formats. As long as you use valid syslog events, it works:
>
> echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with -" > /tmp/foo;
> nc -v YOUR_IP 5140 < /tmp/foo
>
> 12/07/11 14:51:52 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a,
> Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog
> message w }
>
>
> - Alex
>
> p.s. I didn't see your mail addressed to me, but it's better to write to
> the list ;)
>
>
>
> On Jul 11, 2012, at 12:15 PM, Juhani Connolly wrote:
>
>> The time variables depend on the existence of a header with the key
>> "timestamp". If it isn't there, Flume tries to parse a non-existent header
>> to calculate the time, and this exception is the result. I don't believe it
>> has anything to do with the contents of your log message.
>>
>> The easiest way to add the header is the TimestampInterceptor: I would
>> recommend trying 1.2.0 as soon as it is released (or you can grab the
>> current release candidate, or even the 1.3.0 trunk, which I'm running
>> right now without any serious issues) and using the TimestampInterceptor
>> there. As this is a frequent question, I've filed a JIRA to document this
>> dependency properly: https://issues.apache.org/jira/browse/FLUME-1364
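>>
>> As a rough sketch (the agent and source names here are placeholders, not
>> taken from your configuration), the interceptor wiring would look
>> something like this:

```properties
# Hypothetical names: "agent" and "syslogSrc" stand in for your own.
# The timestamp interceptor (Flume 1.2.0+) stamps each event with a
# "timestamp" header, which the HDFS sink's %Y/%m/%d escapes require.
agent.sources.syslogSrc.interceptors = ts
agent.sources.syslogSrc.interceptors.ts.type = timestamp
```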
>>
>> On 07/11/2012 06:41 PM, Christian Schroer wrote:
>>> Hi,
>>>
>>> we are running into a strange problem using Flume-NG 1.1.0 from CDH 4.0.1.
>>>
>>> Setup:
>>> Flume-NG opens a TCP syslog port, collects all messages, and forwards them
>>> directly into HDFS. This works fine until we try to forward MS IIS logs
>>> in W3C format. The reason seems to be a " - " inside the log message. I
>>> could reproduce the problem using rsyslogd, forwarding all syslog
>>> messages to Flume:
>>>
>>> logger "Hello this is a test" => Works fine :)
>>>
>>> logger "hello - this will break" => breaks flume :(
>>>
>>> If I remove the time variables from the HDFS path in our configuration
>>> (attached), everything works fine...
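>>>
>>> For illustration (the agent and sink names are made up, not from our
>>> attached configuration), the difference is just the escape sequences in
>>> the path:

```properties
# Breaks unless every event carries a "timestamp" header:
collector.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/%Y-%m-%d
# Works, since no header lookup is needed to build the path:
# collector.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
```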
>>>
>>> Exception:
>>>
>>> 2012-07-11 11:08:18,292 ERROR hdfs.HDFSEventSink: process failed
>>> java.lang.NumberFormatException: null
>>> at java.lang.Long.parseLong(Long.java:375)
>>> at java.lang.Long.valueOf(Long.java:525)
>>> at
>>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>> at
>>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>>> at
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>>> at
>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>> at java.lang.Thread.run(Thread.java:662)
>>> 2012-07-11 11:08:18,294 ERROR flume.SinkRunner: Unable to deliver event.
>>> Exception follows.
>>> org.apache.flume.EventDeliveryException: java.lang.NumberFormatException:
>>> null
>>> at
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:469)
>>> at
>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>> at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.NumberFormatException: null
>>> at java.lang.Long.parseLong(Long.java:375)
>>> at java.lang.Long.valueOf(Long.java:525)
>>> at
>>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>> at
>>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>>> at
>>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>>> ... 3 more
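>>>
>>> The trace boils down to Long.valueOf being handed a null header value. A
>>> minimal standalone sketch (not Flume's actual code; the class and header
>>> names are mine) reproduces the same exception:

```java
import java.util.HashMap;
import java.util.Map;

public class MissingTimestampDemo {
    public static void main(String[] args) {
        // Event headers as parsed from the broken syslog line:
        // host got filled (wrongly, with "a"), but "timestamp" was never set.
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("host", "a");
        try {
            // BucketPath does the equivalent of this to resolve a %Y escape;
            // headers.get("timestamp") returns null here.
            long ts = Long.valueOf(headers.get("timestamp"));
            System.out.println(ts);
        } catch (NumberFormatException e) {
            // prints "NumberFormatException: null"
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```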
>>>
>>> I attached our configuration in case something is broken in there.
>>>
>>> Best regards,
>>>
>>> Christian Schroer
>>>
>>
>>
>
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF