Hey Alex,
I used the logger command to generate a syslog message and rsyslogd to send it.
I did this to rule out a malformed message. rsyslogd speaks RFC 3164 by
default. If I use rsyslogd to receive and store the message, all fields are
correct, so the message itself should be well-formed.
Running your command breaks the Flume NG version from CDH 4.0.1, and as far as
I can see from your pasted output, it is broken in your version too: the host
is set to "a", but it should be "host".
I also tried writing to a logger sink. This doesn't break Flume, but it shows
the problem more clearly. Only writing to an HDFS sink breaks it (if you use
%Y and so on inside the path).
echo "<13>Jun 20 12:12:12 host foo[345]: a syslog message with" > /tmp/foo
nc -v aHostname 5140 < /tmp/foo
2012-07-11 16:42:58,779 INFO sink.LoggerSink: Event: { headers:{timestamp=1340187132000, Severity=5, host=host, Facility=8} body: 66 6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }
As you see, everything is fine. Timestamp is set, host is filled correctly and
the HDFS sink would be able to process this message.
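For reference, the <13> at the start of the test message is the syslog PRI value, which encodes facility * 8 + severity. The following is only a minimal Python sketch of that arithmetic; decode_pri is a hypothetical helper name, not part of Flume or rsyslogd. Severity=5 in the log line above matches it directly. Facility=8 looks like 13 - 5, that is, the facility part left un-divided, but that reading is an assumption drawn from the output and not checked against the Flume source.

```python
def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity)."""
    # Valid PRI values cover 24 facilities x 8 severities: 0..191.
    if not 0 <= pri <= 191:
        raise ValueError("PRI out of range: %r" % pri)
    return pri // 8, pri % 8


facility, severity = decode_pri(13)
print(facility, severity)  # 1 5  (facility 1 = user, severity 5 = notice)
print(13 - severity)       # 8, the number shown as Facility in the log above
```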
echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with" > /tmp/foo
nc -v aHostname 5140 < /tmp/foo
2012-07-11 16:42:34,006 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a, Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog message w }
This one is broken, host is "a" and no timestamp :)
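To illustrate why the missing timestamp matters for the HDFS sink, here is a rough Python sketch of time-escape expansion driven by a "timestamp" header. It is not Flume's code: resolve_path and the /flume/%Y/%m/%d path are made up for illustration, and only %Y, %m and %d are handled. The two header maps are copied from the two logger sink outputs above.

```python
import time


def resolve_path(path, headers):
    """Expand %Y, %m and %d using the 'timestamp' header (epoch milliseconds)."""
    raw = headers.get("timestamp")
    if raw is None:
        # Without the header there is nothing to compute the date from.
        raise ValueError("no 'timestamp' header, cannot expand time escapes")
    t = time.gmtime(int(raw) / 1000)  # UTC, so the result is deterministic
    return (path.replace("%Y", "%04d" % t.tm_year)
                .replace("%m", "%02d" % t.tm_mon)
                .replace("%d", "%02d" % t.tm_mday))


good = {"timestamp": "1340187132000", "Severity": "5", "host": "host", "Facility": "8"}
bad = {"Severity": "5", "host": "a", "Facility": "8"}

print(resolve_path("/flume/%Y/%m/%d", good))  # /flume/2012/06/20
try:
    resolve_path("/flume/%Y/%m/%d", bad)
except ValueError as err:
    print("failed:", err)
```

The second event fails at the same point where the quoted stack trace shows the NumberFormatException: the path needs a time, and the event carries none.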
Best regards,
Chris
-----Original Message-----
From: alo alt [mailto:[email protected]]
Sent: Wednesday, July 11, 2012 14:54
To: [email protected]
Subject: Re: Problems with time variables in HDFS path
Chris,
syslog is an RFC-defined protocol, and we support only the RFC 5424 and
RFC 3164 formats. As long as you send valid syslog events, it works:
echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with -" > /tmp/foo
nc -v YOUR_IP 5140 < /tmp/foo
12/07/11 14:51:52 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a, Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog message w }
- Alex
P.S. I didn't see your mail to me directly, but it's better to write to the
list anyway ;)
On Jul 11, 2012, at 12:15 PM, Juhani Connolly wrote:
> The time variables depend on the existence of a header with the key
> "timestamp". If it isn't there, the sink tries to parse a non-existent
> header to calculate the time, and you get the exception below. I don't
> believe it has anything to do with the contents of your log message.
>
> The easiest way to add the header is the TimestampInterceptor. I would
> recommend trying 1.2.0 as soon as it is released (or you can grab the
> current release candidate, or even the 1.3.0 trunk, which I'm running
> right now without any serious issues). As this is a frequent question,
> I've filed a JIRA to document this dependency properly:
> https://issues.apache.org/jira/browse/FLUME-1364
>
> On 07/11/2012 06:41 PM, Christian Schroer wrote:
>> Hi,
>>
>> we are running into a strange problem using Flume-NG 1.1.0 from CDH 4.0.1.
>>
>> Setup:
>> Flume-NG opens a TCP syslog port, collects all messages and forwards them
>> directly into HDFS. This works fine until the point where we want to forward
>> MS IIS Logs in W3C format. The reason seems to be a " - " inside the log
>> message. I could reproduce the problem using rsyslogd forwarding all syslog
>> messages to flume:
>>
>> logger "Hello this is a test" => Works fine :)
>>
>> logger "hello - this will break" => breaks flume :(
>>
>> If I remove the time variables from the HDFS path in our configuration
>> (attached) everything is working fine...
>>
>> Exception:
>>
>> 2012-07-11 11:08:18,292 ERROR hdfs.HDFSEventSink: process failed
>> java.lang.NumberFormatException: null
>> at java.lang.Long.parseLong(Long.java:375)
>> at java.lang.Long.valueOf(Long.java:525)
>> at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>> at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>> at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>> at java.lang.Thread.run(Thread.java:662)
>> 2012-07-11 11:08:18,294 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
>> org.apache.flume.EventDeliveryException: java.lang.NumberFormatException: null
>> at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:469)
>> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>> at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.NumberFormatException: null
>> at java.lang.Long.parseLong(Long.java:375)
>> at java.lang.Long.valueOf(Long.java:525)
>> at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>> at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>> at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>> ... 3 more
>>
>> I attached our configuration in case something is broken in there.
>>
>> Best regards,
>>
>> Christian Schroer
>>
>
>
--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF