Hey Alex,

I used the logger command to generate a syslog message and rsyslogd to send it.
I did this to rule out a malformed message. rsyslogd speaks RFC 3164 by
default. If I use rsyslogd to receive and store the message, all the information
is fine, so the message itself should be correct.

Running your command breaks the CDH 4.0.1 flume-ng version. And as far as I can
see from your pasted output, it is broken in your version too. The host header
is set to "a", but it should be "host".

I also tried writing to a logger sink. That doesn't break Flume, but it shows
the problem more clearly. Only writing to an HDFS sink breaks it (if you use %Y
and so on in the path).

echo "<13>Jun 20 12:12:12 host foo[345]: a syslog message with" > /tmp/foo
nc -v aHostname 5140 < /tmp/foo
2012-07-11 16:42:58,779 INFO sink.LoggerSink: Event: { 
headers:{timestamp=1340187132000, Severity=5, host=host, Facility=8} body: 66 
6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }

As you see, everything is fine. Timestamp is set, host is filled correctly and 
the HDFS sink would be able to process this message.
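
To illustrate why the timestamp header matters for the HDFS path, here is a
minimal Python sketch (not Flume's code) of resolving time escapes from the
event headers. The function name escape_path, the path /flume/%Y/%m/%d and the
use of UTC are assumptions made for illustration; the header values are taken
from the logger sink output above.

```python
from datetime import datetime, timezone

# Assumed subset of the time escapes used in an HDFS sink path.
TIME_ESCAPES = ("%Y", "%m", "%d", "%H")


def escape_path(path, headers):
    """Replace time escapes in `path` using the 'timestamp' header (epoch millis)."""
    raw = headers.get("timestamp")
    if raw is None:
        # Analogous to parsing a missing header as a number in the sink.
        raise ValueError("no 'timestamp' header, cannot resolve time escapes")
    # UTC is an assumption here, chosen to keep the result deterministic.
    when = datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)
    for token in TIME_ESCAPES:
        path = path.replace(token, when.strftime(token))
    return path


# Headers as printed by the logger sink for the working message.
good_headers = {"timestamp": "1340187132000", "Severity": "5",
                "host": "host", "Facility": "8"}
print(escape_path("/flume/%Y/%m/%d", good_headers))
```

With the headers from the event above this resolves to /flume/2012/06/20 (in
UTC). Calling it with headers that lack "timestamp" raises an error, which is
comparable to the NumberFormatException in the stack trace quoted further down.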

echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with" > /tmp/foo; 
nc -v aHostname 5140 < /tmp/foo
2012-07-11 16:42:34,006 INFO sink.LoggerSink: Event: { headers:{Severity=5, 
host=a, Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 
syslog message w }

This one is broken: host is "a" and there is no timestamp header :)
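
For comparison, a minimal Python sketch of what a parse following the RFC 3164
layout (<PRI>timestamp hostname message) would be expected to extract from both
test messages. This is not Flume's parser and it does not locate the bug; the
regular expression and the names RFC3164 and parse_rfc3164 are assumptions made
for illustration.

```python
import re

# <PRI>Mmm dd hh:mm:ss HOSTNAME MSG, following the RFC 3164 layout.
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<msg>.*)$"
)


def parse_rfc3164(line):
    """Return the header fields and body of one message, or None if it does not match."""
    m = RFC3164.match(line)
    if m is None:
        return None
    pri = int(m.group("pri"))
    return {
        "facility": pri >> 3,  # 13 >> 3 == 1; the log above prints the unshifted 8
        "severity": pri & 7,   # 13 & 7 == 5
        "timestamp_text": m.group("ts"),
        "host": m.group("host"),
        "body": m.group("msg"),
    }


ok = parse_rfc3164("<13>Jun 20 12:12:12 host foo[345]: a syslog message with")
dash = parse_rfc3164("<13>Jun 20 12:12:12 host foo[345]: - a syslog message with")
print(ok["host"], dash["host"])
```

Under this layout both messages yield the host "host" and the same timestamp
text; the " - " only ever ends up in the body.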

Best regards,
Chris

-----Original Message-----
From: alo alt [mailto:[email protected]] 
Sent: Wednesday, July 11, 2012 14:54
To: [email protected]
Subject: Re: Problems with time variables in HDFS path

Chris,

Syslog is an RFC-defined protocol; we support only the RFC 5424 and RFC 3164
formats. You have to send valid syslog events, then it works:

echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with -" > /tmp/foo 
nc -v YOUR_IP 5140 < /tmp/foo

12/07/11 14:51:52 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a, 
Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog 
message w }


- Alex

P.S. I didn't see your mail to me, but it's better to write to the list ;)



On Jul 11, 2012, at 12:15 PM, Juhani Connolly wrote:

> The time variables depend on the existence of a header with the key 
> "timestamp". If it isn't there, it tries to parse a non-existent header to 
> calculate the time, and this happens. I don't believe it has anything to do 
> with the contents of your log message.
> 
> For the easiest way to add the header, I would recommend trying 1.2.0 
> as soon as it is released (or you can try grabbing the current release 
> candidate or even the 1.3.0 trunk which I'm running right now without 
> any serious issues), and using the TimestampInterceptor there. As this 
> is a frequent query I've made a jira to document this dependency 
> properly https://issues.apache.org/jira/browse/FLUME-1364
> 
> On 07/11/2012 06:41 PM, Christian Schroer wrote:
>> Hi,
>> 
>> we are running into a strange problem using Flume-NG 1.1.0 from CDH 4.0.1.
>> 
>> Setup:
>> Flume-NG opens a TCP syslog port, collects all messages and forwards them 
>> directly into HDFS. This works fine until the point where we want to forward 
>> MS IIS Logs in W3C format. The reason seems to be a " - " inside the log 
>> message. I could reproduce the problem using rsyslogd forwarding all syslog 
>> messages to flume:
>> 
>> logger "Hello this is a test" => Works fine :)
>> 
>> logger "hello - this will break" => breaks flume :(
>> 
>> If I remove the time variables from the HDFS path in our configuration 
>> (attached) everything is working fine...
>> 
>> Exception:
>> 
>> 2012-07-11 11:08:18,292 ERROR hdfs.HDFSEventSink: process failed
>> java.lang.NumberFormatException: null
>>         at java.lang.Long.parseLong(Long.java:375)
>>         at java.lang.Long.valueOf(Long.java:525)
>>         at 
>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>         at 
>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>>         at 
>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>>         at 
>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2012-07-11 11:08:18,294 ERROR flume.SinkRunner: Unable to deliver event. 
>> Exception follows.
>> org.apache.flume.EventDeliveryException: java.lang.NumberFormatException: 
>> null
>>         at 
>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:469)
>>         at 
>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.NumberFormatException: null
>>         at java.lang.Long.parseLong(Long.java:375)
>>         at java.lang.Long.valueOf(Long.java:525)
>>         at 
>> org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>         at 
>> org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>>         at 
>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>>         ... 3 more
>> 
>> I attached our configuration in case something is broken in there.
>> 
>> Best regards,
>> 
>> Christian Schroer
>> 
> 
> 


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
