Hi, Jeff,
Thank you for your quick response! I could not easily find the exact log entry
that had the issue, as all I had were 30M input log files :).
After further debugging, I figured out what the issue was. Here is what
happened.
For production, we use the Exec source with 'tail -f'. For my local testing I
use a spooling dir. The issue happened when I was using the spooldir source and
a log file had non-UTF-8 characters.
However, the exception that I posted did not come from processing the log file
itself!
The flow was as follows:
1. Flume is started with the spooldir source
2. A log file with non-UTF-8 chars is moved into the spooldir
3. Flume starts processing, encounters a "bad" character and stops (no errors
   or anything)
4. I kill Flume manually and restart it, without cleaning out its .flumespool
   dir
5. Flume starts up and now chokes while processing its own .flumespool dir and
   the left-over file in there! This is where the MalformedInputException came
   from
When I processed the same file via the Exec source with a 'tail -n 10000 ..'
command, it was processed successfully, which told me the issue is specific to
the spooldir source.
The solution was to add this parameter to the spooldir source:
a1.sources.r1.inputCharset = ISO8859-1
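To illustrate why changing the charset helps, here is a minimal standalone sketch using only java.nio (it is not Flume code, and the class name CharsetDemo and the sample bytes are made up for illustration). It assumes the decoder is in strict "report errors" mode, which is what the stack trace suggests. A byte such as 0xFF can never appear in valid UTF-8, so strict UTF-8 decoding fails, while ISO-8859-1 maps every possible byte to a character and therefore cannot fail this way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decode strictly: malformed or unmappable input raises an exception
    // instead of being silently replaced.
    static String strictDecode(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical log fragment containing one byte that is invalid in UTF-8.
        byte[] line = {'G', 'E', 'T', ' ', '/', (byte) 0xFF, '\n'};

        try {
            strictDecode(line, StandardCharsets.UTF_8);
        } catch (MalformedInputException e) {
            // Same exception class as in the stack trace from the spooldir source.
            System.out.println("UTF-8 failed: " + e.getMessage());
        }

        // ISO-8859-1 accepts any byte value, so the same input decodes cleanly.
        String decoded = strictDecode(line, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 decoded " + decoded.length() + " chars");
    }
}
```

Note that ISO-8859-1 only guarantees that decoding succeeds; if the logs are really in some other encoding, the non-ASCII characters may come out wrong even though nothing fails.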
Thanks!
Marina
From: Jeff Lord <[email protected]>
To: "[email protected]" <[email protected]>; Marina <[email protected]>
Sent: Monday, March 9, 2015 11:17 AM
Subject: Re: MalformedInputException processing logs from Varnish server
Hi Marina,
Do you have a sample of the characters/data which you believe to be causing
this?
Can you just confirm whether you are using the Apache version of Flume or a
specific distro?
Also, in your message you mention that you are using tail -f, which would be
the exec source, but the stack trace looks like you are actually using the
spooldir source.
Best,
Jeff
On Mon, Mar 9, 2015 at 10:26 AM, Marina <[email protected]> wrote:
Hi,
I have configured Flume to "tail -f" logs from my Varnish server - pretty
much standard Apache HTTP logs.
However, sometimes Flume chokes on some special characters and dies - it stops
processing new log entries.
See below for a stack trace.
It seems like this exact issue was reported as a Flume bug in the 1.4.x
version:
https://issues.apache.org/jira/browse/FLUME-2052
and it was marked as resolved in the 1.5.0 version.
The version I am using is Flume 1.5.2 - and I am still seeing this issue...
Could somebody confirm or deny whether what I am seeing is the same issue and
should have been fixed? Or is this something completely different?
Thank you!
Marina
06 Mar 2015 18:16:57,820 ERROR [pool-3-thread-1]
(org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:256)
- FATAL: Spool Directory source r1: { spoolDir: /data1/varnish-logs-active }:
Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume
to continue processing.
java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:195)
    at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:134)
    at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:72)
    at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:91)
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:238)
    at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:227)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)