I had the same problem too (in Flume 1.4).
We checked that the input data is actually UTF-8.
When we set the input charset to 'unicode', it worked.
By "worked" I mean it no longer threw this exception,
but the data that arrived at the destination was garbage.
Is this a known issue, or are we missing something?
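
A possible explanation for the garbage, as a minimal standalone Java sketch (no Flume code involved). It assumes that 'unicode' resolves to UTF-16 in the JVM; I believe that is a charset alias, but treat it as an assumption. The class and method names are made up for illustration. A strict UTF-16 decoder accepts almost any even-length byte sequence, so UTF-8 bytes decode without an exception but come out as unrelated characters, while a strict UTF-8 decoder rejects a stray Latin-1 byte with the same exception as in the trace below.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Flume.
public class CharsetMismatchDemo {

    // Strict decode: throws instead of silently replacing bad input.
    static String strictDecode(byte[] bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        // 12 ASCII bytes, which are also valid UTF-8.
        byte[] utf8 = "hello world!".getBytes(StandardCharsets.UTF_8);

        // Decoding as UTF-16 pairs the bytes up: no exception, wrong text.
        String wrong = strictDecode(utf8, StandardCharsets.UTF_16);
        System.out.println("UTF-16 result has " + wrong.length() + " chars: " + wrong);

        // A Latin-1 encoded 'e with acute' (0xE9) is not valid UTF-8 here.
        byte[] latin1 = "caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1);
        try {
            strictDecode(latin1, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 decode failed: " + e);
        }
    }
}
```

So the absence of the exception with 'unicode' would not mean the data was read correctly, only that the decoder had nothing to complain about.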
On 08/04/2013 12:26 PM, Anat Rozenzon wrote:
Hi,
I'm trying to read a directory with the spooler, and at some point I
start getting these errors:
01 Aug 2013 10:10:17,892 ERROR [pool-6-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:173) - Uncaught exception in Runnable
java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
    at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:169)
    at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:134)
    at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:72)
    at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:91)
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:221)
    at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:160)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
I can see that this is a character set issue; however, the files are
supposed to be UTF-8 files.
Still, some characters are invalid. Is there any way to ignore these lines?
Also, is there a way to know which file/line is causing the exception?
Thanks
Anat
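
On the question of which file and line trigger the exception: one option is to check the spool directory outside Flume before the files are picked up. Below is a minimal sketch, assuming newline-terminated text files; the class name Utf8LineChecker and the method malformedLines are hypothetical, and nothing here is a Flume API. It splits each file on the newline byte (safe for UTF-8, since 0x0A never occurs inside a multi-byte sequence) and strictly decodes each line.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume.
public class Utf8LineChecker {

    // Returns the 1-based numbers of lines that are not valid UTF-8.
    static List<Integer> malformedLines(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<Integer> bad = new ArrayList<>();
        int lineNo = 1;
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            // A line ends at '\n' or at the end of the data.
            if (i == data.length || data[i] == '\n') {
                try {
                    // decode(ByteBuffer) resets the decoder on every call.
                    decoder.decode(ByteBuffer.wrap(data, start, i - start));
                } catch (CharacterCodingException e) {
                    bad.add(lineNo);
                }
                lineNo++;
                start = i + 1;
            }
        }
        return bad;
    }

    // Usage: java Utf8LineChecker <directory>
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                for (int line : malformedLines(Files.readAllBytes(file))) {
                    System.out.println(file + ": invalid UTF-8 on line " + line);
                }
            }
        }
    }
}
```

This reads each file fully into memory, which is fine for a quick check but would need a streaming variant for very large files. Switching the decoder to CodingErrorAction.REPLACE would be one way to tolerate the bad bytes instead of reporting them.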