I'm wondering how much of the time is spent by Logstash reading and
processing the log vs. time spent sending data to Kafka. Also, I'm not
familiar with Logstash internals; perhaps it can be tuned to send the data
to Kafka in larger batches?
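
If Logstash turns out to be the bottleneck on the producing side, the kafka
output plugin can usually be tuned to batch more aggressively. A minimal
sketch, assuming a recent logstash-output-kafka plugin (the broker and topic
names are placeholders; older plugin versions expose the async producer via
producer_type and batch_num_messages instead):

    output {
      kafka {
        bootstrap_servers => "broker1:9092"   # placeholder broker
        topic_id => "logs"                    # placeholder topic
        batch_size => 65536        # bytes buffered per partition before sending
        linger_ms => 50            # wait up to 50 ms to fill a batch
        compression_type => "snappy"
      }
    }

Larger batches and compression trade a little latency for much better
throughput, which is usually the right trade when bulk-loading old log files.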

At the moment it's difficult to tell where the slowdown is. More information
about the breakdown of time would help.

You can also try Flume's SpoolingDirectory source with a Kafka Channel or
Sink and see whether you get better performance out of another tool.
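
For reference, a Flume agent for this case would look roughly like the
sketch below. This assumes Flume 1.6's Kafka channel (no separate sink is
needed, since the channel writes straight into the topic); the directory,
broker, ZooKeeper and topic values are placeholders:

    agent.sources  = spool
    agent.channels = kafkaCh

    # Spooling Directory source: reads completed files dropped into a directory
    agent.sources.spool.type = spooldir
    agent.sources.spool.spoolDir = /data/logs/incoming
    agent.sources.spool.channels = kafkaCh

    # Kafka channel: events are written directly to a Kafka topic
    agent.channels.kafkaCh.type = org.apache.flume.channel.kafka.KafkaChannel
    agent.channels.kafkaCh.brokerList = broker1:9092
    agent.channels.kafkaCh.zookeeperConnect = zk1:2181
    agent.channels.kafkaCh.topic = logs
    # write raw event bodies rather than Flume-serialized events
    agent.channels.kafkaCh.parseAsFlumeEvent = false

Point spoolDir at a directory where you copy the finished log files; Flume
will stream each file into Kafka and rename it with a .COMPLETED suffix.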


Gwen

On Sun, Feb 8, 2015 at 12:06 AM, Vineet Mishra <clearmido...@gmail.com>
wrote:

> Hi All,
>
> I have some log files of around 30GB, and I am trying to process these
> logs as events by pushing them to Kafka. I can clearly see that the
> throughput achieved while publishing these events to Kafka is quite slow.
>
> So, as mentioned, for a single 30GB log file Logstash has been
> continuously emitting to Kafka for more than 2 days, but it has still
> processed only 60% of the log data. I was looking for a way to increase
> the efficiency of publishing the events to Kafka, as at this rate of data
> ingestion I don't think it will be a good option to move ahead.
>
> Looking for performance improvements for the same.
>
> Expert advice required!
>
> Thanks!
>
