2013/6/14 David Lang <[email protected]>

> On Fri, 14 Jun 2013, Radu Gheorghe wrote:
>
>  Hi Mahesh,
>>
>> If you don't need mysql for a specific reason, I'd suggest you try throwing
>> your logs in Elasticsearch. Here's a tutorial:
>> http://wiki.rsyslog.com/index.php/HOWTO:_rsyslog_%2B_elasticsearch
>>
>> I assume you'll get way better insert and query performance than you can
>> with mysql (i.e. with bulks, I get 10-20K logs indexed per second on my $500
>> laptop. Then I can query in 100M-200M logs within a second. Depends on your
>> settings). Plus, it's super-easy to scale Elasticsearch by adding new nodes.
>>
>> For querying, there are several tools, the most popular being Kibana:
>> http://three.kibana.org/
>>
>
> Just to note, one of the things that makes MySQL so slow for Mahesh is its
> safety features. After each insert, MySQL makes sure the data is safe on
> disk before it considers the insert complete.


By that, you mean it does an fsync after every transaction? I thought it
doesn't do this (at least not by default, with either MyISAM or InnoDB).
But then again, at least InnoDB does it more often than ES does.
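
To make the trade-off concrete, here is a minimal sketch in plain Python (not
MySQL or ES code) that appends the same lines to a file, once with an fsync
after every line and once with an fsync every 50 lines. The function name
append_records and the batch size are made up for illustration.

```python
import os
import tempfile


def append_records(path, records, fsync_every):
    """Append records to path, calling fsync after every `fsync_every` writes.

    Returns the number of fsync calls made.
    """
    syncs = 0
    with open(path, "ab") as f:
        for i, rec in enumerate(records, start=1):
            f.write(rec.encode("utf-8") + b"\n")
            if i % fsync_every == 0:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to push it to disk
                syncs += 1
        # Final sync so that a trailing partial batch is also on disk.
        if len(records) % fsync_every != 0:
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs


logs = ["log line %d" % n for n in range(200)]
tmpdir = tempfile.mkdtemp()
safe_path = os.path.join(tmpdir, "safe.log")
fast_path = os.path.join(tmpdir, "fast.log")

safe_syncs = append_records(safe_path, logs, fsync_every=1)   # sync per line
fast_syncs = append_records(fast_path, logs, fsync_every=50)  # sync per batch
```

Both files end up with the same content. The difference is how many lines you
could lose if the machine dies between two syncs: none in the first case, up
to 49 in the second.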


> If the system crashes, the data will be there. There are config options to
> override this in MySQL.
>
> To get the numbers that elasticsearch is getting on your laptop, it's
> almost certainly not doing this.
>

I assume you lose some data if the whole system suddenly goes down. But if
just ES does (i.e. kill -9 the JVM), you shouldn't lose any data.

I think ES writes stuff in a very different way than MySQL does. When you
index something in ES, it does the indexing in memory and writes the raw
data in the transaction log
<http://www.elasticsearch.org/guide/reference/index-modules/translog/>.
Only after this is done do you get a reply from ES.

The transaction log is replayed on startup in case something goes wrong and
you lose the data you had in memory. Every once in a while, it writes what
it has to disk in the actual Lucene index
<http://www.elasticsearch.org/guide/reference/glossary/#shard>, where
it stores data "permanently".
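
As a rough illustration of that write path, here is a toy model in Python. It
is only a sketch of the idea as I described it, not how ES is actually
implemented: ToyShard, the file names and the JSON-lines format are all
invented for the example.

```python
import json
import os
import tempfile


class ToyShard:
    """Toy model: in-memory buffer + append-only translog + 'committed' file."""

    def __init__(self, directory):
        self.translog_path = os.path.join(directory, "translog")
        self.committed_path = os.path.join(directory, "committed")
        self.buffer = []
        # On startup, replay whatever the translog still holds.
        if os.path.exists(self.translog_path):
            with open(self.translog_path) as f:
                self.buffer = [json.loads(line) for line in f if line.strip()]

    def index(self, doc):
        self.buffer.append(doc)                    # "index" in memory
        # Log the raw doc. There is no fsync here, so this survives a killed
        # process but not necessarily a crash of the whole machine.
        with open(self.translog_path, "a") as f:
            f.write(json.dumps(doc) + "\n")
        return True                                # only now acknowledge

    def flush(self):
        # Write the buffered docs to "permanent" storage, then clear the log.
        with open(self.committed_path, "a") as f:
            for doc in self.buffer:
                f.write(json.dumps(doc) + "\n")
        self.buffer = []
        open(self.translog_path, "w").close()

    def all_docs(self):
        docs = []
        if os.path.exists(self.committed_path):
            with open(self.committed_path) as f:
                docs = [json.loads(line) for line in f if line.strip()]
        return docs + self.buffer


directory = tempfile.mkdtemp()
shard = ToyShard(directory)
shard.index({"msg": "first"})
shard.flush()
shard.index({"msg": "second"})   # acknowledged, but not flushed yet
del shard                        # simulate the process dying
recovered = ToyShard(directory)  # the restart replays the translog
```

After the simulated restart, the second document is back in the in-memory
buffer even though it was never flushed, because it was still in the log.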

These chunks of data that it writes are segments
<https://lucene.apache.org/core/3_6_2/fileformats.html#Segments>,
which consist of multiple files. The thing about segments is that they're
immutable. And to make sure that you don't end up with a gazillion
segments, these are asynchronously merged
<http://www.elasticsearch.org/guide/reference/index-modules/merge/> from
time to time.
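
Here is a similarly simplified sketch of the segment idea in Python. It
assumes a made-up ToyIndex class with a made-up max_segments threshold, and
it merges everything synchronously into one segment, whereas the real merges
run in the background and are much more selective about what they combine.

```python
class ToyIndex:
    """Toy model of immutable segments that get merged once there are too many."""

    def __init__(self, max_segments=3):
        self.segments = []                  # each segment is an immutable tuple
        self.max_segments = max_segments

    def flush(self, docs):
        # Write a brand-new segment; existing segments are never modified.
        self.segments.append(tuple(docs))
        if len(self.segments) > self.max_segments:
            self.merge()

    def merge(self):
        # Build one new segment from the old ones, then swap it in.
        merged = tuple(doc for seg in self.segments for doc in seg)
        self.segments = [merged]

    def search(self, word):
        # A search has to look at every segment.
        return [doc for seg in self.segments for doc in seg if word in doc]


index = ToyIndex(max_segments=3)
for batch in (["a1", "a2"], ["b1"], ["c1"], ["d1", "a3"]):
    index.flush(batch)
```

The fourth flush pushes the segment count over the threshold, so the four
small segments are replaced by a single merged one. Searches see the same
documents before and after the merge.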


>
> This is probably acceptable, but you do need to be aware of the tradeoff.
>

Right, there are always trade-offs. I'm sorry if I came across as the
"you're using the wrong technology" guy. I hate it when people do that.

In this particular case, I understand it's only about aggregating logs and
searching them afterwards instead of doing that with straight files. And
this is exactly what ES is about, so I thought it would be easier/better to
give it a shot. And I don't see write speed as being its strong point,
either - that would be the search speed.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
