2013/6/14 David Lang <[email protected]>

> On Fri, 14 Jun 2013, Radu Gheorghe wrote:
>
>> Hi Mahesh,
>>
>> If you don't need mysql for a specific reason, I'd suggest you try throwing
>> your logs in Elasticsearch. Here's a tutorial:
>> http://wiki.rsyslog.com/index.php/HOWTO:_rsyslog_%2B_elasticsearch
>>
>> I assume you'll get way better insert and query performance than you can
>> with mysql (ie: with bulks, I get 10-20K logs indexed per second on my $500
>> laptop. Then I can query in 100M-200M logs within a second. Depends on your
>> settings). Plus, it's super-easy to scale Elasticsearch by adding new nodes.
>>
>> For querying, there are several tools, the most popular being Kibana:
>> http://three.kibana.org/
>
> Just to note, one of the things that makes MySQL so slow for Mahesh is its
> safety features. After each insert, MySQL makes sure the data is safe on
> disk before it considers the insert complete.

By that, you mean it does an fsync after every transaction? I thought it
doesn't do this (at least not by default, with either MyISAM or InnoDB).
But then again, at least InnoDB does it more often than ES does.

> If the system crashes, the data will be there. There are config options to
> override this in MySQL.
>
> To get the numbers that elasticsearch is getting on your laptop, it's
> almost certainly not doing this.

I assume you lose some data if the whole system suddenly goes down. But if
just ES does (ie: kill -9 the JVM), you shouldn't lose any data.

I think ES writes stuff in a very different way than MySQL does. When you
index something in ES, it does the indexing in memory and writes the raw
data to the transaction log
<http://www.elasticsearch.org/guide/reference/index-modules/translog/>.
Only after this is done do you get a reply from ES. The transaction log is
replayed on startup in case something goes wrong and you lose the data you
had in memory.

Every once in a while, it writes what it has to disk in the actual Lucene
index <http://www.elasticsearch.org/guide/reference/glossary/#shard>, where
it stores data "permanently". These chunks of data that it writes are
segments <https://lucene.apache.org/core/3_6_2/fileformats.html#Segments>,
which consist of multiple files. The thing about segments is that they're
immutable. And to make sure that you don't end up with a gazillion
segments, these are asynchronously merged
<http://www.elasticsearch.org/guide/reference/index-modules/merge/> from
time to time.

> this is probably acceptable, but you do need to be aware of the tradeoff.

Right, there are always trade-offs. I'm sorry if I came across as the
"you're using the wrong technology" guy. I hate it when people do that. In
this particular case, I understand it's only about aggregating logs and
searching them afterwards instead of doing that with straight files. And
this is exactly what ES is about, so I thought it would be easier/better to
give it a shot.
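
To make the write path above a bit more concrete, here is a toy sketch in Python. It is not Elasticsearch or Lucene code, only a model of the steps as I understand them: index in memory, append to a transaction log, acknowledge, flush to an immutable segment now and then, merge segments later. The class name `ToyShard`, the file names and the JSON format are all made up for illustration. It also never calls fsync, which is an assumption that mirrors my guess above: a killed process loses nothing, a sudden power loss might.

```python
import json
import os
import tempfile


class ToyShard:
    """Toy model of the write path described above (hypothetical, not ES code)."""

    def __init__(self, directory):
        self.directory = directory
        self.translog_path = os.path.join(directory, "translog.jsonl")
        self.buffer = []  # documents indexed in memory, not yet in a segment
        # On startup, replay whatever was acknowledged but never flushed.
        if os.path.exists(self.translog_path):
            with open(self.translog_path) as log:
                self.buffer = [json.loads(line) for line in log if line.strip()]

    def segment_files(self):
        return sorted(f for f in os.listdir(self.directory)
                      if f.startswith("segment-"))

    def _write_segment(self, docs):
        # Segments are written once under a new name and never modified.
        names = self.segment_files()
        next_id = int(names[-1][8:14]) + 1 if names else 0
        path = os.path.join(self.directory, f"segment-{next_id:06d}.json")
        with open(path, "w") as seg:
            json.dump(docs, seg)

    def index(self, doc):
        # Index in memory and append the raw document to the translog.
        # No os.fsync here (assumption): the data reaches the OS, so it
        # survives a killed process but not necessarily a power loss.
        self.buffer.append(doc)
        with open(self.translog_path, "a") as log:
            log.write(json.dumps(doc) + "\n")
        return True  # the "reply" is only sent after both steps

    def flush(self):
        # Persist the in-memory documents as a new segment, then start
        # a fresh translog.
        if self.buffer:
            self._write_segment(self.buffer)
            self.buffer = []
        open(self.translog_path, "w").close()

    def merge(self):
        # Combine all segments into one new segment, then drop the old ones.
        names = self.segment_files()
        if len(names) < 2:
            return
        docs = []
        for name in names:
            with open(os.path.join(self.directory, name)) as seg:
                docs.extend(json.load(seg))
        self._write_segment(docs)
        for name in names:
            os.remove(os.path.join(self.directory, name))

    def all_docs(self):
        docs = []
        for name in self.segment_files():
            with open(os.path.join(self.directory, name)) as seg:
                docs.extend(json.load(seg))
        return docs + self.buffer


directory = tempfile.mkdtemp()
shard = ToyShard(directory)
shard.index({"msg": "a"})
shard.flush()               # "a" now lives in a segment
shard.index({"msg": "b"})   # "b" is only in memory and in the translog

# Simulate kill -9: forget the running object and start again from disk.
restarted = ToyShard(directory)
print(restarted.all_docs())
```

After the simulated restart, the second document comes back from the translog replay even though it was never flushed to a segment.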
And I don't see write speed as being its strong point, either - that would be the search speed.

