On Sun, 16 Jun 2013, Radu Gheorghe wrote:
2013/6/14 David Lang <[email protected]>
On Fri, 14 Jun 2013, Radu Gheorghe wrote:
Hi Mahesh,
If you don't need mysql for a specific reason, I'd suggest you try
throwing your logs in Elasticsearch. Here's a tutorial:
http://wiki.rsyslog.com/index.php/HOWTO:_rsyslog_%2B_elasticsearch
I assume you'll get way better insert and query performance than you can
with mysql (ie: with bulks, I get 10-20K logs indexed per second on my
$500 laptop. Then I can query in 100M-200M logs within a second. Depends
on your settings). Plus, it's super-easy to scale Elasticsearch by
adding new nodes.
For querying, there are several tools, the most popular being Kibana:
http://three.kibana.org/
Just to note, one of the things that makes MySQL so slow for Mahesh is
its safety features. After each insert, MySQL makes sure the data is
safe on disk before it considers the insert complete.
By that, you mean it does a fsync after every transaction? I thought it
doesn't do this (at least not by default, with neither MyISAM nor InnoDB).
But then again, at least InnoDB does it more often than ES does.
I don't remember the table types, but the newer of the two does do fsync
after each transaction, which is how it actually properly supports
transactions. This is why it was such a big deal when MySQL changed the
default.
If the system crashes, the data will be there. There are config options
to override this in MySQL.
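For reference, the main knob on the InnoDB side is
innodb_flush_log_at_trx_commit (a sketch of a my.cnf fragment, not
something from the thread; check the docs for your MySQL version):

```ini
# /etc/my.cnf -- relax per-transaction durability (trade safety for speed)
[mysqld]
# 1 (default) = write and fsync the InnoDB log at every commit -- fully durable
# 2 = write to the OS at commit, fsync roughly once per second
# 0 = write and fsync roughly once per second; a crash can lose ~1s of commits
innodb_flush_log_at_trx_commit = 2
```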
To get the numbers that elasticsearch is getting on your laptop, it's
almost certainly not doing this.
I assume you lose some data if the whole system suddenly goes down. But if
just ES does (ie: kill -9 the JVM), you shouldn't lose any data.
I think ES writes stuff in a very different way than MySQL does. When
you index something in ES, it does the indexing in memory and writes the
raw data in the transaction log
<http://www.elasticsearch.org/guide/reference/index-modules/translog/>.
Only after this is done do you get a reply from ES.
The transaction log is replayed on startup in case something goes wrong
and you lose the data you had in memory. Every once in a while, it
writes what it has to disk in the actual Lucene index
<http://www.elasticsearch.org/guide/reference/glossary/#shard>, where
it stores data "permanently". These chunks of data that it writes are
segments <https://lucene.apache.org/core/3_6_2/fileformats.html#Segments>,
which consist of multiple files. The thing about segments is that
they're immutable. And to make sure that you don't end up with a
gazillion segments, these are asynchronously merged
<http://www.elasticsearch.org/guide/reference/index-modules/merge/> from
time to time.
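The translog pattern described above can be sketched in a few lines of
Python (a toy model for illustration, not ES code; the class and file
names are made up): every write is appended to a log file before being
acknowledged, the in-memory state can be rebuilt by replaying that log,
and a periodic "commit" pretends to persist a segment and starts a
fresh translog.

```python
import json
import os

class TranslogStore:
    """Toy write-ahead-log store mimicking the ES translog pattern."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memory = []          # stand-in for the in-memory index
        self._replay()            # recover anything a previous run left behind
        self.log = open(log_path, "a")

    def _replay(self):
        # On startup, rebuild in-memory state from the surviving log.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    self.memory.append(json.loads(line))

    def index(self, doc):
        # Append to the translog first; only then acknowledge the write.
        self.log.write(json.dumps(doc) + "\n")
        self.log.flush()          # hand the bytes to the OS (note: no fsync)
        self.memory.append(doc)

    def commit(self):
        # "Lucene flush": pretend a segment was persisted, then truncate
        # the translog. A real engine would sync the segment files first.
        self.log.close()
        open(self.log_path, "w").close()
        self.log = open(self.log_path, "a")

store = TranslogStore("translog.jsonl")
store.index({"msg": "log line 1"})
store.index({"msg": "log line 2"})
print(len(store.memory))   # 2 documents, recoverable by replaying the log
```

If the process is killed between index() and commit(), a new
TranslogStore on the same file replays both documents, which is the
recovery behavior described above.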
the thing is that if it doesn't do a fsync, you have no guarantee that the
data is on the disk. And it's very possible for later data to make it to
the disk before earlier data does.
doing a kill -9 isn't the same as a system crash.
when you do a kill -9 the kernel and filesystem code contain all the data
that the application wrote, and will present that data if asked, and will
eventually get it to disk.
But if the system loses power, any data not actually written to disk is
lost. And (depending on lots of implementation details) it's possible to
end up with holes in files, or files created that have no content, or even
files created, with space allocated for them, but stray data from the drive
in that space, not what the application wrote.
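In code, the distinction is just which calls you make before
acknowledging a write (a hedged sketch with made-up helper names; what
survives a real power cut also depends on the filesystem and the
drive's own cache):

```python
import os

def durable_append(path, data):
    """Survives power loss: block until the kernel pushes the data to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.write(fd, data)
        os.fsync(fd)      # the expensive part: wait for the disk
    finally:
        os.close(fd)

def fast_append(path, data):
    """Survives kill -9 (the page cache keeps it) but NOT power loss."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.write(fd, data)   # lands in the page cache; the disk write is deferred
    finally:
        os.close(fd)
```

Both functions return with the data readable by any other process; only
the first guarantees it is on stable storage.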
I suspect that what ES does is that it writes the data in long sequential
writes, and tries to make it so that if there is power loss, logs will be
lost but not corrupted. It can do that at the data rates that you are
describing. It's writing hundreds, if not thousands, of logs per
'transaction'.
this is probably acceptable, but you do need to be aware of the tradeoff.
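The speed difference comes almost entirely from amortizing the sync:
one fsync per batch of logs instead of one per log. A rough sketch of
the batched pattern (a hypothetical helper, not ES or rsyslog code):

```python
import os

def append_batch(path, records, batch_size):
    """Append `records` in groups of `batch_size`, fsyncing once per group."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        for i in range(0, len(records), batch_size):
            chunk = b"".join(r + b"\n" for r in records[i:i + batch_size])
            os.write(fd, chunk)
            os.fsync(fd)   # one disk sync covers the whole chunk
    finally:
        os.close(fd)
```

With batch_size=1 you pay per-record sync latency, like default InnoDB;
with batch_size=1000 that cost is spread over a thousand records, but a
power cut can lose the whole unsynced tail of the batch.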
Right, there are always trade-offs. I'm sorry if I came across as the
"you're using the wrong technology" guy. I hate it when people do that.
In this particular case, I understand it's only about aggregating logs and
searching them afterwards instead of doing that with straight files. And
this is exactly what ES is about, so I thought it would be easier/better
to give it a shot. And I don't see write speed as being its strong
point, either - that would be the search speed.
I think that you are correct in saying that ES is better than MySQL for
this, but I was wanting to point out that the reason why MySQL is as slow
as he was seeing is because it's making sure that each transaction is safe
before proceeding.
Relaxing this guarantee is the sort of thing that all the No-SQL databases
do, and most of their performance wins are possible only because they do
not provide the same guarantees that the traditional SQL databases provide.
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.