On Mon, 17 Jun 2013, Mahesh V wrote:

Hello Folks,

With rsyslog + mysql configured (rsyslog 7.4, $ActionQueueDequeueBatchSize 1000)
and no changes to the mysql config file (/etc/my.cnf), I get a good insert
rate, shown below.
mysql -u root -p<...> -e "use Syslog; select message from SystemEvents;" | wc -l
405330 in about

Something got dropped from the message here.

However, I find that mysqld reaches > 80% CPU usage and rsyslog 20% CPU.
The system is a 4-core x86_64 CentOS system.

I am curious to find out if such high cpu usage for mysqld is normal.

that sounds about the right ratio of CPU usage between rsyslog and MySQL

I guess elasticsearch is what I need rather than persisting with mysql.
What kind of performance can I expect with elasticsearch?

The problem in my application is that it has a heartbeat every second with
the device under test.
If the CPU usage from other entities is high, the heartbeat loses its
rhythm and hence the connectivity.

I think the problem is that your system just does not have a good enough disk setup to handle this volume of database work.

Elasticsearch should be a bit better, but if you are maxing out the system it will still give you problems, just at higher insert volumes.

There is a reason why most serious database systems either have a hardware RAID card (with battery-backed cache) or use a disk array with NVRAM or battery-backed cache. It's because the cache/NVRAM allows the database to have the data safe at much less effort than writing directly to a disk.

As for your problem of losing heartbeats, take a look at using nice, either to boost the priority of the heartbeat or to lower the priority of the logging/database.
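
To illustrate the nice suggestion, here is a minimal Python sketch that starts a child process at a lower scheduling priority. The helper name run_niced and the increment of 5 are assumptions made for illustration; on a real system you would more likely apply the nice or renice commands to rsyslogd or mysqld directly. Raising niceness (lowering priority) needs no special privileges, while boosting the heartbeat's priority normally requires root.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=5):
    """Run cmd in a child process whose niceness is raised by `increment`.

    A higher niceness means a lower scheduling priority, so the child
    yields the CPU to other processes (such as a heartbeat) under load.
    """
    return subprocess.run(
        cmd,
        # Runs in the child just before exec; POSIX only.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical stand-in for the logging/database workload: the child
    # simply reports its own niceness.
    result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
    print("parent niceness:", os.nice(0))
    print("child niceness: ", result.stdout.strip())
```

Note that nice only affects CPU scheduling. If the heartbeat is stalling on disk I/O rather than CPU, it will not help much.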

David Lang

thanks,
Mahesh


On Sun, Jun 16, 2013 at 1:50 PM, David Lang <[email protected]> wrote:

On Sun, 16 Jun 2013, Radu Gheorghe wrote:

 2013/6/14 David Lang <[email protected]>

 On Fri, 14 Jun 2013, Radu Gheorghe wrote:

 Hi Mahesh,


If you don't need mysql for a specific reason, I'd suggest you try throwing
your logs in Elasticsearch. Here's a tutorial:
http://wiki.rsyslog.com/index.php/HOWTO:_rsyslog_%2B_elasticsearch


I assume you'll get way better insert and query performance than you can
with mysql (ie: with bulks, I get 10-20K logs indexed per second on my $500
laptop. Then I can query in 100M-200M logs within a second. Depends on your
settings). Plus, it's super-easy to scale Elasticsearch by adding new nodes.

For querying, there are several tools, the most popular being Kibana:
http://three.kibana.org/


Just to note, one of the things that makes MySQL so slow for Mahesh is its
safety features. After each insert, MySQL makes sure the data is safe on
disk before it considers the insert complete.



By that, you mean it does an fsync after every transaction? I thought it
didn't do this (at least not by default, with either MyISAM or InnoDB).
But then again, at least InnoDB does it more often than ES does.


I don't remember the table types, but the newer of the two does do fsync
after each transaction, which is how it actually properly supports
transactions. This is why it was such a big deal when MySQL changed the
default.



If the system crashes, the data will be there. There are config options to
override this in MySQL.

To get the numbers that elasticsearch is getting on your laptop, it's
almost certainly not doing this.


I assume you lose some data if the whole system suddenly goes down. But if
just ES does (ie: kill -9 the JVM), you shouldn't lose any data.

I think ES writes stuff in a very different way than MySQL does. When you
index something in ES, it does the indexing in memory and writes the raw
data in the transaction log
<http://www.elasticsearch.org/guide/reference/index-modules/translog/>.
Only after this is done you get a reply from ES.

The transaction log is replayed on startup in case something goes wrong and
you lose the data you had in memory. Every once in a while, it writes what
it has to disk in the actual Lucene index
<http://www.elasticsearch.org/guide/reference/glossary/#shard>
where it stores data "permanently".
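
The write path described above can be sketched as a toy model in Python. This is not Elasticsearch's actual implementation: the class name ToyIndex, the file names, and the JSON-lines layout are all hypothetical, and the "permanent" store is simplified to a single append-only file. It only shows the pattern of logging each operation before acknowledging it, replaying the log on startup, and clearing the log once the data has been flushed.

```python
import json
import os
import tempfile


class ToyIndex:
    """Toy model: in-memory buffer backed by an append-only transaction log."""

    def __init__(self, directory):
        self.translog_path = os.path.join(directory, "translog.jsonl")
        self.segment_path = os.path.join(directory, "segment.jsonl")
        self.buffer = []
        self._replay()
        self.translog = open(self.translog_path, "a", encoding="utf-8")

    def _replay(self):
        # On startup, recover operations that were logged but never flushed.
        if os.path.exists(self.translog_path):
            with open(self.translog_path, encoding="utf-8") as f:
                self.buffer = [json.loads(line) for line in f if line.strip()]

    def index(self, doc):
        # Log the raw document first, then keep it in memory.
        self.translog.write(json.dumps(doc) + "\n")
        self.translog.flush()  # hands the data to the OS; this is NOT an fsync
        self.buffer.append(doc)

    def flush(self):
        # Write buffered documents to the "permanent" store, then clear the log.
        with open(self.segment_path, "a", encoding="utf-8") as seg:
            for doc in self.buffer:
                seg.write(json.dumps(doc) + "\n")
            seg.flush()
            os.fsync(seg.fileno())
        self.buffer = []
        self.translog.close()
        self.translog = open(self.translog_path, "w", encoding="utf-8")


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    first = ToyIndex(directory)
    first.index({"msg": "link up"})
    first.index({"msg": "link down"})
    # Simulate the process dying before a flush, then restarting.
    restarted = ToyIndex(directory)
    print("recovered from translog:", restarted.buffer)
```

Because index() only flushes to the OS and does not fsync, this toy survives a killed process but not a power loss, which is the distinction discussed further down the thread.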

These chunks of data that it writes are segments
<https://lucene.apache.org/core/3_6_2/fileformats.html#Segments>,
which consist of multiple files. The thing about segments is that they're
immutable. And to make sure that you don't end up with a gazillion
segments, these are asynchronously merged
<http://www.elasticsearch.org/guide/reference/index-modules/merge/>
from time to time.


the thing is that if it doesn't do a fsync, you have no guarantee that the
data is on the disk. And it's very possible for later data to make it to
the disk before earlier data does.

doing a kill -9 isn't the same as a system crash.

when you do a kill -9 the kernel and filesystem code contain all the data
that the application wrote, and will present that data if asked, and will
eventually get it to disk.

But if the system loses power, any data not actually written to disk is
lost. And (depending on lots of implementation details) it's possible to
end up with holes in files, or files created that have no content, or even
files created, with space allocated for them, but stray data from the drive
in that space, not what the application wrote.


I suspect that what ES does is that it writes the data in long sequential
writes, and tries to make it so that if there is power loss, logs will be
lost but not corrupted. It can do that at the data rates that you are
describing. It's writing hundreds, if not thousands of logs per 'transaction'.
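
To make the cost difference concrete, here is a minimal Python sketch that appends the same log lines with one fsync per line versus one fsync per batch. The function name write_logs and the batch sizes are assumptions for illustration; this is neither MySQL's nor Elasticsearch's actual write path, only the underlying trade-off between durability granularity and the number of syncs.

```python
import os
import tempfile
import time


def write_logs(path, lines, batch_size):
    """Append lines to path, calling fsync once per batch_size lines.

    Returns the number of fsync calls made.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fsyncs = 0
    with open(path, "ab") as f:
        for start in range(0, len(lines), batch_size):
            batch = lines[start:start + batch_size]
            f.write(b"".join(line.encode("utf-8") + b"\n" for line in batch))
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the device
            fsyncs += 1
    return fsyncs


if __name__ == "__main__":
    logs = [f"heartbeat {i}" for i in range(1000)]
    directory = tempfile.mkdtemp()
    for batch_size in (1, 1000):
        path = os.path.join(directory, f"batch_{batch_size}.log")
        started = time.perf_counter()
        count = write_logs(path, logs, batch_size)
        elapsed = time.perf_counter() - started
        print(f"batch_size={batch_size}: {count} fsyncs in {elapsed:.3f}s")
```

With batch_size=1 every line is durable before the next is written, at the cost of one sync per line. With a large batch, a power loss can drop up to a whole batch, but the disk sees far fewer syncs. How large the timing gap is depends heavily on the disk and filesystem.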


 this is probably acceptable, but you do need to be aware of the tradeoff.


Right, there are always trade-offs. I'm sorry if I came across as the
"you're using the wrong technology" guy. I hate it when people do that.

In this particular case, I understand it's only about aggregating logs and
searching them afterwards instead of doing that with straight files. And
this is exactly what ES is about, so I thought it would be easier/better to
give it a shot. And I don't see write speed as being its strong point,
either - that would be the search speed.


I think that you are correct in saying that ES is better than MySQL for
this, but I was wanting to point out that the reason why MySQL is as slow
as he was seeing is because it's making sure that each transaction is safe
before proceeding.

Relaxing this guarantee is the sort of thing that all the No-SQL databases
do, and most of their performance wins are possible only because they do
not provide the same guarantees that the traditional SQL databases provide.

David Lang

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.
