RB,
> Oi. :) Sorry I'm late to the game.
Your contribution is appreciated. :)
> Forgive me - my database-performance-fu and oracle-fu are not terribly
> strong, I may make a fool of myself here. What is the performance
> gain of making a prepared statement over just executing raw
> statements?
The statement is parsed only once, so you save the overhead of parsing
and building an execution plan for every execution, which would be
identical each time anyway. And I expect to insert hundreds of entries
per second. :) All you have to do is pass the arguments.
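The idea can be sketched with Python's DB-API (using the stdlib sqlite3
here purely for illustration; the real module would talk to Oracle
through its client interface, where the bind markers would be :val1
style instead of ?):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo(field1 TEXT, field2 TEXT, field3 TEXT)")

# The statement text never changes; only the bound values do.
# The DB parses it and builds an execution plan once, and every
# execute() after that just supplies new arguments.
stmt = "INSERT INTO foo(field1, field2, field3) VALUES(?, ?, ?)"
for i in range(3):
    conn.execute(stmt, ("foo%d" % i, "bar%d" % i, "baz%d" % i))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0])  # 3
```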
> CREATE PROCEDURE zazz AS
>     INSERT INTO foo(field1, field2, field3) VALUES(:val1, :val2, :val3);
>
> SET TRANSACTION;
> zazz("foo", "bar", "baz");
> zazz("foo1", "bar1", "baz1");
> zazz("foo2", "bar2", "baz2");
> COMMIT;
>
> -- over
>
> SET TRANSACTION;
> INSERT INTO foo(field1, field2, field3) VALUES("foo", "bar", "baz");
> INSERT INTO foo(field1, field2, field3) VALUES("foo1", "bar1", "baz1");
> INSERT INTO foo(field1, field2, field3) VALUES("foo2", "bar2", "baz2");
> COMMIT;
>
With this code, Oracle (any DB, actually) needs to parse each insert
and choose the execution plan that looks best, once for every statement.
What you get by preparing the statement and using batches is that the
client (the rsyslog core) stores these triplets:
(foo, bar, baz) (foo1, bar1, baz1) (foo2, bar2, baz2)
and, when a limit is hit (say, you're at (foo1000, bar1000, baz1000)),
sends them all to the server at once (thus calling doAction only once
and calling the Oracle interface only once), which blindly executes the
statement without wasting a single cycle on parsing or evaluating
execution plans: that work is already done.
> Perhaps that's not even what you're doing.
For the moment I'm doing:
BEGIN
INSERT INTO foo(field1, field2, field3) values("foo", "bar", "baz");
COMMIT;
BEGIN
INSERT INTO foo(field1, field2, field3) values("foo1", "bar1", "baz1");
COMMIT;
You can already imagine the overhead involved. Actually, all the
DB-based modules in rsyslog do the same.
> I know there are other considerations and niceties with procedures,
It's not even about stored procedures; it's about the client
communicating with the DB many times versus only once.
> but the latter syntax would still allow for batched transactions while
> enabling rsyslog to do the dirty work of formatting the query and not
> necessitating exposure of internal structures.
>
Indeed, I want rsyslog doing most of the work for me. But the overhead
of parsing and evaluating execution plans is unacceptable in my context.
So I'm looking for the balance between rsyslog doing the work for me and
rsyslog performing as well as I need it to. Perhaps exposing the
structures is not a good idea, either.
> IMHO, database output modules should still pretty much blindly execute
> whatever SQL rsyslog hands them, be that wrapped in a transaction or
> not.
>
Yes and no. Yes, rsyslog should be the one that decides which statement
gets executed. But there is no need for rsyslog to repeat that statement
for each entry (millions per day); passing it once at initialization
time is enough.
I made a small Python prototype that does something similar to what you
propose, with no batches, but committing every 1000 entries. The speedup
I got by introducing batches was about a factor of 50, and that was with
the statement already prepared.
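A comparison along those lines could look roughly like this (a sketch on
sqlite3, so the numbers won't match my Oracle measurements: an in-memory
DB has no network round-trips, which is where most of the factor 50
came from):

```python
import sqlite3
import time

def per_row(conn, rows):
    """Commit after every insert: one round-trip and one commit per row."""
    for r in rows:
        conn.execute("INSERT INTO foo VALUES(?, ?, ?)", r)
        conn.commit()

def batched(conn, rows, limit=1000):
    """Hand rows to the DB in batches, committing once per batch."""
    for i in range(0, len(rows), limit):
        conn.executemany("INSERT INTO foo VALUES(?, ?, ?)", rows[i:i + limit])
        conn.commit()

rows = [("foo%d" % i, "bar%d" % i, "baz%d" % i) for i in range(2000)]
for fn in (per_row, batched):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo(field1 TEXT, field2 TEXT, field3 TEXT)")
    t0 = time.perf_counter()
    fn(conn, rows)
    print("%s: %.4fs" % (fn.__name__, time.perf_counter() - t0))
```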
Cheers.
--
Luis Fernando Muñoz Mejías
[email protected]
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com