On Fri, 17 Apr 2009, Luis Fernando Mu?oz Mej?as wrote:

> Hi,
>
>> I am thinking on how to enhance the engine so that fastest-possible
>> database writes (actually, any output) are possible. However, I come
>> across a couple of points. I would like to do so in the most generic
>> way. Let me quote those message parts that I have specific questions
>> on (out of sequence, thus I preserve the full message below - if you
>> need more context).
>>
>> > I made a small Python prototype to do something similar to what you
>> > propose, with no batches, but committing each 1000 entries. The
>> > speedup I got by introducing batches was about a factor 50. And the
>> > statement was already prepared.
>>
>> Could you check what actually brings most of the speedup - the batches
>> or the prepared statement. I am thinking along the lines of using
>> batches but not prepared statements, as in this sample
>>
>> begin insert ...  insert ...  insert ...  insert ...  end
>
> I'll do, but please note that
>
> begin
> execute(unprepared_insert_statement)
> execute(unprepared_insert_statement)
> execute(unprepared_insert_statement)
> execute(unprepared_insert_statement)
> commit
>
> Needs 4 message exchanges with the server. OTOH:
>
> <client>
> push (@batch, $item);
> push (@batch, $item);
> push (@batch, $item);
> push (@batch, $item);
> <send to server>
> begin
> execute_many (insert_statement, @batch)
> commit
>
> Requires only one, so the network overhead is *way* smaller. This is
> true not only of Oracle, but also of PostgreSQL, and I suppose MySQL
> provides similar API.

as a strawman, and thinking of databases in general (not any particular 
database), I see the needs for the database interface as being able to be 
generisized down to a set of config variables something like

DBtype
   value:   one of "oracle|postgres|mysql|libdbi"
   purpose: determine which low-level communication library is used to talk 
to the DB


DBinit
   value:   string
   purpose: any initialization that needs to be done when first connecting 
to the database (sanity checks to make sure the DB has the correct schema, 
initializing sql functions, authentication, etc)

DBstart
   value:   string
   purpose: fixed text ahead of any message content

DBjoin
   value:   string
   purpose: fixed text used to join two messages togeather

DBend
   value:   string
   purpose: fixed text used to end a message to the server

DBmessage
   value:   rsyslog template
   purpose: format an individual message for the database


examples

example 1 existing single-message handling

DBinit=""
DBstart=""
DBjoin=""
DBend=""
DBmessage="insert into table logs values ('$server','$timestamp','$msg');"

resulting statement

insert into table logs values ('server1','$timestamp',$'msg');

example 2 prepared statement

DBinit=""
DBstart=""
DBjoin="\n"
DBend="begin; execute_many (insert_statement, @batch); commit"
DBmessage="push (@batch, '$item');"

resulting statement

push (@batch, 'item1');
push (@batch, 'item2');
push (@batch, 'item3');
push (@batch, 'item4');
begin; execute_many (insert_statement, @batch); commit


example 3 multiple inserts in one statment

DBinit=""
DBstart="insert into table logs values "
DBjoin=", "
DBend=";"
DBmessage="('$server','$timestamp','$msg')"

resulting statement

insert into table logs values ('server1','time1','message1'), 
('server2','time2','message2'), ('server3','time3','message3'), 
('server4','time4','message4');


example 4 multiple inserts in one transaction

DBinit=""
DBstart="begin;\n"
DBjoin="\n"
DBend="\ncommit;"
DBmessage="insert into table logs values ('$server','$time','$message'); "

resulting statement

begin;
insert into table logs values ('server1','time1','message1');
insert into table logs values ('server2','time2','message2');
insert into table logs values ('server3','time3','message3');
insert into table logs values ('server4','time4','message4');
commit;


I don't happen to know the syntax to define a stored procedure off the top 
of my head or I would give you an example of that (which would use the 
DBinit to define the stored procedure)

postgres has a 'copy' command, where you tell it that you are going to 
follow with many lines of content to insert (which is significantly faster 
than insert statements, even batched up)

I believe that this 5-variable set can handle just about every variation 
in putting things in the database, and as such would allow the database 
drivers themselves to be greatly simplified.


thoughts?

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to