Hi Ken, glad to have you here. I am a bit silent at the moment, because I am not a real database guy and so I am primarily listening to any information that is incoming. If you have a couple of minutes, it would be useful to review this thread here:
http://lists.adiscon.net/pipermail/rsyslog/2009-April/002003.html ...one comment inline below... > -----Original Message----- > From: [email protected] [mailto:rsyslog- > [email protected]] On Behalf Of Kenneth Marshall > Sent: Tuesday, April 21, 2009 7:52 PM > To: [email protected] > Cc: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high-volume log > insertion > > David, > > Okay, I am now subscribed to the mailing list. We are currently using > rsyslog-3.20.x. As far as implementing a prototype of the prepared > statement, I was sidetracked by other duties and have not had a chance > to do anything but an initial evaluation. As far as the rsyslog > internal > escaping, it looked simplest to create another template type like the > current SQL and STDSQL that indicated that escaping was not needed > and/or that prepared statements should be used. To disable escaping, simply do not use SQL or STDSQL. However, the db outputs currently require this option (easy to disable), because I cannot see how it will work (with the existing code) without escaping. Any idea is most welcome. Rainer > > Regards, > Ken > > On Tue, Apr 21, 2009 at 08:51:37AM -0700, [email protected] wrote: > > On Tue, 21 Apr 2009, Kenneth Marshall wrote: > > > >> On Tue, Apr 21, 2009 at 08:37:54AM -0700, [email protected] wrote: > >>> Kenneth, > >>> could you join the discussion on the rsyslog mailing list? > >>> rsyslog-users <[email protected]> > >>> > >>> I'm surprised to hear you say that rsyslog can already do batch > inserts > >>> and > >>> am interested in how you did that. > >>> > >>> what sort of insert rate did you mange to get? > >>> > >>> David Lang > >>> > >> David, > >> > >> I would be happy to join the discussion. I did not mean to say > >> that rsyslog currently supported batch inserts, just that the > >> pieces that provide "stand-by queuing" could be used to manage > >> batching inserts. > > > > I've changed the to list to the rsyslog users list. > > > > currently the stand-by queuing still handles messages one at a time. > > however a sponser has been found to pay to changing the rsyslog > internals > > to allow for multiple messages to be handled at once, which is what > > triggered some of this discussion. > > > > which version of rsyslog are you working with? > > > > when you modified rsyslog to do prepared statement (to avoid the > escaping > > and parsing) did you hard-code the prepared statement? what other > changes > > did you make? > > > > David Lang > > > >> Cheers, > >> Ken > >> > >>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: > >>> > >>>> Date: Tue, 21 Apr 2009 08:33:30 -0500 > >>>> From: Kenneth Marshall <[email protected]> > >>>> To: Richard Huxton <[email protected]> > >>>> Cc: [email protected], Stephen Frost <[email protected]>, > >>>> Greg Smith <[email protected]>, pgsql- > [email protected] > >>>> Subject: Re: [PERFORM] performance for high-volume log insertion > >>>> Hi, > >>>> > >>>> I just finished reading this thread. We are currently working on > >>>> setting up a central log system using rsyslog and PostgreSQL. It > >>>> works well once we patched the memory leak. We also looked at what > >>>> could be done to improve the efficiency of the DB interface. On > the > >>>> rsyslog side, moving to prepared queries allows you to remove the > >>>> escaping that needs to be done currently before attempting to > >>>> insert the data into the SQL backend as well as removing the > parsing > >>>> and planning time from the insert. This is a big win for high > insert > >>>> rates, which is what we are talking about. The escaping process is > >>>> also a big CPU user in rsyslog which then hands the escaped string > >>>> to the backend which then has to undo everything that had been > done > >>>> and parse/plan the resulting query. This can use a surprising > amount > >>>> of additional CPU. Even if you cannot support a general prepared > >>>> query interface, by specifying what the query should look like you > >>>> can handle much of the low-hanging fruit query-wise. > >>>> > >>>> We are currently using a date based trigger to use a new partition > >>>> each day and keep 2 months of logs currently. This can be usefully > >>>> managed on the backend database, but if rsyslog supported changing > >>>> the insert to the new table on a time basis, the CPU used by the > >>>> trigger to support this on the backend could be reclaimed. This > >>>> would be a win for any DB backend. As you move to the new > partition, > >>>> issuing a truncate to clear the table would simplify the DB > interfaces. > >>>> > >>>> Another performance enhancement already mentioned, would be to > >>>> allow certain extra fields in the DB to be automatically populated > >>>> as a function of the log messages. For example, logging the mail > queue > >>>> id for messages from mail systems would make it much easier to > locate > >>>> particular mail transactions in large amounts of data. > >>>> > >>>> To sum up, eliminating the escaping in rsyslog through the use of > >>>> prepared queries would reduce the CPU load on the DB backend. > Batching > >>>> the inserts will also net you a big performance increase. Some DB- > based > >>>> applications allow for the specification of several types of > queries, > >>>> one for single inserts and then a second to support multiple > inserts > >>>> (copy). Rsyslog already supports the queuing pieces to allow you > to > >>>> batch inserts. Just some ideas. > >>>> > >>>> Regards, > >>>> Ken > >>>> > >>>> > >>>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: > >>>>> [email protected] wrote: > >>>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: > >>>>>>> * [email protected] ([email protected]) wrote: > >>>>>>>> while I fully understand the 'benchmark your situation' need, > this > >>>>>>>> isn't > >>>>>>>> that simple. > >>>>>>> > >>>>>>> It really is. You know your application, you know it's primary > use > >>>>>>> cases, and probably have some data to play with. You're > certainly in > >>>>>>> a > >>>>>>> much better situation to at least *try* and benchmark it than > we are. > >>>>>> rsyslog is a syslog server. it replaces (or for debian and > fedora, has > >>>>>> replaced) your standard syslog daemon. it recieves log messages > from > >>>>>> every > >>>>>> app on your system (and possibly others), filters, maniulates > them, > >>>>>> and > >>>>>> then stores them somewhere. among the places that it can store > the > >>>>>> logs > >>>>>> are database servers (native support for MySQL, PostgreSQL, and > >>>>>> Oracle. > >>>>>> plus libdbi for others) > >>>>> > >>>>> Well, from a performance standpoint the obvious things to do are: > >>>>> 1. Keep a connection open, do NOT reconnect for each log- > statement > >>>>> 2. Batch log statements together where possible > >>>>> 3. Use prepared statements > >>>>> 4. Partition the tables by day/week/month/year (configurable I > suppose) > >>>>> > >>>>> The first two are vital, the third takes you a step further. The > fourth > >>>>> is > >>>>> a long-term admin thing. > >>>>> > >>>>> And possibly > >>>>> 5. Have two connections, one for fatal/error etc and one for > info/debug > >>>>> level log statements (configurable split?). Then you can use the > >>>>> synchronous_commit setting on the less important ones. Might buy > you > >>>>> some > >>>>> performance on a busy system. > >>>>> > >>>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config- > wal.html#RUNTIME-CONFIG-WAL-SETTINGS > >>>>> > >>>>>> other apps then search and report on the data after it is > stored. what > >>>>>> apps?, I don't know either. pick your favorite reporting tool > and > >>>>>> you'll > >>>>>> be a step ahead of me (I don't know a really good reporting > tool) > >>>>>> as for sample data, you have syslog messages, just like I do. so > you > >>>>>> have > >>>>>> the same access to data that I have. > >>>>>> how would you want to query them? how would people far less > >>>>>> experianced > >>>>>> that you want to query them? > >>>>>> I can speculate that some people would do two columns (time, > >>>>>> everything > >>>>>> else), others will do three (time, server, everything else), and > >>>>>> others > >>>>>> will go further (I know some who would like to extract IP > addresses > >>>>>> embedded in a message into their own column). some people will > index > >>>>>> on > >>>>>> the time and host, others will want to do full-text searches of > >>>>>> everything. > >>>>> > >>>>> Well, assuming it looks much like traditional syslog, I would do > >>>>> something > >>>>> like: (timestamp, host, facility, priority, message). It's easy > enough > >>>>> to > >>>>> stitch back together if people want that. > >>>>> > >>>>> PostgreSQL's full-text indexing is quite well suited to logfiles > I'd > >>>>> have > >>>>> thought, since it knows about filenames, urls etc already. > >>>>> > >>>>> If you want to get fancy, add a msg_type column and one > subsidiary > >>>>> table > >>>>> for each msg_type. So - you might have smtp_connect_from > (hostname, > >>>>> ip_addr). A set of perl regexps can match and extract the fields > for > >>>>> these > >>>>> extra tables, or you could do it with triggers inside the > database. I > >>>>> think > >>>>> it makes sense to do it in the application. Easier for users to > >>>>> contribute > >>>>> new patterns/extractions. Meanwhile, the core table is untouched > so you > >>>>> don't *need* to know about these extra tables. > >>>>> > >>>>> If you have subsidiary tables, you'll want to partition those too > and > >>>>> perhaps stick them in their own schema (logs200901, logs200902 > etc). > >>>>> > >>>>> -- > >>>>> Richard Huxton > >>>>> Archonet Ltd > >>>>> > >>>>> -- > >>>>> Sent via pgsql-performance mailing list > >>>>> ([email protected]) > >>>>> To make changes to your subscription: > >>>>> http://www.postgresql.org/mailpref/pgsql-performance > >>>>> > >>>> > >>> > >>> -- > >>> Sent via pgsql-performance mailing list > >>> ([email protected]) > >>> To make changes to your subscription: > >>> http://www.postgresql.org/mailpref/pgsql-performance > >>> > >> > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

