> -----Original Message----- > From: [email protected] [mailto:rsyslog- > [email protected]] On Behalf Of [email protected] > Sent: Wednesday, April 22, 2009 8:26 AM > To: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high-volume log > insertion > > On Wed, 22 Apr 2009, Rainer Gerhards wrote: > > > Hi Ken, > > > > glad to have you here. I am a bit silent at the moment, because I am > not a > > real database guy and so I am primarily listening to any information > that is > > incoming. If you have a couple of minutes, it would be useful to > review this > > thread here: > > > > http://lists.adiscon.net/pipermail/rsyslog/2009-April/002003.html > > > > ...one comment inline below... > > > >> -----Original Message----- > >> From: [email protected] [mailto:rsyslog- > >> [email protected]] On Behalf Of Kenneth Marshall > >> > >> David, > >> > >> Okay, I am now subscribed to the mailing list. We are currently > using > >> rsyslog-3.20.x. As far as implementing a prototype of the prepared > >> statement, I was sidetracked by other duties and have not had a > chance > >> to do anything but an initial evaluation. As far as the rsyslog > >> internal > >> escaping, it looked simplest to create another template type like > the > >> current SQL and STDSQL that indicated that escaping was not needed > >> and/or that prepared statements should be used. > > > > To disable escaping, simply do not use SQL or STDSQL. However, the db > outputs > > currently require this option (easy to disable), because I cannot see > how it > > will work (with the existing code) without escaping. > > > > Any idea is most welcome. > > when using prepared statement escaping is not needed.
... but that can only be on the non-text API level (e.g. by using libpq). On the SQL text level, I have no idea how to tell the sql engine to insert ' - if I don't say '''' but ''' how does the engine know what I say? With the C-level API, I bind a parameter and specify a buffer and then put my character into that buffer - no escaping needed for sure. But, again, I do not see how this would work on the text level... > according to > Ken's > message below he found that the overhead of doing the escaping was > significant. I don't see why this should be the case, but if it > requires > making an extra copy of the string I guess it's possible > I am surprised, too. Even if a copy is made (I think it is), this is a quick in-memory operation. Given the rest of the picture, I would expect that to have very low impact on the overall cost (just think about the need to copy buffers between different contexts, down to different layers, etc - so I'd expect to see ample copy operations before the data finally hits the disk). Anyhow, I may be totally wrong... > I plan to get a setup togeather in the next couple of days that will > let > me do some testing of the options on the database side. > That would be great. I, for now, intend to look at the queue first. I have begun to thought about steps on how to tackle the beast. So I will probably not do much more on the database level than throw in some thoughts (but not do any testing or coding). Rainer > David Lang > > > Rainer > >> > >> Regards, > >> Ken > >> > >> On Tue, Apr 21, 2009 at 08:51:37AM -0700, [email protected] wrote: > >>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: > >>> > >>>> On Tue, Apr 21, 2009 at 08:37:54AM -0700, [email protected] wrote: > >>>>> Kenneth, > >>>>> could you join the discussion on the rsyslog mailing list? > >>>>> rsyslog-users <[email protected]> > >>>>> > >>>>> I'm surprised to hear you say that rsyslog can already do batch > >> inserts > >>>>> and > >>>>> am interested in how you did that. > >>>>> > >>>>> what sort of insert rate did you mange to get? > >>>>> > >>>>> David Lang > >>>>> > >>>> David, > >>>> > >>>> I would be happy to join the discussion. I did not mean to say > >>>> that rsyslog currently supported batch inserts, just that the > >>>> pieces that provide "stand-by queuing" could be used to manage > >>>> batching inserts. > >>> > >>> I've changed the to list to the rsyslog users list. > >>> > >>> currently the stand-by queuing still handles messages one at a > time. > >>> however a sponser has been found to pay to changing the rsyslog > >> internals > >>> to allow for multiple messages to be handled at once, which is what > >>> triggered some of this discussion. > >>> > >>> which version of rsyslog are you working with? > >>> > >>> when you modified rsyslog to do prepared statement (to avoid the > >> escaping > >>> and parsing) did you hard-code the prepared statement? what other > >> changes > >>> did you make? > >>> > >>> David Lang > >>> > >>>> Cheers, > >>>> Ken > >>>> > >>>>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: > >>>>> > >>>>>> Date: Tue, 21 Apr 2009 08:33:30 -0500 > >>>>>> From: Kenneth Marshall <[email protected]> > >>>>>> To: Richard Huxton <[email protected]> > >>>>>> Cc: [email protected], Stephen Frost <[email protected]>, > >>>>>> Greg Smith <[email protected]>, pgsql- > >> [email protected] > >>>>>> Subject: Re: [PERFORM] performance for high-volume log insertion > >>>>>> Hi, > >>>>>> > >>>>>> I just finished reading this thread. We are currently working on > >>>>>> setting up a central log system using rsyslog and PostgreSQL. It > >>>>>> works well once we patched the memory leak. We also looked at > what > >>>>>> could be done to improve the efficiency of the DB interface. On > >> the > >>>>>> rsyslog side, moving to prepared queries allows you to remove > the > >>>>>> escaping that needs to be done currently before attempting to > >>>>>> insert the data into the SQL backend as well as removing the > >> parsing > >>>>>> and planning time from the insert. This is a big win for high > >> insert > >>>>>> rates, which is what we are talking about. The escaping process > is > >>>>>> also a big CPU user in rsyslog which then hands the escaped > string > >>>>>> to the backend which then has to undo everything that had been > >> done > >>>>>> and parse/plan the resulting query. This can use a surprising > >> amount > >>>>>> of additional CPU. Even if you cannot support a general prepared > >>>>>> query interface, by specifying what the query should look like > you > >>>>>> can handle much of the low-hanging fruit query-wise. > >>>>>> > >>>>>> We are currently using a date based trigger to use a new > partition > >>>>>> each day and keep 2 months of logs currently. This can be > usefully > >>>>>> managed on the backend database, but if rsyslog supported > changing > >>>>>> the insert to the new table on a time basis, the CPU used by the > >>>>>> trigger to support this on the backend could be reclaimed. This > >>>>>> would be a win for any DB backend. As you move to the new > >> partition, > >>>>>> issuing a truncate to clear the table would simplify the DB > >> interfaces. > >>>>>> > >>>>>> Another performance enhancement already mentioned, would be to > >>>>>> allow certain extra fields in the DB to be automatically > populated > >>>>>> as a function of the log messages. For example, logging the mail > >> queue > >>>>>> id for messages from mail systems would make it much easier to > >> locate > >>>>>> particular mail transactions in large amounts of data. > >>>>>> > >>>>>> To sum up, eliminating the escaping in rsyslog through the use > of > >>>>>> prepared queries would reduce the CPU load on the DB backend. > >> Batching > >>>>>> the inserts will also net you a big performance increase. Some > DB- > >> based > >>>>>> applications allow for the specification of several types of > >> queries, > >>>>>> one for single inserts and then a second to support multiple > >> inserts > >>>>>> (copy). Rsyslog already supports the queuing pieces to allow you > >> to > >>>>>> batch inserts. Just some ideas. > >>>>>> > >>>>>> Regards, > >>>>>> Ken > >>>>>> > >>>>>> > >>>>>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: > >>>>>>> [email protected] wrote: > >>>>>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: > >>>>>>>>> * [email protected] ([email protected]) wrote: > >>>>>>>>>> while I fully understand the 'benchmark your situation' > need, > >> this > >>>>>>>>>> isn't > >>>>>>>>>> that simple. > >>>>>>>>> > >>>>>>>>> It really is. You know your application, you know it's > primary > >> use > >>>>>>>>> cases, and probably have some data to play with. You're > >> certainly in > >>>>>>>>> a > >>>>>>>>> much better situation to at least *try* and benchmark it than > >> we are. > >>>>>>>> rsyslog is a syslog server. it replaces (or for debian and > >> fedora, has > >>>>>>>> replaced) your standard syslog daemon. it recieves log > messages > >> from > >>>>>>>> every > >>>>>>>> app on your system (and possibly others), filters, maniulates > >> them, > >>>>>>>> and > >>>>>>>> then stores them somewhere. among the places that it can store > >> the > >>>>>>>> logs > >>>>>>>> are database servers (native support for MySQL, PostgreSQL, > and > >>>>>>>> Oracle. > >>>>>>>> plus libdbi for others) > >>>>>>> > >>>>>>> Well, from a performance standpoint the obvious things to do > are: > >>>>>>> 1. Keep a connection open, do NOT reconnect for each log- > >> statement > >>>>>>> 2. Batch log statements together where possible > >>>>>>> 3. Use prepared statements > >>>>>>> 4. Partition the tables by day/week/month/year (configurable I > >> suppose) > >>>>>>> > >>>>>>> The first two are vital, the third takes you a step further. > The > >> fourth > >>>>>>> is > >>>>>>> a long-term admin thing. > >>>>>>> > >>>>>>> And possibly > >>>>>>> 5. Have two connections, one for fatal/error etc and one for > >> info/debug > >>>>>>> level log statements (configurable split?). Then you can use > the > >>>>>>> synchronous_commit setting on the less important ones. Might > buy > >> you > >>>>>>> some > >>>>>>> performance on a busy system. > >>>>>>> > >>>>>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config- > >> wal.html#RUNTIME-CONFIG-WAL-SETTINGS > >>>>>>> > >>>>>>>> other apps then search and report on the data after it is > >> stored. what > >>>>>>>> apps?, I don't know either. pick your favorite reporting tool > >> and > >>>>>>>> you'll > >>>>>>>> be a step ahead of me (I don't know a really good reporting > >> tool) > >>>>>>>> as for sample data, you have syslog messages, just like I do. > so > >> you > >>>>>>>> have > >>>>>>>> the same access to data that I have. > >>>>>>>> how would you want to query them? how would people far less > >>>>>>>> experianced > >>>>>>>> that you want to query them? > >>>>>>>> I can speculate that some people would do two columns (time, > >>>>>>>> everything > >>>>>>>> else), others will do three (time, server, everything else), > and > >>>>>>>> others > >>>>>>>> will go further (I know some who would like to extract IP > >> addresses > >>>>>>>> embedded in a message into their own column). some people will > >> index > >>>>>>>> on > >>>>>>>> the time and host, others will want to do full-text searches > of > >>>>>>>> everything. > >>>>>>> > >>>>>>> Well, assuming it looks much like traditional syslog, I would > do > >>>>>>> something > >>>>>>> like: (timestamp, host, facility, priority, message). It's easy > >> enough > >>>>>>> to > >>>>>>> stitch back together if people want that. > >>>>>>> > >>>>>>> PostgreSQL's full-text indexing is quite well suited to > logfiles > >> I'd > >>>>>>> have > >>>>>>> thought, since it knows about filenames, urls etc already. > >>>>>>> > >>>>>>> If you want to get fancy, add a msg_type column and one > >> subsidiary > >>>>>>> table > >>>>>>> for each msg_type. So - you might have smtp_connect_from > >> (hostname, > >>>>>>> ip_addr). A set of perl regexps can match and extract the > fields > >> for > >>>>>>> these > >>>>>>> extra tables, or you could do it with triggers inside the > >> database. I > >>>>>>> think > >>>>>>> it makes sense to do it in the application. Easier for users to > >>>>>>> contribute > >>>>>>> new patterns/extractions. Meanwhile, the core table is > untouched > >> so you > >>>>>>> don't *need* to know about these extra tables. > >>>>>>> > >>>>>>> If you have subsidiary tables, you'll want to partition those > too > >> and > >>>>>>> perhaps stick them in their own schema (logs200901, logs200902 > >> etc). > >>>>>>> > >>>>>>> -- > >>>>>>> Richard Huxton > >>>>>>> Archonet Ltd > >>>>>>> > >>>>>>> -- > >>>>>>> Sent via pgsql-performance mailing list > >>>>>>> ([email protected]) > >>>>>>> To make changes to your subscription: > >>>>>>> http://www.postgresql.org/mailpref/pgsql-performance > >>>>>>> > >>>>>> > >>>>> > >>>>> -- > >>>>> Sent via pgsql-performance mailing list > >>>>> ([email protected]) > >>>>> To make changes to your subscription: > >>>>> http://www.postgresql.org/mailpref/pgsql-performance > >>>>> > >>>> > >>> > >> _______________________________________________ > >> rsyslog mailing list > >> http://lists.adiscon.net/mailman/listinfo/rsyslog > >> http://www.rsyslog.com > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

