Re: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)

Rainer Gerhards Thu, 23 Apr 2009 23:23:56 -0700

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Friday, April 24, 2009 7:57 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
> loginsertion(fwd)
> 
> On Fri, 24 Apr 2009, Rainer Gerhards wrote:
> 
> > Another innocent question:
> >
> > Let's say I used an exec() API exclusively. Now let me assume that I do, on
> > the *same* database connection, this calling sequence:
> >
> > exec("begin transaction")
> > exec("insert ...")
> > exec("insert ...")
> > exec("insert ...")
> > exec("insert ...")
> > exec("insert ...")
> > exec("insert ...")   [Point A]
> > exec("commit")
> >
> > Is it safe to assume that this will result in a performance benefit? (I know
> > that it causes more network traffic than necessary, but that's not my point;
> > I am only talking about speedup.) Will this speedup be considerable (on the
> > order of 20 vs. 3 seconds for a given sequence)?
> 
> Yes, this speedup would be considerable
> 
> from the message at the bottom it would be on the order of
> 
> >>> separate inserts, no transaction: 21.21s
> >>> separate inserts, same transaction: 1.89s


I read this, just wanted some reconfirmation.
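
For illustration, here is a minimal sketch of that calling sequence in Python.
It uses the standard sqlite3 module purely as a stand-in so that it runs
without a server; the table name "log", the helper insert_batch() and the
message contents are made up for this example, and nothing here says anything
about PostgreSQL timings.

```python
import sqlite3


def insert_batch(conn, messages):
    """Run the begin / insert ... / commit sequence on one connection."""
    conn.execute("BEGIN")
    try:
        for msg in messages:
            conn.execute("INSERT INTO log (msg) VALUES (?)", (msg,))
        conn.execute("COMMIT")
    except Exception:
        # Any failure before the commit discards the whole batch.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None disables the module's implicit transaction handling,
# so the explicit BEGIN/COMMIT above is the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")  # hypothetical table

insert_batch(conn, ["message %d" % i for i in range(6)])
print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])
```

The point is only the shape: one begin, any number of inserts, one commit, all
on the same connection.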

> 
> there is still another order of magnitude gain to be had by going to the
> copy (and eliminating the extra round trips)
> 
> >>> COPY (text): 0.10s

Definitely, but let's tackle the 90% issue first.

> 
> a copy looks something like
> 
> COPY X FROM STDIN
> data
> data
> data
> 
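
To make the copy format a bit more concrete, here is a rough Python sketch
that builds such a payload in the text format as I understand it from the
PostgreSQL documentation: tab-separated columns, backslash escapes, \N for
NULL, and a closing "\." line when the data follows the command in a script.
The helper names and the sample rows are invented for this example, and the
table name is not quoted or validated. A libpq client would send the rows
with PQputCopyData()/PQputCopyEnd() instead of the "\." terminator.

```python
def copy_escape(value):
    """Escape one column value for COPY text format."""
    if value is None:
        return "\\N"  # NULL marker
    text = str(value)
    # Backslash must be escaped first, otherwise the escapes added
    # below would themselves be escaped again.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def copy_payload(table, rows):
    """Build a 'COPY ... FROM STDIN' script with inline data."""
    lines = ["COPY %s FROM STDIN;" % table]
    for row in rows:
        lines.append("\t".join(copy_escape(col) for col in row))
    lines.append("\\.")  # end-of-data marker
    return "\n".join(lines) + "\n"


payload = copy_payload("X", [
    ("host1", "link up"),
    ("host2", "tab\there"),
    ("host3", None),
])
print(payload, end="")
```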
> 
> > Also, even more importantly, does this really mean they are all in one
> > transaction?
> 
> yes.
> 
> > In particular, what happens if the connection breaks at [Point A], e.g. by
> > the network connection going down for an extended period of time? Is it
> > safe to assume that everything will then be rolled back?
> 
> yes, every one of them would disappear.
> 

So it looks like my three-call (beginBatch, pushData, EndBatch) calling interface
can probably work. I still need to work out how non-transactional outputs can
convey what they have committed, but the basic interface looks rather good.

Rainer

> David Lang
> 
> > Feedback is appreciated.
> >
> > Rainer
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:rsyslog-
> >> [email protected]] On Behalf Of Rainer Gerhards
> >> Sent: Thursday, April 23, 2009 4:38 PM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
> >> loginsertion(fwd)
> >>
> >> That's interesting. As a side-activity, I am thinking about a new output
> >> module interface. Especially given the discussion on the postgres list,
> >> but also some other thoughts about other modules (e.g. omtcp or the file
> >> output), I tend to use an approach that permits both string-based as well
> >> as API-based (API as in libpq) ways of doing things. I have not really
> >> designed anything, but the rough idea is that each plugin needs three
> >> entry points:
> >>
> >> - start batch
> >> - process single message
> >> - end batch
> >>
> >> Then, the plugin can decide itself what it wants to do and when. Most
> >> importantly, this calling interface works well for string-based
> >> transactions as well as API-based ones.
> >>
> >> For the output file writer, for example, I envision that over time it
> >> will have its own write buffer (for various reasons; for example, I am
> >> also discussing zipped writing with some folks). With this interface, I
> >> can put everything into the buffer, write out when needed and hold back
> >> when there is no immediate need, and still make sure that I write out
> >> when the "end batch" entry point is called.
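
A minimal Python sketch of that file-writer idea, under the assumption that
the buffer is flushed either when it grows past a size limit or when the
batch ends. The class name, the max_buffered parameter and the use of an
in-memory stream instead of a real file are all choices made for this
example only.

```python
import io


class BufferedFileOutput:
    """Hypothetical file output with its own write buffer."""

    def __init__(self, stream, max_buffered=4096):
        self.stream = stream
        self.max_buffered = max_buffered  # flush threshold in characters
        self.buffer = []
        self.buffered_chars = 0

    def begin_batch(self):
        pass  # nothing to prepare for a plain file

    def process_message(self, msg):
        line = msg + "\n"
        self.buffer.append(line)
        self.buffered_chars += len(line)
        # Write out early only if the buffer has grown too large.
        if self.buffered_chars >= self.max_buffered:
            self._write_out()

    def end_batch(self):
        # End of batch: everything buffered so far must reach the stream.
        self._write_out()
        self.stream.flush()

    def _write_out(self):
        self.stream.write("".join(self.buffer))
        self.buffer.clear()
        self.buffered_chars = 0


sink = io.StringIO()  # stands in for the log file
out = BufferedFileOutput(sink, max_buffered=1024)
out.begin_batch()
out.process_message("first message")
out.process_message("second message")
written_before_end = sink.getvalue()  # still empty, below the threshold
out.end_batch()
written_after_end = sink.getvalue()   # both lines written
```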
> >>
> >> As I said, it is not really thought out yet, but maybe a starting point.
> >> So feedback is appreciated.
> >>
> >> Rainer
> >>
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:rsyslog-
> >>> [email protected]] On Behalf Of [email protected]
> >>> Sent: Wednesday, April 22, 2009 10:11 PM
> >>> To: rsyslog-users
> >>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log
> >>> insertion(fwd)
> >>>
> >>> from the postgres performance mailing list, relative speeds of different
> >>> ways of inserting data.
> >>>
> >>> I've asked if the 'separate inserts' mode is separate round trips or many
> >>> inserts in one round trip.
> >>>
> >>> based on this it looks like prepared statements make a difference, but not
> >>> so much that other techniques (either a single statement or a copy) aren't
> >>> comparable (or better) options.
> >>>
> >>> David Lang
> >>>
> >>> ---------- Forwarded message ----------
> >>> Date: Wed, 22 Apr 2009 15:33:21 -0400
> >>> From: Glenn Maynard <[email protected]>
> >>> To: [email protected]
> >>> Subject: Re: [PERFORM] performance for high-volume log insertion
> >>>
> >>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <[email protected]>
> >>> wrote:
> >>>> Yes, as I believe was mentioned already, planning time for inserts is
> >>>> really small.  Parsing time for inserts when there's little parsing that
> >>>> has to happen also isn't all *that* expensive and the same goes for
> >>>> conversions from textual representations of data to binary.
> >>>>
> >>>> We're starting to re-hash things, in my view.  The low-hanging fruit is
> >>>> doing multiple things in a single transaction, either by using COPY,
> >>>> multi-value INSERTs, or just multiple INSERTs in a single transaction.
> >>>> That's absolutely step one.
> >>>
> >>> This is all well-known, covered information, but perhaps some numbers
> >>> will help drive this home.  40000 inserts into a single-column,
> >>> unindexed table; with predictable results:
> >>>
> >>> separate inserts, no transaction: 21.21s
> >>> separate inserts, same transaction: 1.89s
> >>> 40 inserts, 100 rows/insert: 0.18s
> >>> one 40000-value insert: 0.16s
> >>> 40 prepared inserts, 100 rows/insert: 0.15s
> >>> COPY (text): 0.10s
> >>> COPY (binary): 0.10s
> >>>
> >>> Of course, real workloads will change the weights, but this is more or
> >>> less the magnitude of difference I always see--batch your inserts into
> >>> single statements, and if that's not enough, skip to COPY.
> >>>
> >>> --
> >>> Glenn Maynard
> >>>
> >>> --
> >>> Sent via pgsql-performance mailing list (pgsql-
> >>> [email protected])
> >>> To make changes to your subscription:
> >>> http://www.postgresql.org/mailpref/pgsql-performance
> >> _______________________________________________
> >> rsyslog mailing list
> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> http://www.rsyslog.com
