Tom Lane writes:

> The regression tests contain no very-long literals.  The results I was
> referring to concerned cases with string (BLOB) literals in the
> hundreds-of-K range; it seems that the per-character loop in the flex
> lexer starts to look like a bottleneck when you have tokens that much
> larger than the rest of the query.
>
> Solutions seem to be either (a) make that loop quicker, or (b) find a
> way to avoid passing BLOBs through the lexer.  I was merely suggesting
> that (a) should be investigated before we invest the work implied
> by (b).
I've done the following test: ten statements of the form

    SELECT 1 FROM tab1 WHERE val = '...';

where ... are literals of length 5 - 10 MB (some random base-64 encoded
MP3 files).  "tab1" was empty.  The test ran 3:40 min wall-clock time.
Top ten calls:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
36.95      9.87     9.87 74882482     0.00     0.00  pq_getbyte
22.80     15.96     6.09       11   553.64  1450.93  pq_getstring
13.55     19.58     3.62       11   329.09   329.10  scanstr
12.09     22.81     3.23      110    29.36    86.00  base_yylex
 4.27     23.95     1.14       34    33.53    33.53  yy_get_previous_state
 3.86     24.98     1.03       22    46.82    46.83  textin
 3.67     25.96     0.98       34    28.82    28.82  myinput
 1.83     26.45     0.49       45    10.89    32.67  yy_get_next_buffer
 0.11     26.48     0.03     3027     0.01     0.01  AllocSetAlloc
 0.11     26.51     0.03      129     0.23     0.23  fmgr_isbuiltin

The string literals didn't contain any backslashes, so scanstr is
operating in the best-case scenario here.  But for arbitrary binary data
we need some escape mechanism, so I don't see much room for improvement
there.

It seems the real bottleneck is the excessive abstraction in the
communications layer.  I haven't looked at it closely at all, but it
would seem better if pq_getstring did not call pq_getbyte for each
character and instead read the buffer directly.

-- 
Peter Eisentraut   [EMAIL PROTECTED]