Tom Lane writes:

> The regression tests contain no very-long literals.  The results I was
> referring to concerned cases with string (BLOB) literals in the
> hundreds-of-K range; it seems that the per-character loop in the flex
> lexer starts to look like a bottleneck when you have tokens that much
> larger than the rest of the query.
>
> Solutions seem to be either (a) make that loop quicker, or (b) find a
> way to avoid passing BLOBs through the lexer.  I was merely suggesting
> that (a) should be investigated before we invest the work implied
> by (b).
I've done the following test: ten statements of the form

    SELECT 1 FROM tab1 WHERE val = '...';

where ... are literals of length 5 - 10 MB (some random base-64 encoded
MP3 files).  "tab1" was empty.  The test ran 3:40 min wall-clock time.
Top ten calls:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
36.95      9.87     9.87 74882482     0.00     0.00  pq_getbyte
22.80     15.96     6.09       11   553.64  1450.93  pq_getstring
13.55     19.58     3.62       11   329.09   329.10  scanstr
12.09     22.81     3.23      110    29.36    86.00  base_yylex
 4.27     23.95     1.14       34    33.53    33.53  yy_get_previous_state
 3.86     24.98     1.03       22    46.82    46.83  textin
 3.67     25.96     0.98       34    28.82    28.82  myinput
 1.83     26.45     0.49       45    10.89    32.67  yy_get_next_buffer
 0.11     26.48     0.03     3027     0.01     0.01  AllocSetAlloc
 0.11     26.51     0.03      129     0.23     0.23  fmgr_isbuiltin

The string literals didn't contain any backslashes, so scanstr is
operating in the best-case scenario here.  But for arbitrary binary data
we need some escape mechanism, so I don't see much room for improvement
there.

It seems the real bottleneck is the excessive abstraction in the
communications layer.  I haven't looked at it closely at all, but it
would seem better if pq_getstring did not call pq_getbyte for each
character and instead read the buffer directly.

-- 
Peter Eisentraut   [EMAIL PROTECTED]