Re: [HACKERS] Have we out-grown Flex?

2012-05-03 Thread james
I haven't tried quex, but I have tried lemon (which can be broken out of 
SQLite) and re2c and ragel.


I like ragel and lemon, but the combination supports a push-parser style 
from memory, and many tools are inconvenient unless you are prepared to 
suck in a whole message before parsing, or let the parser drive a pull 
loop, or use a coroutine structure.


Could go all trendy and use a PEG tool like, er,, peg 
(http://piumarta.com/software/peg/).  (I haven't tried them tho')


James

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-03 Thread james

Doesn't that imply that a plan cache might be worthwhile?

But no matter: didn't the OP really have issue with packaging and 
Windows support - and there are a lot of Windows users, and in general 
there are many Windows devs: making it easier for them to contribute has 
to be good doesn't it?


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-03 Thread Daniel Farina
On Thu, May 3, 2012 at 12:51 PM, james ja...@mansionfamily.plus.com wrote:
 I haven't tried quex, but I have tried lemon (which can be broken out of
 SQLite) and re2c and ragel.

 I like ragel and lemon, but the combination supports a push-parser style
 from memory, and many tools are inconvenient unless you are prepared to suck
 in a whole message before parsing, or let the parser drive a pull loop, or
 use a coroutine structure.

 Could go all trendy and use a PEG tool like, er,, peg
 (http://piumarta.com/software/peg/).  (I haven't tried them tho')

I think the goal is not trendy nor easy to use (but easy to maintain,
at least...), but faster, and even then there is some doubt if any
amount of lexer optimization could possibly matter given everything
else that needs to happen to execute a query.  Better error messages
(with position information) might be a functional enhancement that I'd
like, but I don't think flex is limiting in that regard; rather, a lot
more information already exposed by flex would have to be passed
through the semantic analyzer.

Provided it could matter, are these tools faster than flex? My
limited understanding is probably not.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-03 Thread james
I believe there are tools that are significantly faster than flex.  I 
believe re2c generates code that is faster.  But the key thing is to 
test, probably, or perhaps ask around.  I'm out of touch, but from 
memory flex wasn't the be-all and end-all.


Lemon is definitely easy to maintain/port and the result is pretty nice, 
too (I know Bison/Yacc wasn't the focus here).


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-03 Thread Jeff Janes
On Thu, May 3, 2012 at 12:53 PM, james ja...@mansionfamily.plus.com wrote:
 Doesn't that imply that a plan cache might be worthwhile?

PG has one if you use prepared statements.  Some people are either
unable or unwilling to use prepared statements, though.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Peter Geoghegan
On 2 May 2012 04:24, Robert Haas robertmh...@gmail.com wrote:
 I think Tom's question of whether the parser or lexer is the problem
 is something that ought to be investigated.  Personally, I suspect
 that our tendency to use lists everywhere, for everything, an artifact
 of our proud LISP heritage, may be costing us dearly in terms of parse
 time.  However, there's a difference between suspicion and proof, and
 I certainly have none of the latter.

It's funny that you should say that, because I actually had a
discussion with Greg Stark over drinks about this recently. John
Bentley has described an experiment that undermines many traditional
beliefs about the trade-offs represented by using a linked list rather
than an array. The test is, using a modern computer, generate N
integers at random and insert them in order into a sequence. Then,
randomly remove integers from the sequence, one at a time. What is the
performance at different sizes of N when the sequence is a
doubly-linked list, and when it is an array? If you graph the two, the
results are perhaps rather surprising. I think his graph went up to
100,000. The initial result shows a line representing an array down
near the bottom of the graph. The list line looks exponential. Even if
you use memory pooling so the list doesn't have to allocate memory as
needed, the array still roundly beats the list, albeit not quite so
convincingly and without the list hitting the roof near 100,000. I
don't think that I need to point out that this is for inserting and
deleting, and that's when you're supposed to use lists.

The point is that on modern architectures, with many layers of cache,
the cost of the linear search to get the insertion point completely
dominates - this is without the array availing of a binary search, in
the interest of fairness. CPU caches happen to do a very good job of
moving over on-average half of the array for inserting elements at
random points. The list is much larger than the array, with the two
extra pointers per element (yeah, I know that we use singly linked
lists, but we have other disadvantages compared to typical C lists),
which matters. Predictable usage patterns - that is, temporal and
spatial locality, resulting in good usage of the memory hierarchy
matters a lot. We're not talking about a small difference, either. I
think the difference in the published test was something like the
array was 50 - 100 times faster. The list results in far more cache
misses than the array. So, I'm right there with you - using lists
everywhere is bad news.

As for the question of Flex/Quex, I'm not in an immediate position to
sink any more time into it, but it remains on my list of things to
pursue for 9.3, though it's only about number 3 on that list right
now.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Peter Geoghegan
On 2 May 2012 04:57, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, I think only developers not packagers would really be taking such
 a hit.  I assume we'd continue to ship prebuilt lexer output in
 tarballs, so there'd seldom be a reason for a packager to need to run
 the tool.  Given the extremely slow rate of churn of the lexer, it might
 not be necessary for most developers to have the tool installed, either,
 if we were willing to put the derived file into git.

Incidentally, I had an unrelated conversation with someone (I think it
might have been Heikki) a while back, where it was suggested that Flex
and Bison could be run through web services. This might actually make
hacking Postgres on windows far easier, because the last time I tried
to do that the hard way, I gave up, suspecting that there must be some
kind of Winflex bug that selectively manifests itself - certainly, the
population of windows hackers is small enough that it could go
unnoticed for quite a while. Such an approach could be part of the
solution to this problem (although, incidentally, Quex maintains
visual studio support quite well, and even has graphical instructions
here: http://quex.sourceforge.net/doc/html/intro/visual_studio_trouble.html
).

It might be the case that some kind of virtualisation and/or
authentication (Postgres community account required) could make this
approach practical. It just isn't the path of least resistance right
now. Visual studio builds would be far easier if we did this, which
might encourage more hackers to venture into Windows land.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Jeff Janes
On Tue, May 1, 2012 at 5:53 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Quite apart from the practical difficulties that we have with Flex
 (the fact that the authors are non-responsive and possibly retired,
 that annoying compiler warning, and the fact that we are forced to
 maintain our own Windows binaries of 2.5.35), it has some notable
 disadvantages. I am aware that the MySQL people use their own
 hand-coded lexical analyzer named sql_lex.cc, which provides a yacc
 interface, while avoiding using any lexical analyzer generator
 whatsoever. They can't have done this just for fun, and no doubt this
 goes some way to explaining their continued performance advantage for
 very simple queries. I have heard people complain about Postgres
 parser overhead for pgbench -S style use-cases where it is
 unreasonably high, and I've heard them do so more than once.

For -S -M simple, the time spent planning is 5 times more than the
time spent parsing.  It may be worthwhile to reduce the time spent
parsing, but if the goal is parity with MySQL it probably isn't the
place to start.

(If you use a bottom-up profiler, the time spent in planning is
scattered over so many different functions that none of them look very
important individually)

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Peter Geoghegan
On 2 May 2012 17:20, Jeff Janes jeff.ja...@gmail.com wrote:
 For -S -M simple, the time spent planning is 5 times more than the
 time spent parsing.  It may be worthwhile to reduce the time spent
 parsing, but if the goal is parity with MySQL it probably isn't the
 place to start.

Could you please share your figures and methodology? I've heard of far
larger proportions than that.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Jeff Janes
On Wed, May 2, 2012 at 9:31 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 2 May 2012 17:20, Jeff Janes jeff.ja...@gmail.com wrote:
 For -S -M simple, the time spent planning is 5 times more than the
 time spent parsing.  It may be worthwhile to reduce the time spent
 parsing, but if the goal is parity with MySQL it probably isn't the
 place to start.

 Could you please share your figures and methodology? I've heard of far
 larger proportions than that.

I used two methods.  One was to hack exec_simple_query so that, under
the control of a new GUC, it would do things such as pg_parse_query
the query 101 times, throwing away the results of the first 100 before
proceeding to use the 101 parse result as normal.  Then I just run
pgbench under both settings, take the difference in 1/TPS between them
and divide by 100 to get the seconds per parse (and multiple by 1e6 to
get usec/parse)

I did the same thing for pg_analyze_and_rewrite, and for
pg_analyze_and_rewrite+pg_plan_queries (pg_plan_queries scribbles on
the structure produced by pg_analyze_and_rewrite, so you have to
repeat both as a unit, and then subtract the the
pg_analyze_and_rewrite timings off afterwards to isolate just the
planning) .

On my current laptop and rebased to git HEAD, I got
2usec/pg_parse_query, 2usec/pg_analyze_and_rewrite, and
12usec/pg_plan_queries.   Since my laptop is dual core, I did this
with -c2 -j2.

Back when I originally implemented and tested this on much older
hardware and about one year older pgsql code base, the absolute values
of usec/action were several fold higher, but the ratios of 1:1:6 were
about the same.

This does risk that it will overlook caching effects by repeating the
same thing in a tight loop.  I.e. parses 2 through 101 might be much
faster than parse 1 was.  Also, it risks that I simply don't know what
I'm doing and my attempts to throw away the results of a parse are
misbegotten--I just overwrote the old pointer with the new one and
assume the memory context would clean up the resulting orphans.

I could try to clean up and post the patch that implements this if you want.

The second method was just to do --enable-profiling on a stock build
and look at the call graph section of gprof output.  It attributed 50%
to pg_plan_queries and children and about 10% to each of
pg_parse_query and pg_analyze_and_rewrite (including their respective
children).  I don't put tremendous faith in gprof's call graph, but
the fact that the results are in agreement with the other method gives
me more confidence in both.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Magnus Hagander
On Wed, May 2, 2012 at 3:33 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 2 May 2012 04:57, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, I think only developers not packagers would really be taking such
 a hit.  I assume we'd continue to ship prebuilt lexer output in
 tarballs, so there'd seldom be a reason for a packager to need to run
 the tool.  Given the extremely slow rate of churn of the lexer, it might
 not be necessary for most developers to have the tool installed, either,
 if we were willing to put the derived file into git.

 Incidentally, I had an unrelated conversation with someone (I think it
 might have been Heikki) a while back, where it was suggested that Flex
 and Bison could be run through web services. This might actually make

Might've been me - I've been doing that for a long time to work around
winflex issues. But I never got around to doing anything like access
control or so, I just ran it on a hidden ip on a random port, and
simple curl call on the windows box..

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-02 Thread Bruce Momjian
On Wed, May 02, 2012 at 10:37:58AM -0700, Jeff Janes wrote:
 I could try to clean up and post the patch that implements this if you want.
 
 The second method was just to do --enable-profiling on a stock build
 and look at the call graph section of gprof output.  It attributed 50%
 to pg_plan_queries and children and about 10% to each of
 pg_parse_query and pg_analyze_and_rewrite (including their respective
 children).  I don't put tremendous faith in gprof's call graph, but
 the fact that the results are in agreement with the other method gives
 me more confidence in both.

Those are the ratio's I (and I think Tom) expected to see.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Have we out-grown Flex?

2012-05-01 Thread Peter Geoghegan
Quite apart from the practical difficulties that we have with Flex
(the fact that the authors are non-responsive and possibly retired,
that annoying compiler warning, and the fact that we are forced to
maintain our own Windows binaries of 2.5.35), it has some notable
disadvantages. I am aware that the MySQL people use their own
hand-coded lexical analyzer named sql_lex.cc, which provides a yacc
interface, while avoiding using any lexical analyzer generator
whatsoever. They can't have done this just for fun, and no doubt this
goes some way to explaining their continued performance advantage for
very simple queries. I have heard people complain about Postgres
parser overhead for pgbench -S style use-cases where it is
unreasonably high, and I've heard them do so more than once.

I suspect that we pay a high price for our table-based lexical
analyser's use of a table data structure to maintain transition
information, and the MySQL guys do not. It seems exceedingly likely
that our lexer is way more sophisticated than theirs, which likely
makes it infeasible to emulate their approach, and certainly makes it
unattractive. However, I think there might be a better way. There is
an LGPL-licensed project named Quex (hey, GNU bison is GPL), which can
generate C (and C++) code, and I am in contact with its primary
author, Frank-Rene Schäfer.

http://quex.sourceforge.net/

While it fits the bill as a replacement for Flex if ever we were
compelled to drop Flex, it also has a number of interesting advantages
over it that warrant further investigation. As Wikipedia puts it:

Quex uses traditional steps of Thompson construction to create
non-deterministic finite automatons from regular expressions,
conversion to a deterministic finite automaton and then Hopcroft
optimization to reduce the number of states to a minimum. Those
mechanisms, though, have been adapted to deal with character sets
rather than single characters. By means of this the calculation time
can be significantly reduced. Since the Unicode character set consists
of many more code points than plain ASCII, those optimizations are
necessary in order to produce lexical analysers in a reasonable amount
of time.

Apparently, in practical terms, reductions of more than 1/2 and even
as high as 2/3 in total lexing time have been observed when switching
from Flex, and Quex uses the syntax of Flex for its description of
regular expressions, which makes a conversion research project seem
like a worthwhile undertaking. I am told that this will probably take
less than a month, and perhaps as little as two weeks given the broad
compatibility of Flex and Quex. This undertaking has the enthusiastic
support of Frank-Rene, which is in refreshing contrast to the complete
inaccessibility of the Flex developers and the dead silence on the
flex users' mailing list.

I'm certainly not suggesting that this isn't something that, in order
to be adopted, will have to be carefully considered from many angles,
of which the performance of the resulting lexer is only one, but at
the same time I dare say that this approach is the low-hanging fruit
of optimising for the use-case where parser overhead is unreasonably
high. Am I barking up the wrong tree? At the very least, hopefully
these simple observations will provoke thought - Flex/Lex is not the
only tool for generating yacc-compatible lexical analysers, and it may
not be the best one for our current needs.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-01 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 ... I have heard people complain about Postgres
 parser overhead for pgbench -S style use-cases where it is
 unreasonably high, and I've heard them do so more than once.

I was under the impression that those cycles were mostly in bison not
flex...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-01 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 Quite apart from the practical difficulties that we have with Flex
 (the fact that the authors are non-responsive and possibly retired,

btw, as far as that goes, a look into their sourceforge SCM shows
evidence of continued activity.  I dunno why it's been so long since
their last formal release, but they're not dead.

 that annoying compiler warning,

http://flex.cvs.sourceforge.net/viewvc/flex/flex/flex.skl?r1=2.213r2=2.214

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-01 Thread Robert Haas
On Tue, May 1, 2012 at 8:53 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 I'm certainly not suggesting that this isn't something that, in order
 to be adopted, will have to be carefully considered from many angles,
 of which the performance of the resulting lexer is only one, but at
 the same time I dare say that this approach is the low-hanging fruit
 of optimising for the use-case where parser overhead is unreasonably
 high. Am I barking up the wrong tree? At the very least, hopefully
 these simple observations will provoke thought - Flex/Lex is not the
 only tool for generating yacc-compatible lexical analysers, and it may
 not be the best one for our current needs.

I think Tom's question of whether the parser or lexer is the problem
is something that ought to be investigated.  Personally, I suspect
that our tendency to use lists everywhere, for everything, an artifact
of our proud LISP heritage, may be costing us dearly in terms of parse
time.  However, there's a difference between suspicion and proof, and
I certainly have none of the latter.

One fairly major obstacle to using quex for anything is that it
doesn't appear to be packaged on either Fedora or Ubuntu (I tried
apt-cache search quex on my Ubuntu laptop, and yum install quex on an
F16 machine).  'port install quex' on my MacOS X doesn't find it
either.  This means that if we were to switch to quex, the barrier to
compiling PostgreSQL would go up very significantly.  I realize
there's a chicken-and-egg problem there, since if quex is awesome and
nobody uses it because it isn't packaged, then it will never become
popular enough for anyone to consider packaging it.  Nonetheless, I'm
not sure I want to be the guinea pig.  Even if the actual
patch-as-committed-to-git to make the switch to quex were not that
large, the amount of follow-on work we'd be creating for every
developer and packager of PostgreSQL would be quite formidable.  We
shouldn't do that unless it's pretty much awesome.

One possible way forward would be to try to be compatible with both
quex and flex.  We'd need a buildfarm critter or two running with quex
to detect breakage there, though, since a lot of people would probably
continue to run flex. But I'm not entirely sure it's worth doing even
that much unless we can show a worthwhile performance bump.  The core
scanner doesn't change much, so more features isn't too compelling
as a reason for change, especially if we need to remain
flex-compatible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 One fairly major obstacle to using quex for anything is that it
 doesn't appear to be packaged on either Fedora or Ubuntu (I tried
 apt-cache search quex on my Ubuntu laptop, and yum install quex on an
 F16 machine).  'port install quex' on my MacOS X doesn't find it
 either.  This means that if we were to switch to quex, the barrier to
 compiling PostgreSQL would go up very significantly.  I realize
 there's a chicken-and-egg problem there, since if quex is awesome and
 nobody uses it because it isn't packaged, then it will never become
 popular enough for anyone to consider packaging it.  Nonetheless, I'm
 not sure I want to be the guinea pig.

I was trying to think of a delicate way to put that, but since you
beat me to it ... yeah, exactly.  It's possible that quex is so
spectacular that we'd be willing to absorb the conversion costs
and the penalties of being a pioneer user of a new tool, but the bar
seems pretty high.

 Even if the actual
 patch-as-committed-to-git to make the switch to quex were not that
 large, the amount of follow-on work we'd be creating for every
 developer and packager of PostgreSQL would be quite formidable.

FWIW, I think only developers not packagers would really be taking such
a hit.  I assume we'd continue to ship prebuilt lexer output in
tarballs, so there'd seldom be a reason for a packager to need to run
the tool.  Given the extremely slow rate of churn of the lexer, it might
not be necessary for most developers to have the tool installed, either,
if we were willing to put the derived file into git.

Having said all that, if the amount of effort needed to make a trial
port is within reason, by all means let's try it and see what the
performance difference is, rather than just speculating.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers