-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 25/04/2010 03:02, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Sat, Apr 24, 2010 at 8:07 PM, Bruce Momjian <br...@momjian.us> wrote:
>>> Sounds useful to me, though as a function like suggested in a later
>>> email.
> 
>> If tool-builders think this is useful, I have no problem with making
>> it available.  It should be suitably disclaimed: "We reserve the right
>> to rip out the entire flex/yacc-based lexer and parser at any time and
>> replace them with a hand-coded system written in Prolog that emits
>> tokenization information only in ASN.1-encoded pig latin.  If massive
>> changes in the way this function works - or its complete disappearance
>> - are going to make you grumpy, don't call it."
> 
> I'm a bit concerned with the vagueness of the goals here.  We started
> with a request to dump out node trees, ie, post-parsing representation;
> but the example use case of syntax highlighting would find that
> representation quite useless.  (Example: foo::bar and CAST(foo AS bar)
> yield the same parse tree.)  

Well, the tokenizer idea was actually my understanding of the following
quote from Michael Tharp:
« ... making the internal SQL parser available to clients via a
C-language SQL function. »

I thought Michael was trying to write a tokenizer based on the node tree
returned by raw_parser. Since it seems Michael is not yet sure what he is
trying to do, I would prefer to refocus this thread a bit.

> A syntax highlighter might get some use
> out of the lexer-output token stream, but I'm afraid from the proposed
> output that people might be expecting more semantic information than
> the lexer can provide.  The lexer doesn't, for example, have any clue
> that some keywords are commands and others aren't; nor any very clear
> understanding about the semantic difference between the tokens '='
> and ';'.

Exactly. A proper tokenizer function should be able to give some (simple)
information about the type of each token. That is what I tried to define
in this draft with the "type" field:

  => SELECT pgtokenize($script$
      SELECT 1;
      UPDATE test SET "a"=2;
    $script$);

     type      | pos |   value  | line
  -------------+-----+----------+------
   SQL_COMMAND | 1   | 'SELECT' |   1
   CONSTANT    | 8   | '1'      |   1
   DELIMITER   | 9   | ';'      |   1
   SQL_COMMAND | 11  | 'UPDATE' |   2
   IDENTIFIER  | 18  | 'test'   |   2
   SQL_KEYWORD | 23  | 'SET'    |   2
   IDENTIFIER  | 27  | '"a"'    |   2
   OPERATOR    | 30  | '='      |   2
   CONSTANT    | 31  | '2'      |   2
   DELIMITER   | 32  | ';'      |   2
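
To make the idea concrete, here is a minimal sketch in Python of a
function that returns rows with the same four columns (type, pos, value,
line). It is only an illustration under loud assumptions: the keyword
sets, the regular expressions and the name tokenize() are invented for
this example and have nothing to do with the backend's flex lexer or
with psql's. "pos" is taken to be the 1-based character offset in the
script and "line" the 1-based line on which the token starts.

```python
import re

# Hypothetical classification tables, hard-coded for the example only.
# A real implementation would need the server's own keyword list.
COMMANDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}
KEYWORDS = {"SET", "FROM", "WHERE", "INTO", "VALUES", "AS"}

# Simplified token patterns (assumptions, far from the full SQL lexicon).
TOKEN_SPEC = [
    ("SPACE",     r"\s+"),
    ("QUOTED",    r'"(?:[^"]|"")*"'),        # "quoted identifier"
    ("STRING",    r"'(?:[^']|'')*'"),        # 'string constant'
    ("NUMBER",    r"\d+(?:\.\d+)?"),
    ("WORD",      r"[A-Za-z_][A-Za-z0-9_$]*"),
    ("DELIMITER", r"[;,()]"),
    ("OPERATOR",  r"[-+*/<>=~!@%^&|:.]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(script):
    """Return a list of (type, pos, value, line) tuples for `script`."""
    rows = []
    line = 1
    pos = 0
    while pos < len(script):
        m = TOKEN_RE.match(script, pos)
        if m is None:
            raise ValueError(
                f"unexpected character {script[pos]!r} at offset {pos + 1}")
        kind, text = m.lastgroup, m.group()
        if kind == "WORD":
            # The only "semantic" step: look the word up in the tables.
            upper = text.upper()
            if upper in COMMANDS:
                kind = "SQL_COMMAND"
            elif upper in KEYWORDS:
                kind = "SQL_KEYWORD"
            else:
                kind = "IDENTIFIER"
        elif kind == "QUOTED":
            kind = "IDENTIFIER"
        elif kind in ("STRING", "NUMBER"):
            kind = "CONSTANT"
        if kind != "SPACE":
            rows.append((kind, pos + 1, text, line))
        line += text.count("\n")   # whitespace and strings may span lines
        pos = m.end()
    return rows


if __name__ == "__main__":
    for row in tokenize('SELECT 1;\nUPDATE test SET "a"=2;'):
        print(row)
```

Note that the command/keyword distinction comes entirely from the two
lookup tables, not from the scanning itself, so this sketch knows nothing
about context (a word listed in COMMANDS is always reported as a
command, wherever it appears).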


> 
> Also, if all you want is the lexer, it's not that hard to steal psql's
> version and adapt it to your purposes.  The lexer doesn't change very
> fast, and it's not that big either.

Stealing the lexer from psql is possible... for a C application.
I don't know yet whether we could port it to other languages easily, or
whether a simple lexer would really answer the use cases here.

> 
> Anyway, it certainly wouldn't be hard for an add-on module to provide a
> SRF that calls the lexer (or parser) and returns some sort of tabular
> representation of the results.  I'm just not sure how useful it'll be
> in the real world.

Well, I would prefer not to tell users of pgAdmin or phpPgAdmin that
they depend on a contrib module.
Moreover, PostgreSQL already exposes a lot of information about its
internal mechanisms, configuration, DDL, etc. I think a proper tokenizer
function would be a natural new feature for core, if possible.

Having glanced here and there at the parser code, I am not yet sure
where I could get the required information, or how to combine it to
produce something close to my draft.
But I prefer to discuss this first, before spending too much time on it
and possibly throwing the code away afterwards...

> 
>                       regards, tom lane

- -- 
JGuillaume (ioguix) de Rorthais
http://www.dalibo.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvXdxgACgkQxWGfaAgowiJujQCglXpCYpFttwHOkmkCd92zMxnv
r00An1sjmRrR6u61VjCtXputcNBevHsz
=ri3i
-----END PGP SIGNATURE-----

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
