Re: [Firebird-net-provider] Parser class/library
Jiri,

To get more concrete, can you provide examples of all relevant corner cases?

-----Original Message-----
From: Jiří Činčura [mailto:j...@cincura.net]
Sent: Tuesday, 6 October 2015 06:37
To: For users and developers of the Firebird .NET providers <firebird-net-provider@lists.sourceforge.net>
Subject: Re: [Firebird-net-provider] Parser class/library

--
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Parser class/library
On Tue, Oct 6, 2015, at 13:16, Amro wrote:
> To get more concrete can you provide examples of all relevant corner
> cases?

If I knew all of them, they would not be corner cases. It's just a lot of playing.

--
Mgr. Jiří Činčura
Independent IT Specialist
Re: [Firebird-net-provider] Parser class/library
Talking about corner cases: would anybody here mind sending me some statements that mix comments, literals, SET TERMs, etc. in some crazy ways? I think the new parser, which handles comments and is also faster, is getting ready for real tests. :)

--
Mgr. Jiří Činčura
Independent IT Specialist
Re: [Firebird-net-provider] Parser class/library
I know we are sponsoring this... but it sounds to me like so much work for so little gain. Sure, the current parsing code is slow and buggy, but it has only a few minor bugs, and the performance gain will be limited as well. I think looking at 3rd-party parsers isn't a bad idea at all.

From: Jiří Činčura <mailto:j...@cincura.net>
Sent: 6/10/2015 6:38
To: For users and developers of the Firebird .NET providers <mailto:firebird-net-provider@lists.sourceforge.net>
Subject: Re: [Firebird-net-provider] Parser class/library
Re: [Firebird-net-provider] Parser class/library
On Mon, Oct 5, 2015, at 21:32, Michał Ziemski wrote:
> IMHO you won't be able to handle all the corner cases without a full
> grammar parser.

Well, the cases we need are not that wide. But it's tedious anyway.

> For example, consider a "SET TERM" inside an "EXECUTE BLOCK".
> Lacking the grammar understanding, you won't recognize that as an invalid
> term in the execute block and you'll simply cut the block in half.

I don't care much about invalid scripts. It fails either way. But I agree that it's at least confusing for people consuming the library.

> So the decision is how far you want to go.

Personally, not far. :) There are way more interesting pieces.

> Writing a full-scale parser for FB SQL is a rather easy but very tedious
> and time-consuming task.
> Having that as a tool would be a great addition to the FB ecosystem.
> The parser in FB itself is written in yacc, so it's fairly portable.
> Still, you'll have to go rule by rule and convert it to C#.
> Actually, I have tried this myself in F# (it's far, far better suited for
> parsers) and am about 50% through.
> I would gladly donate the code if you'd be interested.

I don't think it will hurt.

> If you would prefer the faster approach, I would suggest:
> - a simple hand-written lexer that recognizes the tokens "SET", "TERM",
>   CURRENT_TERM_SYMBOL, "--", NEWLINE, OTHER_TOKEN and STRING
> - a parser that iterates the tokens from the lexer and tests for the sequences:
>     "SET" "TERM" OTHER_TOKEN CURRENT_TERM_SYMBOL - to set a new terminator
>     CURRENT_TERM_SYMBOL - to yield the accumulated OTHER_TOKENs and STRINGs
>     "--" - to start a comment and skip everything till NEWLINE
> This should be an easy enough state machine to write by hand.

That's what I'm currently doing. Sadly.

--
Mgr. Jiří Činčura
Independent IT Specialist
Re: [Firebird-net-provider] Parser class/library
Hi!

IMHO you won't be able to handle all the corner cases without a full grammar parser.
For example, consider a "SET TERM" inside an "EXECUTE BLOCK". Lacking the grammar understanding, you won't recognize that as an invalid term in the execute block and you'll simply cut the block in half.
So the decision is how far you want to go.

Writing a full-scale parser for FB SQL is a rather easy but very tedious and time-consuming task.
Having that as a tool would be a great addition to the FB ecosystem.
The parser in FB itself is written in yacc, so it's fairly portable. Still, you'll have to go rule by rule and convert it to C#.
Actually, I have tried this myself in F# (it's far, far better suited for parsers) and am about 50% through.
I would gladly donate the code if you'd be interested.

If you would prefer the faster approach, I would suggest:
- a simple hand-written lexer that recognizes the tokens "SET", "TERM", CURRENT_TERM_SYMBOL, "--", NEWLINE, OTHER_TOKEN and STRING
- a parser that iterates the tokens from the lexer and tests for the sequences:
    "SET" "TERM" OTHER_TOKEN CURRENT_TERM_SYMBOL - to set a new terminator
    CURRENT_TERM_SYMBOL - to yield the accumulated OTHER_TOKENs and STRINGs
    "--" - to start a comment and skip everything till NEWLINE
This should be an easy enough state machine to write by hand.

Cheers!
Michał

2015-10-05 15:30 GMT+02:00 Jiří Činčura:
> Hi,
>
> I'm working on a bug fix for DNET-266. The more I tweak the parser I
> wrote this morning to properly handle all the edge cases, the more I'm
> wondering whether it would make sense to take a dependency on some
> library or class that can do some basic parsing. We don't need full
> grammar features like ANTLR, just something that can tokenize SQL and
> handle comments (or, in general, a tokenizer with "escaping" support).
>
> What do you think? Any recommendations?
>
> BTW the bugfix is sponsored by SMS-Timing. Kudos to them.
>
> --
> Mgr. Jiří Činčura
> Independent IT Specialist
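The hand-written approach Michał outlines above can be sketched roughly as follows. This is only an illustration under stated assumptions, written in Python for brevity rather than C#: the names `split_script` and `SET_TERM` are hypothetical and are not taken from the provider's code, the lexer and the parser are folded into a single character scan, and only single-quoted literals and "--" comments are recognized (no /* */ comments, no quoted identifiers).

```python
import re

# Hypothetical helper: matches an accumulated statement of the form
# "SET TERM <new terminator>" (the current terminator has already been consumed).
SET_TERM = re.compile(r"SET\s+TERM\s+(\S+)", re.IGNORECASE)


def split_script(script, term=";"):
    """Split a script into statements, honouring SET TERM, literals and -- comments."""
    statements = []
    buf = []  # pieces of the statement currently being accumulated
    i, n = 0, len(script)
    while i < n:
        if script[i] == "'":
            # STRING: copy up to the closing quote; '' is an escaped quote.
            j = i + 1
            while j < n:
                if script[j] == "'":
                    if j + 1 < n and script[j + 1] == "'":
                        j += 2
                        continue
                    break
                j += 1
            buf.append(script[i:j + 1])
            i = j + 1
        elif script.startswith("--", i):
            # Comment: skip everything up to (not including) the NEWLINE.
            j = script.find("\n", i)
            i = n if j == -1 else j
        elif script.startswith(term, i):
            # CURRENT_TERM_SYMBOL: either switch terminator or yield a statement.
            stmt = "".join(buf).strip()
            buf.clear()
            i += len(term)
            match = SET_TERM.fullmatch(stmt)
            if match:
                term = match.group(1)
            elif stmt:
                statements.append(stmt)
        else:
            # OTHER_TOKEN: accumulate the character as-is.
            buf.append(script[i])
            i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements


if __name__ == "__main__":
    demo = "SET TERM ^ ;\nEXECUTE BLOCK AS BEGIN END^\nSET TERM ; ^\nSELECT 1 FROM rdb$database;\n"
    for statement in split_script(demo):
        print(repr(statement))
```

Because the scan has no grammar knowledge, a stray "SET TERM ; ^" inside an EXECUTE BLOCK is not reported as invalid; the block is simply cut in half at the terminator, which is exactly the limitation raised in the message above.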
Re: [Firebird-net-provider] Parser class/library
I would be very careful about taking a dependency on an external library like yacc or ANTLR and would recommend avoiding such a path.
If anything, a standalone "lightweight" tokenizer would be preferable, although, unfortunately, I don't know of any.
Libraries like ANTLR and yacc are simply awesome, but changes in such libraries tend to be breaking changes!

> -----Original Message-----
> From: Геннадий Забула [mailto:zabulu...@gmail.com]
> Sent: Monday, 5 October 2015 16:22
> To: For users and developers of the Firebird .NET providers provi...@lists.sourceforge.net>
> Subject: Re: [Firebird-net-provider] Parser class/library
Re: [Firebird-net-provider] Parser class/library
http://entityframework.codeplex.com/SourceControl/latest#src/EntityFramework/Core/Common/EntitySql/GenerateParser.cmd

On 5 October 2015 at 17:22, Геннадий Забула wrote:
> EF uses yacc and lex for EntityTree parsing.
> They both produce C# class that can be compiled into the assembly.
Re: [Firebird-net-provider] Parser class/library
EF uses yacc and lex for EntityTree parsing.
They both produce C# classes that can be compiled into the assembly.
Re: [Firebird-net-provider] Parser class/library
True, but in order for yacc (or ANTLR) to generate C# code you have to use their syntax. This syntax is used to provide the so-called grammar mentioned by Jiri.
In general, these kinds of tools are basically small "compilers" that parse a given "grammar" and then construct lexical analyzers and parsers from it.
Changes in the syntax of the "grammar" tend to be breaking changes by nature!

> -----Original Message-----
> From: Геннадий Забула [mailto:zabulu...@gmail.com]
> Sent: Monday, 5 October 2015 16:43
> To: For users and developers of the Firebird .NET providers provi...@lists.sourceforge.net>
> Subject: Re: [Firebird-net-provider] Parser class/library
>
> IIRC, yacc\lex are external tools that produce .cs files that provide
> parsing.