[il-antlr-interest: 35006] Re: [antlr-interest] Eliminate characters in TOKEN

2011-11-23 Thread Ruslan Zasukhin
On 11/23/11 11:59 AM, "Bart Kiers"  wrote:

> Hi Rampon,
> 
> 
> On Wed, Nov 23, 2011 at 10:54 AM, Rampon Jerome wrote:
> 
>> ...
>> it complained on output option to be AST.
>> If I add it in my grammar options if complains and still return error
>> It seems it automatically adds if not there but later on still return
>> error ???
>> 
>> Is that normal ?
>> 
> 
> Yes, the `!` to exclude characters from lexer rules (as was possible in v2)
> is no longer valid in v3 grammars.

Yes, I also ran into this change in v3.
Here are examples from our Valentina SQL grammar, where we use a new trick to
exclude, e.g., the wrapping quotes from the token text.


//--

// String literals:

// caseSensitive = false, so we use only small chars.
fragment
Letter
:'a'..'z'
|   '@'
;


fragment
EscapeSequence
:'\\' ( QUOTE|'\\'|'b'|'t'|'n'|'f'|'r' )
;


STRING_LITERAL
@init
{
int escape_count = 0;
int theStart = $start;
}
:QUOTE

{ theStart = GETCHARINDEX(); } // skip first quote

(EscapeSequence{ ++escape_count; }
|QUOTE QUOTE   { ++escape_count; }
|~( QUOTE | '\\' )
)* 

{ 
$start = theStart;
EMIT();

// Optimization: the lexer has found the escaped chars, and we even
// count them. We pass this info to the parser/tree parser inside
// the token, so later algorithms can avoid one more scan of the
// literal to check whether any symbols need unescaping. Also, knowing
// how many such symbols exist, the algorithm can return immediately
// once all known escapes are resolved.
// This can also help to accurately calculate the RAM needed for the
// unescaped string.
//
LTOKEN->user1 = escape_count;
}

QUOTE // and skip last quote
;
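
To illustrate how a later stage might use the escape count stored in the token, here is a minimal sketch in plain C. The function name unescape_literal and its signature are hypothetical (they are not part of ANTLR or of the Valentina sources); it only assumes that the caller has the literal body without the wrapping quotes and the count the lexer saved in user1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from ANTLR or Valentina): copies the body of a
 * string literal, already stripped of its wrapping quotes, into dst and
 * resolves the escapes. escape_count is the number the lexer counted.
 * dst must have room for len + 1 bytes. Returns the unescaped length. */
size_t unescape_literal(const char *src, size_t len, unsigned escape_count,
                        char quote, char *dst)
{
    size_t in = 0, out = 0;

    /* Fast path: the lexer saw no escapes, so a single copy is enough. */
    if (escape_count == 0) {
        memcpy(dst, src, len);
        dst[len] = '\0';
        return len;
    }

    while (in < len && escape_count > 0) {
        char c = src[in++];
        if (c == '\\' && in < len) {
            char e = src[in++];
            switch (e) {
            case 'b': c = '\b'; break;
            case 't': c = '\t'; break;
            case 'n': c = '\n'; break;
            case 'f': c = '\f'; break;
            case 'r': c = '\r'; break;
            default:  c = e;    break;  /* backslash or quote */
            }
            --escape_count;
        } else if (c == quote && in < len && src[in] == quote) {
            ++in;                       /* doubled quote collapses to one */
            --escape_count;
        }
        dst[out++] = c;
    }

    /* All known escapes are resolved: copy the tail without scanning it. */
    memcpy(dst + out, src + in, len - in);
    out += len - in;
    dst[out] = '\0';
    return out;
}
```

Because the count is known up front, the output buffer can also be sized as len + 1 at most, and the scan stops as soon as the last escape has been handled.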





//---
IDENT
:( Letter | '_' ) ( Letter | '_' | Digit )*
;


DELIMITED// delimited_identifier
@init
{
$type = IDENT;
int theStart = $start;
}
:
(DQUOTE{ theStart = GETCHARINDEX(); }
( ~(DQUOTE) | DQUOTE DQUOTE )+
{ $start = theStart; EMIT(); }
DQUOTE

|BQUOTE{ theStart = GETCHARINDEX(); }
( ~(BQUOTE) | BQUOTE BQUOTE )+
{ $start = theStart; EMIT(); }
BQUOTE

// valentina/oracle extension: [asasas '' " sd "]
|LBRACK{ theStart = GETCHARINDEX(); }
( ~(']') )+
{ $start = theStart; EMIT(); }
RBRACK
)
;




-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.



[il-antlr-interest: 34991] Re: [antlr-interest] Re : reuse() methos in 3.4 C runtime/exception report

2011-11-21 Thread Ruslan Zasukhin
On 11/21/11 7:30 PM, "Jim Idle"  wrote:

> I believe that is fixed in the latest source.

What exactly is fixed?
Was this an answer to Jerome?


As I said, I had to copy your reset() method myself
into a few other factories: 4-5 major fixes in the 3.4 sources.

I will of course be happy to send you the corrected sources.
Or do you mean that you have already found and fixed those 4-5 places yourself?


=
Thanks to A Z,

who sent me corrected sources of ANTLR3/C,
where I saw that he just removed a few function pointers in CommonToken and
replaced the function calls with direct access to a few members of this structure.

Well, this trick gave another 0.5 second ...
Not much, but it is still good to know that such a way exists ...

I have added these changes under a define, so I can easily enable/disable them ..
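
A rough sketch of what "under a define" can look like is below. The Token struct, the TOKEN_TYPE macro and the TOKEN_DIRECT_ACCESS switch are all made-up names for illustration; this is not the real ANTLR3_COMMON_TOKEN layout.

```c
/* Hypothetical token, loosely modelled on the idea described above;
 * it is NOT the real ANTLR3_COMMON_TOKEN. */
typedef struct Token {
    unsigned type;
    unsigned (*getType)(struct Token *tok);  /* "method" style accessor */
} Token;

static unsigned token_get_type(Token *tok)
{
    return tok->type;
}

void token_init(Token *tok, unsigned type)
{
    tok->type = type;
    tok->getType = token_get_type;
}

/* Build with -DTOKEN_DIRECT_ACCESS to read the member directly and skip
 * the indirect call; leave it undefined to keep the original behaviour. */
#ifdef TOKEN_DIRECT_ACCESS
#  define TOKEN_TYPE(tok) ((tok)->type)
#else
#  define TOKEN_TYPE(tok) ((tok)->getType(tok))
#endif
```

Call sites then use TOKEN_TYPE(tok) everywhere, so the two variants can be compared in a profiler by rebuilding with or without the define.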



I am now spending time in profilers, already a few days, trying to learn
things as deeply as possible ...






[il-antlr-interest: 34980] Re: [antlr-interest] [C] reuse() - FIXED ... :-)

2011-11-19 Thread Ruslan Zasukhin
On 11/19/11 12:36 AM, "Ruslan Zasukhin" 
wrote:

Hi Jim,
Hi All,

Now I am happy:

Everything below from my previous letter is right.
I have fixed those couple of places, and have discovered a couple more that
needed to be improved for a correct reuse():
   adaptor  -- factory of nil objects
   factory of vectors

Now RAM does not grow at all, and the speed is
v3 no reuse()    24 sec
v3 with reuse    19.4 sec
v2 with reuse    19.7 sec

And I believe I see a few more places to improve for better speed.
I will try in the next few days ... and then report here.

For example, it seems to me we can perfectly well reset the nodes of the
tree parser too. It also has a pool. Everything is very similar ... So why kill
the pool and create it again each time ...


Of course I did not get my dream x2 speedup from v3 :)
But at least it is not worse ...

> So I have debug debug and have found that
> 
> =
> 1) Generated Parser contains ctx->adaptor, which contains one more
> tokenFactory. And for this factory NEVER is called reset().
> 
> I have add call reset() for now in the method of generated parser.
> This have made things better, but still I see at least two players...
> 
> So next player is:
> 
> =
> 2) This adaptor also has more deeply hidden factory of trees.
> And it is called a lots for nilNode().
> 
> And this factory 
> typedef struct ANTLR3_ARBORETUM_struct
> 
> Although is very similar to
> typedef struct ANTLR3_TOKEN_FACTORY_struct
> 
> In work with pools, it do NOT have reset() function.
> 
> So I think I will try add reset() method to this struct,
> And call it from generated parser reset() method as following
> 
> 
> static void
> SqlParser_v3ParserReset (pSqlParser_v3Parser ctx)
> {
> RECOGNIZER->reset(RECOGNIZER);
> 
> // RZ added this to see if this fixes grow of RAM.
> 
> ADAPTOR->tokenFactory->reset( ADAPTOR->tokenFactory );
> 
> ((pANTLR3_COMMON_TREE_ADAPTOR)(ADAPTOR->super))->arboretum->reset(
> ((pANTLR3_COMMON_TREE_ADAPTOR)(ADAPTOR->super))->arboretum );
> }
> 
> 




[il-antlr-interest: 34978] [antlr-interest] [C] reuse() - grow of RAM -- attempt #2.

2011-11-18 Thread Ruslan Zasukhin
Hi Jim,

So I have been debugging and debugging, and have found that

=
1) The generated parser contains ctx->adaptor, which contains one more
tokenFactory. And reset() is NEVER called for this factory.

For now I have added a reset() call in the reset method of the generated parser.
This made things better, but I still see at least two players...

So the next player is:

=
2) This adaptor also has a more deeply hidden factory of trees.
And it is called a lot for nilNode().

And this factory,
typedef struct ANTLR3_ARBORETUM_struct

although it is very similar to
typedef struct ANTLR3_TOKEN_FACTORY_struct

in how it works with pools, does NOT have a reset() function.

So I think I will try to add a reset() method to this struct,
and call it from the generated parser's reset() method as follows


static void
SqlParser_v3ParserReset (pSqlParser_v3Parser ctx)
{
RECOGNIZER->reset(RECOGNIZER);

// RZ added this to see if this fixes grow of RAM.

ADAPTOR->tokenFactory->reset( ADAPTOR->tokenFactory );

((pANTLR3_COMMON_TREE_ADAPTOR)(ADAPTOR->super))->arboretum->reset(
((pANTLR3_COMMON_TREE_ADAPTOR)(ADAPTOR->super))->arboretum );
}
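
The general idea of the function above (a parser reset has to reach every factory hidden inside it, not only the top-level one) can be sketched in standalone C. All type and function names below are hypothetical stand-ins, not the real ANTLR3 C structs.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the nested objects discussed above. */
typedef struct Factory {
    size_t used;                        /* slots handed out since last reset */
    void (*reset)(struct Factory *f);
} Factory;

static void factory_reset(Factory *f)
{
    f->used = 0;                        /* rewind, keep the memory */
}

static void factory_init(Factory *f)
{
    f->used = 0;
    f->reset = factory_reset;
}

typedef struct Adaptor {
    Factory tokenFactory;               /* second token factory */
    Factory arboretum;                  /* factory of tree nodes */
} Adaptor;

typedef struct Parser {
    Factory state;                      /* the recognizer's own pool */
    Adaptor adaptor;
} Parser;

void parser_init(Parser *p)
{
    factory_init(&p->state);
    factory_init(&p->adaptor.tokenFactory);
    factory_init(&p->adaptor.arboretum);
}

/* Resetting only p->state would leave the adaptor's pools growing on every
 * run, so each nested factory is reset explicitly. */
void parser_reset(Parser *p)
{
    p->state.reset(&p->state);
    p->adaptor.tokenFactory.reset(&p->adaptor.tokenFactory);
    p->adaptor.arboretum.reset(&p->adaptor.arboretum);
}
```

If any one of the nested resets is forgotten, that pool keeps its high-water mark and allocates past it on the next parse, which matches the steady RAM growth described here.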






[il-antlr-interest: 34977] Re: [antlr-interest] [C] reuse() bug here? -- IGNORE it ... Not here :)

2011-11-18 Thread Ruslan Zasukhin
On 11/18/11 5:12 PM, "Ruslan Zasukhin" 
wrote:





[il-antlr-interest: 34968] [antlr-interest] [C] reuse() bug here?

2011-11-18 Thread Ruslan Zasukhin
Hi Jim,

I think the bug is here ... I have only just found this place while debugging,
and I need to run away for a few hours.


static pANTLR3_COMMON_TOKEN
newPoolToken(pANTLR3_TOKEN_FACTORY factory)
{
pANTLR3_COMMON_TOKEN token;

/* See if we need a new token pool before allocating a new
 * one
 */
if (factory->nextToken >= ANTLR3_FACTORY_POOL_SIZE)
// RZ: <<< 1 !! If we have a FEW pools then this is wrong.
{
/* We ran out of tokens in the current pool, so we need a new pool
 */
newPool(factory);
}

/* Assuming everything went well (we are trying for performance here so
 * doing minimal error checking. Then we can work out what the pointer is
 * to the next token.
 */

token = factory->pools[factory->thisPool] + factory->nextToken;
// RZ: nextToken was 1024 and we have allocated a new pool above,
// so we should use its ZERO item

factory->nextToken++; // and we get 1025 ...


It seems to me ... we must drop the nextToken counter to zero when we allocate
the next pool.

And maybe correct other places too ...
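
For reference, here is a small self-contained pool allocator showing the invariant in question: the caller's code is only correct if the helper that switches pools also rewinds the slot index. This is a sketch with made-up names (ItemFactory, next_pool, POOL_SIZE), not the ANTLR3 runtime code; as the follow-up subject line says, the runtime turned out not to have the bug at this spot.

```c
#include <stdlib.h>

#define POOL_SIZE 4                     /* tiny on purpose, for illustration */

typedef struct Item { int id; } Item;

typedef struct ItemFactory {
    Item **pools;      /* array of pool blocks */
    int    poolCount;  /* blocks actually allocated */
    int    thisPool;   /* block in use, -1 before first use */
    int    nextItem;   /* next free slot in pools[thisPool] */
} ItemFactory;

void factory_init(ItemFactory *f)
{
    f->pools = NULL;
    f->poolCount = 0;
    f->thisPool = -1;
    f->nextItem = POOL_SIZE;            /* forces a pool switch on first use */
}

/* Moves to the next pool, allocating it only if it does not exist yet,
 * and rewinds nextItem to zero. Returns 0 on allocation failure. */
static int next_pool(ItemFactory *f)
{
    if (f->thisPool + 1 >= f->poolCount) {
        Item **grown = realloc(f->pools,
                               (size_t)(f->poolCount + 1) * sizeof(Item *));
        if (grown == NULL)
            return 0;
        f->pools = grown;
        f->pools[f->poolCount] = malloc(POOL_SIZE * sizeof(Item));
        if (f->pools[f->poolCount] == NULL)
            return 0;
        f->poolCount++;
    }
    f->thisPool++;
    f->nextItem = 0;                    /* the step the question is about */
    return 1;
}

Item *factory_new_item(ItemFactory *f)
{
    if (f->nextItem >= POOL_SIZE && !next_pool(f))
        return NULL;
    return f->pools[f->thisPool] + f->nextItem++;
}

/* Reuse: forget what was handed out, keep every block for the next run. */
void factory_reset(ItemFactory *f)
{
    f->thisPool = -1;
    f->nextItem = POOL_SIZE;
}

void factory_free(ItemFactory *f)
{
    for (int i = 0; i < f->poolCount; i++)
        free(f->pools[i]);
    free(f->pools);
    factory_init(f);
}
```

With this layout a second run of the same size walks through the already allocated blocks, so the number of blocks stays flat after the first run.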





[il-antlr-interest: 34959] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-17 Thread Ruslan Zasukhin
On 11/18/11 1:24 AM, "Jim Idle"  wrote:

Hi Jim,

> You should not be seeing more than a few newPool calls, however, if you
> are building a tree then this may be affecting it.

You mean

A) my own tree in the parser?
  No, I do not build one. ANTLR itself does the work of building the AST.

B) the tree parser?
  But how does this affect it?
  And everybody needs a tree parser ...


> The reuse stuff was not built for trees,

Right.  This is why in my Reuse() function you can see that I destroy the tree
parser; it is then created again later ...

void SqlParser_v3::ResuseParserObjects(
const char* inTextToParse,
vuint32 inLength )
{
// ---
// TREE PARSER cannot be reused. Destroy it.
//
if( mpTreeParser )
{
mpTreeParser->free( mpTreeParser );
mpTreeParser = NULL;
}

if( mpNodes )
{
mpNodes->free( mpNodes );
mpNodes = NULL;
}


// ---
// Reuse other objects
//
mpInput->reuse(
mpInput, 
(pANTLR3_UINT8) inTextToParse,
(ANTLR3_UINT32) inLength,
(pANTLR3_UINT8) "VSQL" );

mpTokenStream->reset( mpTokenStream );
mpLexer ->reset( mpLexer );
mpParser ->reset( mpParser );

ResetOwnData( mpParser );
}


> so you may have to debug this because I won't have time
> to look at new use cases for some time.

Yes, I am now going to spend time looking at how parser->reset() and the others
work here.

But can you at least give me some pointers on what I should look at?
Do you set some flags on the parser's objects?
 
> I will take out the myriad duplication of function pointers over the new
> year all being well.

That is about speed, OK. But memory ...





[il-antlr-interest: 34956] Re: [antlr-interest] reuse() methos in 3.4 C runtime

2011-11-17 Thread Ruslan Zasukhin
Hi Jim,

Below are copy-pastes of my class-wrapper around ANTLR3
Lexer/Parser/TreeParser.
So you can see if I made some stupid mistake...


>> On 6/24/11 7:49 PM, "Jim Idle"  wrote:
>>
>> Because the documentation is not yet up to date, here is an example of
>> reusing the allocated memory in input streams and token streams:
>> 
>> for (i=0; i>{
>> // Run the parser.
>> psr->start(psr);
>> 
>> // --
>> // Now reset everything for the next run.
>> // Order of calls is important.
>> 
>> // Input stream can now be reused
>> input->reuse(input, sourceCode, sourceLen, sourceName);
>> 
>> // Reset the common token stream so that it will reuse its resources
>> tstream->reset(tstream);
>> 
>> // Reset the lexer (new function generated by antlr now)
>> lxr->reset(lxr);
>> 
>> // Reset the parser (new function generated by antlr now)
>> psr->reset(psr);
>> }


void SqlParser_v3::ResuseParserObjects(
const char* inTextToParse,
vuint32 inLength )
{
// ---
// TREE PARSER cannot be reused. Destroy it.
//
if( mpTreeParser )
{
mpTreeParser->free( mpTreeParser );
mpTreeParser = NULL;
}

if( mpNodes )
{
mpNodes->free( mpNodes );
mpNodes = NULL;
}


// ---
// Reuse other objects
//
mpInput->reuse(
mpInput, 
(pANTLR3_UINT8) inTextToParse,
(ANTLR3_UINT32) inLength,
(pANTLR3_UINT8) "VSQL" );

mpTokenStream->reset( mpTokenStream );
mpLexer ->reset( mpLexer );
mpParser ->reset( mpParser );

ResetOwnData( mpParser );
}



And few other related methods ...


void SqlParser_v3::Parse_UTF8(
I_SqlDatabaseEx* inDatabase,
const char* inCommand,
const char* inCommandEnd )
{
argused1(inDatabase);

//COMMENT this line to force  REUSE() mode ...
//DestroyParserObjects();

if( mpInput ) 
ResuseParserObjects( inCommand, (inCommandEnd - inCommand) );
else
CreateParserObjects( inCommand, (inCommandEnd - inCommand) );


// -
// Parse the input expression
mAST = mpParser->sql( mpParser );


// IF PARSER have generate some errors,
// then we throw them as VSQL exception.
if( mpParser->pParser->rec->state->errorCount )
{
StToUTF16 cnv( ResultStringBuffer, pErrEnd, GetConverter_UTF8() );
throw VSQL::xVSQLException( ERR_SQL_PARSER_ERROR, cnv.c_str() );
}
}


void SqlParser_v3::CreateParserObjects(
const char* inTextToParse,
vuint32 inLength )
{
if( inTextToParse == NULL )
return; // all objects will still be NULL as well.

// --
// Create INPUT object:

// NOTE: SQL strings do not have a BOM (the first few bytes, which define
// the endianness of UTF16). So for UTF16, we must specify BE or LE here ourselves.
mpInput = antlr3StringStreamNew(
(pANTLR3_UINT8) inTextToParse, mEncoding, (ANTLR3_UINT32) inLength,
(pANTLR3_UINT8) "VSQL" );

mpInput->setUcaseLA( mpInput, ANTLR3_TRUE );


// --
// Create LEXER v3 object:

mpLexer = SqlParser_v3LexerNew( mpInput );
mpTokenStream = antlr3CommonTokenStreamSourceNew( ANTLR3_SIZE_HINT,
TOKENSOURCE( mpLexer ) );


// --
// Create PARSER v3 object:

mpParser = SqlParser_v3ParserNew( mpTokenStream );  // is generated by ANTLR3
mpParser->mDoAllCommands = mDoAllCommandsInitial;

ResetOwnData( mpParser );
}

void SqlParser_v3::DestroyParserObjects( void )
{
// REVERSE ORDER to construction:

if( mpTreeParser )
{
mpTreeParser->free( mpTreeParser );
mpTreeParser = NULL;
}

if( mpNodes )
{
mpNodes->free( mpNodes );
mpNodes = NULL;
}

if( mpParser )
{
mpParser->mpStartPositions = NULL;

mpParser->free( mpParser );
mpParser = NULL;
}
   
if( mpTokenStream )
{
mpTokenStream->free( mpTokenStream );
    mpTokenStream = NULL;
}

if( mpLexer )
{
mpLexer->free( mpLexer );
mpLexer = NULL;
}

if( mp

[il-antlr-interest: 34954] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-17 Thread Ruslan Zasukhin
Hi Jim,

Thank you for your feedback; here is my update...

1) I was able to remove all .text usage in both the Parser and the TreeParser. GOOD.

2) BAD ... This saved 500MB,
   but I still have 1.5GB of allocations in my bench ...

And now I see (using Apple Instruments) that all of this is eaten by the PARSER.
Not by the Lexer, and not by the TreeParser.

I just see endless
newPool
newPool
newPool
newPool
newPool
newPool

I will send you a snapshot off list so you can see that.

And now ZERO of my own code affects this.
Only ANTLR's own logic...

This makes me think that reuse() does not work as expected.

As I understand it, when we do
parser( reset )

it must mark all existing allocations in your pool as free,
so the next run should reuse all of that. Yes?

And note that all my calls to the parser are very similar in size.
This is just
INSERT INTO( f1, f2, ... f9 ) VALUES ( v1, v2, ... )

I.e. the pool really should not grow much after the first / second iteration of
the loop. But it grows like crazy.


I think you have your own test app where you tested this ...
Maybe just increase the loop count to a million or so
to see the RAM on your computer go away ...


I very much hope you will be able to find the issue and show how to fix it
in the sources of ANTLR 3.4 ...  Please?  Maybe some kind of object is not
marked as free?


==
Also an interesting fact.

v3 without reuse                      22.4 sec
v3 with reuse and 1.5GB allocation    20.4 sec

v2 with reuse                         19.7 sec


So if we are able to resolve these 1.5GB "leaks", there is still hope of being at
least not slower than v2 ...


===
About your hope that v3 C should be much faster than v2 C++:
so far I do not see this.

I see in the profiles:
parser                   36%   RAM only
tree parser              24%   RAM only
execute of vdb engine    13%   inserts records into an on-disk (!!) db

And when I start to go deep into the parser calls ... I just see that the depth is
big
sql -> sql_single ->

And each step down reduces it by just 0.5-0.8% ...

This is the BODY of each parser rule ...

And there is nothing really to optimize :(
Just a lot of small calls ... NilNodes, LT(), ...


===
My view is that this is the nature of ANTLR ... We get many calls of parser
functions ... a deep stack ... Although they are light, they eat milliseconds ...


And there is the fact that in C you sometimes need to create structures with a huge
number of pointers and then, e.g., NULL them, while in C++ the virtual method table
is created once per class, not once per instance ... This can be one of the hidden
bottlenecks IMO. You can work around it by extracting the pointers into a
single separate structure, so that instances hold just a single pointer.
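
The workaround from the last paragraph, written out as a small C sketch. Node, NodeOps and node_init are illustrative names only, not ANTLR3 types: the function pointers live in one static table and each instance stores a single pointer to it.

```c
/* Hypothetical types showing a hand-made "vtable" in C. */
typedef struct Node Node;

typedef struct NodeOps {
    int  (*getType)(const Node *n);
    void (*setType)(Node *n, int type);
} NodeOps;

struct Node {
    const NodeOps *ops;   /* one pointer per instance, like a C++ vptr */
    int type;
};

static int node_get_type(const Node *n)
{
    return n->type;
}

static void node_set_type(Node *n, int type)
{
    n->type = type;
}

/* The table is built once for the whole program, not once per instance. */
static const NodeOps NODE_OPS = { node_get_type, node_set_type };

void node_init(Node *n, int type)
{
    n->ops = &NODE_OPS;   /* a single store, however many methods exist */
    n->type = type;
}
```

Creating an instance now costs one pointer store instead of one store per method, and the instances themselves get smaller, which also helps the pools discussed earlier.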





[il-antlr-interest: 34953] Re: [antlr-interest] reuse() methos in 3.4 C runtime

2011-11-17 Thread Ruslan Zasukhin
On 6/24/11 7:49 PM, "Jim Idle"  wrote:

Hi Jim,

I will send a few letters now, but I will start with a simple question about this
letter of yours.

Am I right that in this loop we should also destroy and re-create the
TreeParser?

Yes?


> Because the documentation is not yet up to date, here is an example of
> reusing the allocated memory in input streams and token streams:
> 
> 
> 
> for (i=0; i{
> // Run the parser.
> //
> psr->start(psr);
> 
> // --
> // Now reset everything for the next run.
> // Order of calls is important.
> 
> // Input stream can now be reused
> //
> input->reuse(input, sourceCode, sourceLen, sourceName);
> 
> // Reset the common token stream so that it will reuse its resources
> //
> tstream->reset(tstream);
> 
> // Reset the lexer (new function generated by antlr now)
> //
> lxr->reset(lxr);
> 
> // Reset the parser (new function generated by antlr now)
> //
> psr->reset(psr);
> }







[il-antlr-interest: 34924] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Ruslan Zasukhin
On 11/16/11 6:00 PM, "Jim Idle"  wrote:

> xxx: s=HEX_NUMBER { $s.type = CONST_STR_HEX; } ;

Jim,

This gives error as
SqlParser_v3.g:879:21: cannot write to read only attribute: $u.type





[il-antlr-interest: 34921] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Ruslan Zasukhin
On 11/16/11 6:00 PM, "Jim Idle"  wrote:

> [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks,
> oops.
> 
> Do not use the $text annotations if you want performance, they are purely
> for convenience - I must have said this 5000 times and I wish I had never
> added that bit ;) I also told you 3 or 4 times in various emails not to use
> it. I think that that is in the API docs somewhere, but I should make sure
> that it is, if it is not.

Right, you told me ...

But in the docs, the ANTLR books, the examples, this appears everywhere:

hex_string_literal

:s = HEX_NUMBER  -> CONST_STR_HEX[$s.text->chars]

Yes, I checked the C API docs even today, but have not found any special page
which says

Java guys do this
C guys do this.

> There is no memory leak, but the auto string stuff does not release until
> you free the string factory, which only happens when you free the parser,
> not when you reuse it. Because it allocates small strings all the time, it
> kills performance, and then you will page.

Clear.

So when I "fix" all the places that use .text, the memory problem should
disappear by itself.


> xxx: s=HEX_NUMBER { $s.type = CONST_STR_HEX; } ;

> I think that the field name is type but you get the idea.

Yes, I will try this asap and give feedback.
I have 40 such places in parser. And some number in the tree parser.


>  Don't use the
> fake object oriented stuff when you want performance, use the structs
> directly - you will find that it is many times faster than the v2 C++, not
> slower - this is C and you should get as close to the metal as you can.

I very much hope so :-)

For the PARSER, I think I see how I can use this $s.type.
I will check the other 39 places in the parser right now :)

=
It is not clear to me what we can do with the Tree Parser ??

So I have some token, e.g. a date or time or other literal.
I make a label, and now I need to get the TEXT.

general_literal returns [ENode_Const_Ptr res]

: cd=CONST_DATE
{ res=make_enode_date ( GET_FBL_STRING($cd.text) );  }



So far I have found that I can do something like

general_literal returns [ENode_Const_Ptr res]

: cd=CONST_DATE
  {
  pANTLR3_COMMON_TOKEN pToken = $cd->getToken( $cd );
  ANTLR3_MARKER pStart = pToken->getStartIndex( pToken );
  ANTLR3_MARKER pEnd   = pToken->getStopIndex( pToken );
  // Do some job ...
  }


Does such code in a TreeParser look correct to you?

Is it really safe, and do getStartIndex / getStopIndex always return correct
pointers?

Of course this can be extracted into a special function, to be used in many places
in one line of code ...

It is just that I believe there is no example in C, and no docs page, which discusses
this for the TreeParser and C. If one exists, please point me to it :-)
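
The "special function" idea can be sketched like this. It assumes, as the snippet above does, that the start and stop markers are pointers into the original input buffer and that the stop marker is inclusive (it points at the last character of the token). Tok, TextView and token_text_view are hypothetical names, not ANTLR3 API.

```c
#include <stddef.h>
#include <string.h>

/* A non-owning view of the token text: no allocation, no copy. */
typedef struct TextView {
    const char *data;
    size_t      len;
} TextView;

/* Hypothetical token: both markers point into the input buffer,
 * stop is inclusive. */
typedef struct Tok {
    const char *start;
    const char *stop;
} Tok;

TextView token_text_view(const Tok *t)
{
    TextView v;
    v.data = t->start;
    v.len  = (t->stop >= t->start) ? (size_t)(t->stop - t->start) + 1 : 0;
    return v;
}

/* Compares a view with a NUL-terminated string without copying. */
int view_equals(TextView v, const char *s)
{
    return strlen(s) == v.len && memcmp(v.data, s, v.len) == 0;
}
```

The view is only valid while the input buffer is alive and unchanged, so with a reused input stream it has to be consumed (or converted to the application's own string type) before the next reuse() call.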





[il-antlr-interest: 34917] [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Ruslan Zasukhin
Hi Jim,

I have spent 2 days running around this, and now I am ready to describe what I
see, to get your help. It seems there is a bug/leak in the reuse() area. Or I
am not using it correctly, but I do as you described in a letter 3 months
ago.

So ... Long story :-)

* I have a simple bench that does 100K INSERT commands.

The v2 parser does this in 19 seconds.
The v3 parser without reuse does this in 24 seconds.

OF COURSE we must expect a speedup if we reuse the lexer/parser.

So I have designed the code to be able to switch easily between these 2 ways.
And when I try to go with reuse I get comparable speed, but 2GB of RAM eaten.

=
* Using the Apple Xcode 4.2 Instruments, I see what is going on.

   These are not actually leaks; the parser just keeps allocating
ANTLR_STRING objects in parser and tree-parser rules which use

$c.text

=
FOR EXAMPLE:

* I did have in the parser rule:

hex_string_literal
:s = HEX_NUMBER -> CONST_STR_HEX[$s.text->chars]
;

ZERO code of my own here. Right?
And I see that $s.text, expanded in the C code to getText(), allocates and
allocates ...
So it is never reused, as I understand it.



=
BTW

When I saw that getText() is used, and remembered that you told me to avoid
this,
I jumped into the sources and came to an idea:

why create a new token here, for which I need getText() ??
Maybe I can just change the token type as follows:

hex_string_literal
:s = HEX_NUMBER  { $s->setType( $s, CONST_STR_HEX ); }
;

And it seems this works fine.

I have corrected a few rules in the parser this way.
But the Tree Parser still has, for example, this:

general_literal returns [ENode_Const_Ptr res]
: cd=CONST_DATE {res=make_enode_date ( GET_FBL_STRING( $cd.text) );}
| ct=CONST_TIME {res=make_enode_time ( GET_FBL_STRING( $ct.text) );}
| s=const_str   {res=make_enode_str  ( GET_FBL_STRING( $s.text ) );}
;

All these $c.text calls go to getText() -- this makes a COPY of the string buffer,
which I then convert into our own FBL_String...

PROBLEM 1:  these ANTLR STRINGs produced by getText() are never reused, as far as I
can see.

PROBLEM 2:  related to speed as well - how can we avoid making a copy of the
string here?
 In the sources I see that there is code such as

return ((pANTLR3_COMMON_TREE)(tree->super))->token->getText(
   ((pANTLR3_COMMON_TREE)(tree->super))->token);


Maybe something can be optimized/hacked here?
For example, maybe I can write my own function, which checks whether the token has
  a char* or an ANTLR_String, and chooses the way ...

But what syntax would reach the token in the .g?
I can do my own macro of course ...
I just want to get some feedback on whether this can be a good idea for everyone?


=
And this is how I try to reuse the Lexer/Parser and NOT the TreeParser.
Everything follows your letter, Jim:

void SqlParser_v3::ResuseParserObjects(
const char* inTextToParse,
vuint32 inLength )
{
// ---
// TREE PARSER cannot be reused. Destroy it.
//
if( mpTreeParser )
{
mpTreeParser->free( mpTreeParser );
mpTreeParser = NULL;
}

if( mpNodes )
{
mpNodes->free( mpNodes );
mpNodes = NULL;
}


// ---
// Reuse other objects
//
mpInput->reuse(
mpInput, 
(pANTLR3_UINT8) inTextToParse,
(ANTLR3_UINT32) inLength,
(pANTLR3_UINT8) "VSQL" );

mpTokenStream->reset( mpTokenStream );
mpLexer ->reset( mpLexer );
mpParser ->reset( mpParser );

ResetOwnData( mpParser );
}








[il-antlr-interest: 34761] Re: [antlr-interest] .g vs .g4

2011-11-06 Thread Ruslan Zasukhin
On 11/6/11 7:22 PM, "Terence Parr"  wrote:

Hi Terence,

> Hi, Sam Harwell and I are talking about the file extension for new ANTLR 4. We
> think it makes good sense to use .g4 to distinguish the new grammars. They're
> mostly compatible except that there is no need for syntactic predicates and
> .g4 allows immediate left recursion.

This sounds cool.

Is this already described on some page?


> I think it will reduce confusion about
> what version of ANTLR is necessary to compile a grammar.

Yes, .g4 sounds good to me.



ANTLRWorks should, of course, still accept a .g or even a .txt file
if I drag and drop it.

This is common practice on Mac OS, for example.


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34509] Re: [antlr-interest] [C target][HEELP :-] About disable recovery ... Override recoverFromMismatchedToken() ... Linker error

2011-10-22 Thread Ruslan Zasukhin
On 10/21/11 7:48 PM, "Jim Idle"  wrote:

Aha, 

So I was stupid enough (shame on me :-) not to see that I can do it like this:

void* recoverFromMismatchedToken_off(
    pANTLR3_BASE_RECOGNIZER recognizer,
    ANTLR3_UINT32           ttype,
    pANTLR3_BITSET_LIST     follow)
{
    recognizer->mismatch( recognizer, ttype, follow );  // <-- magic :)

    return NULL;
}


Thank you for the pointer, Jim.  I will try this way,
but as I pointed out, the query

DRO TABLE T1;

does not even call the recoverFromMismatchedToken handler,
because a "no viable alternative" error happens here ...
So I needed to override recover().

> In the baserecognizer code, the functions are called directly, but they
> are all available indirectly via the pointer interface. I think you are
> not fundamentally understanding this. So, they are all static to their
> source code files and do not pollute the namespace, and when they are not
> being called internally, you call them via their pointers in the
> interface.

Of course I have read the sources and even debugged them, and I have seen those
pointers, because I had already overridden the error builder function ...

But please try to understand that reading the sources is not the best way to
learn HOW TO USE a library.

I think that if even I, a C++ developer for 20 years, did not see this way of
calling your internal methods indirectly ... it will not be easy for other
developers either ...

Because this is quite unusual for LIBRARIES ... You use very smart
techniques in ANTLR/C, and that is cool ... But they are not as obvious to
others as you may think :)

It would be very helpful to have an example such as
C/Examples/DisableRecover

Thank you for patience :)
 
> Do you see this in the antlr3BaseRecognizerNew():
> 
> recognizer->match   = match;
> recognizer->matchAny= matchAny;
> recognizer->memoize = memoize;
> recognizer->mismatch= mismatch;
> recognizer->mismatchIsUnwantedToken = mismatchIsUnwantedToken;
> recognizer->mismatchIsMissingToken  = mismatchIsMissingToken;
> recognizer->recover = recover;
> recognizer->recoverFromMismatchedElement=
> recoverFromMismatchedElement;
> recognizer->recoverFromMismatchedSet= recoverFromMismatchedSet;
> recognizer->recoverFromMismatchedToken  = recoverFromMismatchedToken;
> 
> etc
> 
> 
> So, install your own versions of whatever you like, then in your external
> version, call the methods via the pointers in the interface. Easy. This is
> true of ALL the interfaces, so that you can override any method you like.
> Now, perhaps I should have called the error recovery methods indirectly in
> the library itself, but they are all bound together and it is a trivial
> matter for you to call mismatch indirectly instead of directly in your
> code.
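
The pattern Jim describes can be sketched in standalone C as follows. This is a toy model under stated assumptions, not the ANTLR3 runtime: DemoRecognizer and every function name below are hypothetical, and the real struct has many more slots. It only shows how a function that is static inside the library file is still reachable from user code through the pointer stored in the interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini-recognizer; only two interface slots are modelled. */
typedef struct DemoRecognizer DemoRecognizer;
struct DemoRecognizer {
    int   error;      /* set when a mismatch was reported     */
    int   recovered;  /* set when recovery repaired the input */
    void  (*mismatch)(DemoRecognizer *rec, int ttype);
    void *(*recoverFromMismatchedToken)(DemoRecognizer *rec, int ttype);
};

/* "Library" side: static, so invisible to the linker outside this file. */
static void lib_mismatch(DemoRecognizer *rec, int ttype)
{
    (void)ttype;
    rec->error = 1;
}

static void *lib_recover_from_mismatched_token(DemoRecognizer *rec, int ttype)
{
    (void)ttype;
    rec->recovered = 1;   /* pretend a token was inserted or deleted */
    return rec;
}

/* Constructor installs the static functions into the interface. */
static void demo_recognizer_init(DemoRecognizer *rec)
{
    rec->error = 0;
    rec->recovered = 0;
    rec->mismatch = lib_mismatch;
    rec->recoverFromMismatchedToken = lib_recover_from_mismatched_token;
}

/* "User" side: reaches the static mismatch() only through the pointer. */
static void *user_recover_off(DemoRecognizer *rec, int ttype)
{
    rec->mismatch(rec, ttype);
    return NULL;
}
```

Installing the override is then a single assignment after construction, rec.recoverFromMismatchedToken = user_recover_off, which is the same shape as the assignments in @parser::apifuncs.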


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34489] [antlr-interest] [C target] About disable recovery ... [Solution?]

2011-10-21 Thread Ruslan Zasukhin
Hi All,

I want to describe, for the archive of this list, the solution I found after 3
days of studying the issue.

So ... the story is:

1) I read the book ... It has info only for Java ... Override this, catch
the exception ... But C has no exceptions, so clearly I had to find info for C.

2) I went to the site -> docs -> C target   ===>  NOTHING ?
But there should be info plus an example there, because the C target
differs from Java: no exceptions.

3) I started searching the list archive ... Nothing useful except the last letter
from a guy who noted that a C comment says we must override mismatchRecover(),
but there is no such function; it was probably renamed to
recoverFromMismatchedToken().
 

4) Okay, I tried to override this function in MY .cpp file.
But ... oops. The mismatch() function is STATIC, so I cannot link to it from my
sources.

Jim suggested copying mismatch() into my sources as well.
Okay, I tried that and ... it does not even compile, because mismatch() calls other
static functions ...

Brute force then: I added my function to antlr3baserecognizer.c itself.

It compiles and links ...  HAPPY -> I start testing.


And I see that, for example, this SQL query with an error

"DRO TABLE T1"

still does 3 iterations ...

Debugging shows that I do not even get into recoverFromMismatchedToken().
Oops again ...

After some debugging, I noticed that each rule always calls recover(),
and it is exactly this method that clears the error flag.

So, my next attempt: I wrote my own simple recover()

void vdb_recover_off( pANTLR3_BASE_RECOGNIZER recognizer )
{
    return;
}


And I replaced the default in the parser:

@parser::apifuncs
{
// Install custom error message display
//
RECOGNIZER->displayRecognitionError = vdb_BuildRecognitionErrorStr;
RECOGNIZER->recover = vdb_recover_off;



And finally it works the way I want ...
The parser builds the error string and stops right on the first word, DRO.
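
Why the no-op recover() stops the parse can be shown with a toy model in standalone C. It follows only the behaviour described above (each failing rule calls recover(), and the default clears the error flag); DemoParser and all function names are hypothetical, and the real generated parser is far more involved.

```c
#include <assert.h>
#include <string.h>

typedef struct DemoParser DemoParser;
struct DemoParser {
    int  error;                     /* error flag checked after each rule     */
    int  statements_tried;          /* how many times a rule was entered      */
    void (*recover)(DemoParser *p); /* replaceable slot, like ->recover above */
};

/* Default behaviour in this mock: clear the flag so parsing continues. */
static void demo_recover_default(DemoParser *p) { p->error = 0; }

/* Replacement: do nothing, so the error flag stays set. */
static void demo_recover_off(DemoParser *p) { (void)p; }

/* Hypothetical rule: the word must be "DROP"; anything else is an error. */
static void demo_statement(DemoParser *p, const char *word)
{
    p->statements_tried++;
    if (strcmp(word, "DROP") != 0) {
        p->error = 1;
        p->recover(p);
    }
}

/* Driver: stops at the first rule that leaves the error flag set. */
static int demo_parse(DemoParser *p, const char *const *words, int count)
{
    int i;
    p->error = 0;
    p->statements_tried = 0;
    for (i = 0; i < count && !p->error; i++)
        demo_statement(p, words[i]);
    return p->error;
}
```

With the default recover installed, the three words of "DRO TABLE T1" are all visited and the flag ends up cleared; with demo_recover_off installed, the driver stops after the first word with the flag still set.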
 
I will now also write a separate letter to Terence and Jim,
where I will try to explain why such a simple thing as
option RECOVER = FALSE

should not take 3 days from users of ANTLR :)

I hope this letter will save time for future C users of ANTLR.


===
P.S. 

Also, while I was learning these recovery dances in ANTLR
and read that we must override recoverFromMismatchedToken(),
I asked myself:

there also exist
recoverFromMismatchedSet(),
recoverFromMismatchedElement(),
recover()


Why should only that ONE function be overridden?

I think this should also be explained clearly in the docs, books, and
comments. Right?


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34488] Re: [antlr-interest] [C target][HEELP :-] About disable recovery ... Override recoverFromMismatchedToken() ... Linker error

2011-10-20 Thread Ruslan Zasukhin
On 10/21/11 12:41 AM, "Ruslan Zasukhin" 
wrote:

>> You need to copy the mismatch locally or call it via a pointer (if it is in
>> the interface, but I think it is not).

Jim,

This does not work, because mismatch() calls other hidden static functions.
Oops ... E.g.


error: 'mismatchIsUnwantedToken' was not declared in this scope


Something is not right here.

Was anybody really able to do this for the C target?  :-)



-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34487] Re: [antlr-interest] [C target][HEELP :-] About disable recovery ... Override recoverFromMismatchedToken() ... Linker error

2011-10-20 Thread Ruslan Zasukhin
On 10/21/11 12:35 AM, "Jim Idle"  wrote:

Okay, I see ... Thank you for the pointer.


Hmm.

Jim, I think it would be better and simpler if you added this small function
right into the antlr3baserecognizer.c file:

void* recoverFromMismatchedToken_off(
    pANTLR3_BASE_RECOGNIZER recognizer, ANTLR3_UINT32 ttype,
    pANTLR3_BITSET_LIST follow)
{
    mismatch( recognizer, ttype, follow );
    return NULL;
}


and gave us an easy way to switch to it in @parser::apifuncs.

Then WE, the users of ANTLR, would need just one line to switch.
That would be good, yes?  :-)



> You need to copy the mismatch locally or call it via a pointer (if it is in
> the interface, but I think it is not).

> Jim



> *From:* Ruslan Zasukhin [mailto:ruslan_zasuk...@valentina-db.com]
> *Sent:* Thursday, October 20, 2011 2:30 PM
> *To:* Jim Idle
> *Subject:* [C target][HEELP :-] About disable recovery ... Override
> recoverFromMismatchedToken() ... Linker error
> 
> 
> 
> Hi Jim,
> 
> So it seems I have realized (like the previous person who asked the same)
> that this comment is out of date and mismatchRecover()
> was renamed to recoverFromMismatchedToken()
> 
> ---
> /// To turn off single token insertion or deletion error
> /// recovery, override mismatchRecover() and have it call
> /// plain mismatch(), which does not recover.  Then any error
> /// in a rule will cause an exception and immediate exit from
> /// rule.  Rule would recover by resynchronizing to the set of
> /// symbols that can follow rule ref.
> ///
> static void *
> match(pANTLR3_BASE_RECOGNIZER recognizer,
> ANTLR3_UINT32 ttype, pANTLR3_BITSET_LIST follow)
> ---
> 
> 
> So I tried to write this function as follows:
> 
> void* recoverFromMismatchedToken_off(
> pANTLR3_BASE_RECOGNIZER recognizer, ANTLR3_UINT32 ttype,
> pANTLR3_BITSET_LIST follow)
> {
> mismatch( recognizer, ttype, follow );
> return NULL;
> }
> 
> @parser::apifuncs
> {
> // Install custom error message display
> //
> RECOGNIZER->displayRecognitionError = vdb_BuildRecognitionErrorStr;
> RECOGNIZER->recoverFromMismatchedToken = recoverFromMismatchedToken_off;
> ...
> 
> 
> ==
> *PROBLEM IS:*
> The mismatch() function is declared static in antlr3baserecognizer.c,
> so the linker of course gives us an error.
> 
> Then I wonder:
> why does the comment say that we must call mismatch()?
> How??
> 
> 
> Should we change the sources of ANTLR3? I hope not.
> But so far I do not see another way ...
> A static function is a static function ...
> 
> 

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34481] [antlr-interest] Typo in comments in C runtime... and Where is mismatchRecover() ??

2011-10-20 Thread Ruslan Zasukhin
Hi Jim,

For your info:

FILE  antlr3baserecognizer.c
 

/// The exception that was passed in, in the java implementation is
/// sorted in the recognizer exception stack in the C version. To 'throw' it we set the
/// error flag and rules cascade back when this is set.
///
static void *
recoverFromMismatchedToken  (pANTLR3_BASE_RECOGNIZER recognizer,
ANTLR3_UINT32 ttype, pANTLR3_BITSET_LIST follow)



sorted in the recognizer
^^

I think it should be "stored" ...



==
Also, the functions match() and mismatch() have comments
that say

override mismatchRecover() and have it call
/// plain mismatch(), which does not recover

But a grep search over the ANTLR 3.4 sources finds only 3 mentions of
mismatchRecover, and they are all in comments:

antlr3baserecognizer.c: /// recovery, override mismatchRecover() and have it call
antlr3baserecognizer.c: /// differently.  Override and call mismatchRecover(input, ttype, follow)
antlr3baserecognizer.c: /// single token insertion and deletion. Override mismatchRecover


There is no such function.

So where is it?  Are the comments wrong?


I am now trying to find a way to disable recovery for our Valentina SQL parser ...
I re-read the book section, but it has only Java code ...
I checked the C target docs: nothing about recovery ...
I searched the mailing list: so far nothing useful ...


I think the C target docs should have example code ...
because it is very different from Java ... no exceptions ...


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]





[il-antlr-interest: 34413] [antlr-interest] [Q] how rewrite tree for this simple rule with alternatives?

2011-10-15 Thread Ruslan Zasukhin
Hi All,
Hi Terence,


What still surprises me a lot in ANTLR v3 is the fact that we quite often
must use helper "leaf" rules to build trees.


EXAMPLE: in v2 the rule looks like

alter_trigger_statement
    :   "alter"! "trigger"! trigger_name ( "enable" | "disable" )
        { ## = #(#[ALTER_TRIGGER, "ALTER_TRIGGER"],##); }
    ;




ANTLR3 working solution:

alter_trigger_statement
    :   alter_trigger_statement_leaf
        -> ^( ALTER_TRIGGER alter_trigger_statement_leaf )
    ;

alter_trigger_statement_leaf
    :   T_ALTER! T_TRIGGER! trigger_name ( T_ENABLE | T_DISABLE )
    ;




Because attempts to do it in a SINGLE rule do not work:

alter_trigger_statement
    :   T_ALTER T_TRIGGER trigger_name ( T_ENABLE | T_DISABLE )
        -> ^( ALTER_TRIGGER trigger_name (T_ENABLE | T_DISABLE) )
    ;


The PROBLEM comes from this alternative ( T_ENABLE | T_DISABLE ) ...
We cannot label it, right?

And it is wrong to make both tokens optional:

alter_trigger_statement
    :   T_ALTER T_TRIGGER trigger_name ( T_ENABLE | T_DISABLE )
        -> ^( ALTER_TRIGGER trigger_name T_ENABLE? T_DISABLE? )
    ;


And we have many such places where an IMAGINARY token,
added in v2 with a single line, requires adding one more RULE in v3.

To me, this does not sound like a "more clear and powerful way to build trees".


Am I really blind and missing some easy way in v3?
By the way, I have not found any ANTLR example that uses imaginary tokens ...

Thank you for any ideas ...




-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34385] Re: [antlr-interest] Advice on best practice?

2011-10-13 Thread Ruslan Zasukhin
On 10/12/11 9:15 PM, "Jim Idle"  wrote:

Hi Jim,

> Avoid backtracking like the plague if you need performance. But if you are
> careful in the order of your alts and use it on just a few
> decisions/rules, then it might not be so bad (but remember that your error
> messages will be weak).

Then the next question comes to mind:
does ANTLR give us some easy way to see that, during parsing,
backtracking happened 2, 3, or 4 times?

Like a profiler ... so we can see that and start to learn where and why ...
 
> If k=1 on a decision then ANTLR will work that out so you don't need to
> specify but if you want to avoid ANTLR following every possible alt then
> you can use k=1 on a particular rule or sub rule to avoid ambiguity
> errors.

Aha, so by playing with the k=1 option
we can even kill ambiguity errors/warnings.

Interesting.

This is what I am asking for: it would be great to have a list of such tricks/rules
from ANTLR experts ...

>  Basically if you know that a decision will be correct at k=1 even
> though ANTLR can see ambiguities, then tell it so.

> Before 3.4 this would still give a warning unless you added a 1 token
> predicate, but I believe that Ter changed this for 3.4 so that I could remove
> a lot of those predicates from my own T-SQL grammar.

So we do not need those one-token predicates now ... Good.
But I do not understand what one looks like.

Can you show an example of such a predicate, just for interest?

 
> Jim
> 
> 
> 
>> -Original Message-
>> From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-
>> boun...@antlr.org] On Behalf Of Ruslan Zasukhin
>> Sent: Wednesday, October 12, 2011 11:05 AM
>> To: antlr-interest@antlr.org
>> Subject: [antlr-interest] Advice on best practice?
>> 
>> Hi All, Terence, Jim,
>> 
>> I have reviewed the FAQs, other docs and the list ...
>> But so far I cannot find isolated advice/tips of this kind:
>> 
>> 
>> * Tend to develop the grammar with k=1.
>>   AVOID using k=*, because it is slower ...
>> 
>> OR the reverse
>> 
>> * No problem using k=*.
>>   Always prefer k=* and do not worry ... Speed will be fine ...
>> 
>> OR
>> 
>> * Always tend to (not)? use the backtrack option,
>>   and if you use it, then use memoize also ...
>> 
>> 
>> In ANTLR3/Examples/C I see options such as
>> 
>> 
>> 
>> grammar C;
>> 
>> options
>> {
>> backtrack= true;
>> memoize= true;
>> k= 2;
>> language= C;
>> }
>> 
>> 
>> 
>> But what is the official point of view at present?

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34373] Re: [antlr-interest] Advice on best practice?

2011-10-12 Thread Ruslan Zasukhin
On 10/12/11 9:04 PM, "Ruslan Zasukhin" 
wrote:

> Hi All, Terence, Jim,
> 
> I have reviewed the FAQs, other docs and the list ...
> But so far I cannot find isolated advice/tips of this kind:
> 
> 
> * Tend to develop the grammar with k=1.
>   AVOID using k=*, because it is slower ...
> 
> OR the reverse
> 
> * No problem using k=*.
>   Always prefer k=* and do not worry ... Speed will be fine ...
> 
> OR
> 
> * Always tend to (not)? use the backtrack option,
>   and if you use it, then use memoize also ...
> 
> 
> In ANTLR3/Examples/C I see options such as
> 
> 
> 
> grammar C;
> 
> options 
> {
> backtrack= true;
> memoize= true;
> k= 2;
> language= C;
> }
> 
> 
> 
> But what is the official point of view at present?

That C grammar also contains this comment:

/** Either a function definition or any other kind of C decl/def.
 *  The LL(*) analysis algorithm fails to deal with this due to
 *  recursion in the declarator rules.  I'm putting in a
 *  manual predicate here so that we don't backtrack over
 *  the entire function.  Further, you get a better error
 *  as errors within the function itself don't make it fail
 *  to predict that it's a function.  Weird errors previously.
 *  Remember: the goal is to avoid backtrack like the plague
 *  because it makes debugging, actions, and errors harder.
 *
 *  Note that k=1 results in a much smaller predictor for the
 *  fixed lookahead; k=2 made a few extra thousand lines. ;)
 *  I'll have to optimize that in the future.
 */

I wonder whether this has already been optimized,
or is it still in the future?


Should we also try to specify k=1
for most rules?


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34372] [antlr-interest] Advice on best practice?

2011-10-12 Thread Ruslan Zasukhin
Hi All, Terence, Jim,

I have reviewed the FAQs, other docs and the list ...
But so far I cannot find isolated advice/tips of this kind:


* Tend to develop the grammar with k=1.
  AVOID using k=*, because it is slower ...

OR the reverse

* No problem using k=*.
  Always prefer k=* and do not worry ... Speed will be fine ...

OR

* Always tend to (not)? use the backtrack option,
  and if you use it, then use memoize also ...


In ANTLR3/Examples/C I see options such as



grammar C;

options 
{
backtrack= true;
memoize= true;
k= 2;
language= C;
}



But what is the official point of view at present?



-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34320] Re: [antlr-interest] rewrites to capture tree structure and original text

2011-10-08 Thread Ruslan Zasukhin
On 10/7/11 11:12 PM, "Jim Idle"  wrote:

> You don't need to do this. If you want the text that makes up that part of
> the tree, you can use the start and end tokens in the tree parser and
> write a few lines of java that will give you the text. Placing it in the
> tree is just duplicating it.

Hi Jim,
Hi Terence,

Actually, I only want to underline a point I have already expressed before.

I think that ANTLR should include such utility methods.
Users of ANTLR should not have to reinvent such "simple" algorithms each time.

The set of these algorithms can and should be official, fixed, documented, and
consistent across all targets.

Java:

String  JointTextOfTokens(
Token inStartToken,
Token inEndToken )
{

}



C:

pANTLR3_STRING  JointTextOfTokens(
Token* inStartToken,
Token* inEndToken )
{

}
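
One possible shape for the C variant, sketched in standalone C. The DemoToken struct and both function names are hypothetical, not part of the ANTLR3 runtime; the sketch assumes each token keeps pointers to its first and last character in one shared input buffer, so the joined text is simply the slice from the start of the first token to the end of the last one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token: only the two pointers into the input buffer matter. */
typedef struct DemoToken {
    const char *start;  /* first char of the token in the input buffer */
    const char *stop;   /* last char of the token (inclusive)          */
} DemoToken;

/* Length of the source text covered by [first .. last], whitespace included. */
static size_t demo_span_length(const DemoToken *first, const DemoToken *last)
{
    assert(first != NULL && last != NULL);
    assert(first->start <= last->stop);
    return (size_t)(last->stop - first->start) + 1;
}

/* Returns a malloc'ed, NUL-terminated copy of the span, or NULL on failure.
   The caller owns the result and must free() it. */
static char *demo_joint_text_of_tokens(const DemoToken *first,
                                       const DemoToken *last)
{
    size_t len = demo_span_length(first, last);
    char *out = (char *)malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, first->start, len);
    out[len] = '\0';
    return out;
}
```

A caller that only needs to compare or hash the text could use first->start together with demo_span_length() and skip the copy entirely.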



This can save users of your product many hours.



-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34196] Re: [antlr-interest] [C] code to change Token type, use char* and loose data when buffer destroyed

2011-09-27 Thread Ruslan Zasukhin
Hi Jim,

What do you think about the idea of resolving everything at the LEXER level?

So we must resolve tokens like this:

* STRING_LITERAL  'aa'
* STRING_LITERAL  'aa' ws* 'bb' => Token( "aabb" )

* STRING_LITERAL  'aa\'bb'  => Token( "aa'bb" )
* STRING_LITERAL  'aa''bb'   => Token( "aa'bb" )
* STRING_LITERAL  'aa''bb''cc'  => Token( "aa'bb'cc" )

* HEX_LITERAL  x'aa'  => Token( "aa" )
* HEX_LITERAL  x'aa' ws* 'bb' => Token( "aabb" )


Do you think we can do this in [C] without copying buffers?
I think not.

Then the question is:
how can this be solved with a minimal number of copies?

Or do you think it is really better to use the
Lexer -> Parser -> TreeParser combination?

On 9/28/11 1:34 AM, "Ruslan Zasukhin" 
wrote:

> On 9/28/11 12:46 AM, "Douglas Godfrey"  wrote:
> 
> Hi Douglas,
> 
> Yes, I have thought about this way also.
> 
> But in your solution you use helper functions such as
> RemoveQuotePairs()
> 
> which, I guess, do some copying into additional RAM buffers.
> This is fine for Java guys, but in C code, as Jim likes to underline each time,
> we tend to use only pointers into the input buffer for as long as possible.
>  
> 
>> You need to modify your string lexing rules to use sub-rules for the
>> elementary
>> strings and return the concatenated string as the lexer token value.
>> 
>> The value of 
>> 
>> StringConstant: QuotedString
>> {RemoveQuotePairs($QuotedString);};
>> 
>> fragment
>> QuotedString:  ( StringTerm )+;
>> 
>> fragment
>> StringTerm:  Dquote ( Character )* Dquote;
>> 
>> fragment
>> Character: ( ' ' | AlphaChar | Punctuation | Digit );

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34193] Re: [antlr-interest] [C] code to change Token type, use char* and loose data when buffer destroyed

2011-09-27 Thread Ruslan Zasukhin
On 9/28/11 12:46 AM, "Douglas Godfrey"  wrote:

Hi Douglas,

Yes, I have thought about this way also.

But in your solution you use helper functions such as
RemoveQuotePairs()

which, I guess, do some copying into additional RAM buffers.
This is fine for Java guys, but in C code, as Jim likes to underline each time,
we tend to use only pointers into the input buffer for as long as possible.
 

> You need to modify your string lexing rules to use sub-rules for the
> elementary
> strings and return the concatenated string as the lexer token value.
> 
> The value of 
> 
> StringConstant: QuotedString
> {RemoveQuotePairs($QuotedString);};
> 
> fragment
> QuotedString:  ( StringTerm )+;
> 
> fragment
> StringTerm:  Dquote ( Character )* Dquote;
> 
> fragment
> Character: ( ' ' | AlphaChar | Punctuation | Digit );

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34190] Re: [antlr-interest] [C] code to change Token type, use char* and loose data when buffer destroyed

2011-09-27 Thread Ruslan Zasukhin
On 9/27/11 9:45 PM, "Jim Idle"  wrote:

Hi Jim,

As always, thanks a lot for your time.

> Each token contains the char * pointer that is in to the input stream
> start, which is what I generally use, but if you want to use my build in
> string stuff and have it auto free then it is just:
> 
> csl
> @declarations { pANTLR3_STRING s; }
> : s1=STRING
>  { s= $s1.text; }
>(
> s2=STRING
> {
> s->append(s, $s2.text);
> }
> 
>)*
> { $s1->setText(s);  /* Check that, but I think it is this */ }
> 
> ->s1
> ;

Nice to see I am becoming an expert in ANTLR3 :-)

I have tried the way above. What I like here is that if there is only ONE
literal, which is true in 99% of cases, we are still efficient: no need to
call append() or use other buffers.

This is how the above rule should look in order to really compile ...

character_string_literal
@declarations{
    pANTLR3_STRING s;
}
    :   s1 = STRING_LITERAL     { s = $s1.text; }
        ( s2 = STRING_LITERAL   { s->append( s, (const char*) $s2.text->chars ); }
        )*

        { $s1->setText( $s1, s ); }

        -> $s1
    ;
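
The "one literal needs no extra buffer" property can also be sketched outside the grammar, in standalone C. DemoPiece and demo_concat_literals() are hypothetical names, not ANTLR3 API; each piece is assumed to be a pointer and length into the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoPiece {
    const char *ptr;   /* points into the input buffer, not NUL-terminated */
    size_t      len;
} DemoPiece;

/* Joins 'count' adjacent literal bodies.
   - count == 1: returns pieces[0].ptr unchanged and sets *owned to NULL.
   - count  > 1: returns one malloc'ed buffer and stores it in *owned so
                 the caller knows it must free() it.
   Returns NULL if count is 0 or the allocation fails. */
static const char *demo_concat_literals(const DemoPiece *pieces, size_t count,
                                        size_t *out_len, char **owned)
{
    size_t i, total = 0, pos = 0;
    char *buf;

    *owned = NULL;
    *out_len = 0;
    if (count == 0)
        return NULL;
    if (count == 1) {
        *out_len = pieces[0].len;
        return pieces[0].ptr;        /* the 99% case: no copy at all */
    }
    for (i = 0; i < count; i++)
        total += pieces[i].len;
    buf = (char *)malloc(total + 1);
    if (buf == NULL)
        return NULL;
    for (i = 0; i < count; i++) {
        memcpy(buf + pos, pieces[i].ptr, pieces[i].len);
        pos += pieces[i].len;
    }
    buf[total] = '\0';
    *owned = buf;
    *out_len = total;
    return buf;
}
```

Sizing the buffer once from the summed lengths keeps the multi-literal case to a single allocation, instead of growing the string on every append.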


But (!!)

In the latest ANTLR 3.4.1 this rule generates C code that does not compile.
Oops.
This is why I spent the whole of yesterday evening losing hair :-)

Look at the generated code:

// $ANTLR start synpred20_SqlParser_v3
static void synpred20_SqlParser_v3_fragment(pSqlParser_v3Parser ctx )
{
    pANTLR3_COMMON_TOKEN;   <<<<<<<   should be s2;

       = NULL;              <<<<<<<   s2 = NULL;


    // /PARADIGMA/Developer_2/sources/VKernel/VSQL/Parser/v3/grammars/SqlParser_v3.g:644:5: (s2= STRING_LITERAL )
    // /PARADIGMA/Developer_2/sources/VKernel/VSQL/Parser/v3/grammars/SqlParser_v3.g:644:5: s2= STRING_LITERAL
    {
        s2 = (pANTLR3_COMMON_TOKEN) MATCHT(STRING_LITERAL, &FOLLOW_STRING_LITERAL_in_synpred20_SqlParser_v34838);

I don't know why this happens.
It seems to happen only for STRING_LITERAL in my grammar,
but I do not see anything special about this lexer-generated token.
I can send you my parser.g file so you can test it yourself and see where the
trouble is.


Now I have this rule working using the above idea.
GOOD. Thank you, Jim.

And now I am ready to play with the second way:

csl
: s1+=STRING -> $s1+  /* Or, ->^(SLIT $s1+) */
;


I also tried this way yesterday (again, I am glad I thought of it),
but I was not able to find a way to join all those tokens in the array.

You have given a nice idea: join them in the TreeParser.
Yes, indeed this can work... So I will play with this way now.


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.



[il-antlr-interest: 34177] [antlr-interest] [C] code to change Token type, use char* and loose data when buffer destroyed

2011-09-27 Thread Ruslan Zasukhin
Hi All,

= TASK ==

In SQL we must be able to write
  SELECT 'aaa' ''

And this should be the same as
  SELECT 'aaa'

I.e. the parser must concatenate the literals itself.
This was quite easy to do in ANTLR 2,
and I have already killed 5-6 hours on it in ANTLR 3.  :-((


I have tried many tricks in ANTLR3 itself, trying to use its tokens and the
ANTLR3_STRING class, but no luck.

Finally I gave up and tried to use simple code as in v2, with
STD::string as the place to accumulate the literal.

=
character_string_literal
@init{ 
STD::string st;
}
:( STRING_LITERAL
{ 
st.append(
(const char*) $STRING_LITERAL.text->chars,
$STRING_LITERAL.text->len );
} 
)+
-> ^( CONST_STR[ st.c_str() ] )
;
=

But this does not work, because the new Token object stores just the pointer:

newToken->textState= ANTLR3_TEXT_CHARP;
newToken->tokText.chars = (pANTLR3_UCHAR)text;
 
And as soon as the STD::string dies, we get a problem.
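
One generic way around this lifetime problem, sketched here without any ANTLR calls, is to move the accumulated text into storage that lives as long as the parse and to hand out pointers into that storage. TextPool is a hypothetical helper, not part of the ANTLR3 C runtime, and whether such a pointer is the right thing to pass to CONST_STR[...] in a rewrite is not confirmed in this thread.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical helper: owns every string handed to it until the pool itself
// is destroyed, so the returned pointers outlive the local std::string.
class TextPool {
public:
    // Takes ownership of `text` and returns a stable, NUL-terminated pointer.
    const char* keep(std::string text)
    {
        // std::deque does not relocate existing elements when a new one is
        // appended, so pointers returned earlier remain valid.
        items_.emplace_back(std::move(text));
        return items_.back().c_str();
    }

    std::size_t size() const { return items_.size(); }

private:
    std::deque<std::string> items_;
};
```

The pool would have to be destroyed only after the tree and its tokens are no longer used.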


Jim, how can this simple task be solved in the C TARGET?

I also see that in Java code one can construct dynamic text
and produce a token using that text. For example, on this page

http://www.antlr.org/wiki/display/ANTLR3/Tree+construction

-> ^('+' $p INT[String.valueOf($a.int+$b.int)])


But the C target tries to work only with char*.


I guess that ANTLR3_STRING setText() can help me,
but I cannot see how I can call that from my

-> ^( CONST_STR[ st.c_str() ] )

???

Thank you for any pointers ...


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33999] Re: [antlr-interest] g++ compiler warnings on the generated lexer and parser .c files

2011-09-11 Thread Ruslan Zasukhin
On 9/11/11 9:12 PM, "Ruslan Zasukhin" 
wrote:

For the record:

 #pragma GCC diagnostic ignored "-Wunused-label"

for some reason does not work.
I was able to get rid of the warnings from the generated files and the
antlr .c files in Xcode by:

* Selecting the .c files
* Get Info -> Build panel
* Adding these flags:

 -Wno-all -Wno-shadow -Wno-missing-prototypes -Wno-sign-compare
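
On compilers that support it, the pragma approach can also be scoped with push/pop around the noisy code instead of using per-file flags. The sketch below assumes GCC 4.6 or later, or Clang; it is not confirmed in this thread whether it helps with the older GCC shipped with Xcode at the time. generatedRuleStandIn is a hypothetical stand-in for generated parser code.

```cpp
#if defined(__GNUC__)
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wunused-label"
#  pragma GCC diagnostic ignored "-Wunused-parameter"
#  pragma GCC diagnostic ignored "-Wunused-variable"
#endif

// Hypothetical stand-in for a generated rule: it has an unused parameter,
// an unused local and an unused label, which would normally trigger the
// warnings suppressed above.
int generatedRuleStandIn(int input, int unusedParam)
{
    int unusedLocal = 0;
unusedLabel:
    return input * 2;
}

#if defined(__GNUC__)
#  pragma GCC diagnostic pop   // warnings are active again for code below
#endif
```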






> On 11/10/10 7:34 PM, "Jim Idle"  wrote:
> 
>> You probably need to suppress the warnings for that. The compiler will get
>> rid
>> of them anyway. I thought that this was already done in the generated header
>> but perhaps a different pragma is required if compiling as C++.
> 
> 
> I am also now looking at how to fix the warnings for Xcode/GCC.
> 
> Lexer.h contains only the Windows (Visual C++) disables:
> 
> 
> #ifdef ANTLR3_WINDOWS
> #pragma warning( disable : 4100 )
> #pragma warning( disable : 4101 )
> #pragma warning( disable : 4127 )
> #pragma warning( disable : 4189 )
> #pragma warning( disable : 4505 )
> #pragma warning( disable : 4701 )
> #endif
> 
> 
> For GCC I tried this, but so far it has no effect :(
> 
> #pragma GCC diagnostic ignored "-Wunused-label"
> #pragma GCC diagnostic ignored "-Wunused-parameter"
> #pragma GCC diagnostic ignored "-Wunused-variable"
> #pragma GCC diagnostic ignored "-Wunused-value"

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33998] Re: [antlr-interest] g++ compiler warnings on the generated lexer and parser .c files

2011-09-11 Thread Ruslan Zasukhin
On 11/10/10 7:34 PM, "Jim Idle"  wrote:

> You probably need to suppress the warnings for that. The compiler will get rid
> of them anyway. I thought that this was already done in the generated header
> but perhaps a different pragma is required if compiling as C++.


I am also now looking at how to fix the warnings for Xcode/GCC.

Lexer.h contains only the Windows (Visual C++) disables:


#ifdef ANTLR3_WINDOWS
#pragma warning( disable : 4100 )
#pragma warning( disable : 4101 )
#pragma warning( disable : 4127 )
#pragma warning( disable : 4189 )
#pragma warning( disable : 4505 )
#pragma warning( disable : 4701 )
#endif


For GCC I tried this, but so far it has no effect :(

#pragma GCC diagnostic ignored "-Wunused-label"
#pragma GCC diagnostic ignored "-Wunused-parameter"
#pragma GCC diagnostic ignored "-Wunused-variable"
#pragma GCC diagnostic ignored "-Wunused-value"



-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33946] Re: [antlr-interest] [v2 to v3][C++/C] throw C++ exception from parser/tree parser.

2011-09-07 Thread Ruslan Zasukhin
On 9/7/11 7:36 PM, "Jim Idle"  wrote:

Hi Jim,

> You remember that this is all open source, freely given right?

Of course I do :-)

And I have read sources of ANTLR 3.4, be sure.
Btw, my congratulations and respect. Very high quality code.

> Generally,
> don't try and mix C++ things in with the C. Exceptions are almost
> certainly not want you want for error reporting while parsing anyway.

> As I said earlier, copy the existing routine and adapt it. It does as many
> things as it can to show you how to access the information. I can't
> provide a universal error message handler as there is no way to know what
> information your particular parser will have available or how you want the
> messages to look and so on. All your customer error handler need do is
> call a C++ object that you provide and that object can collect the errors
> so that you can print them out at the end etc. The source code is right
> there and well commented :)

Aha, I hope I have understood you ...
I will now try to say it in my own words to double-check.

* so we must implement our own displayError() -- this is clear.

* we must NOT throw any C++ exception here.
Instead, we must just build some error string and put it,
e.g., into our own stack of such error strings.

* then we do

   AST =  parser->entryRule()
   MyNodes = treeParser->entryRule()

Zero exceptions here.
Then we just do

if( errCount > 0 )
{   
 stop the job
 somehow report the errors to the user,
 for example

point ZZZ:
 throw  myException(  ErrStack->getErrors() );
}


Sounds right?
Thank you, Jim, for the pointers.
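
The collect-then-throw plan above can be sketched in plain C++ as follows. All names here (ErrorCollector, collectParseError, the userData parameter) are hypothetical and not part of the ANTLR3 C runtime; how the collector pointer reaches a real displayRecognitionError() override is left out on purpose.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collector: gathers messages while the C parser runs.
class ErrorCollector {
public:
    void add(std::string message) { errors_.push_back(std::move(message)); }
    std::size_t count() const { return errors_.size(); }

    // All messages, one per line.
    std::string joined() const
    {
        std::string all;
        for (std::size_t i = 0; i < errors_.size(); ++i) {
            if (i != 0) all += '\n';
            all += errors_[i];
        }
        return all;
    }

    // "Point ZZZ": called from C++ code only after the parser and the tree
    // parser have returned, so nothing is thrown through C frames.
    void throwIfAny() const
    {
        if (!errors_.empty()) throw std::runtime_error(joined());
    }

private:
    std::vector<std::string> errors_;
};

// C-callable hook that a custom error handler could forward to.
extern "C" void collectParseError(void* userData, const char* message)
{
    try {
        static_cast<ErrorCollector*>(userData)->add(message ? message : "(no message)");
    } catch (...) {
        // Swallow allocation failures: no exception may escape into C code.
    }
}
```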

-
Well, only one more question comes to mind.  :-)

I have read that ANTLR3-C itself builds a LIST of exception objects.
Is this true?

If yes, then maybe this is not needed? I mean overriding the displayError()
method?

>> All your customer error handler need do is
>> call a C++ object that you provide and that object can collect the errors
>> so that you can print them out at the end etc.

And could we "convert" that list into an exception at point ZZZ in the above
example?


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33944] Re: [antlr-interest] [v2 to v3][C++/C] Fresh list of features for v3.4? // @after

2011-09-07 Thread Ruslan Zasukhin
On 9/7/11 7:30 PM, "Jim Idle"  wrote:

Hi Jim,

> It is a C target not a C++ target, so it is raw and closer to the metal.
> It is a lot faster than the 2.7 C++ target as a result, but you have to
> dig a little.

I very much hope so, btw, because the ANTLR 2.7 parser that we used in
Valentina was, for example, about 4 times slower than the Lemon parser used in
SQLite. So our SQL commands, for example, ran slower than SQLite's, while the
Valentina engine itself (when not using SQL) does inserts 20 times faster ...


> Starting at the C API documentation linked from the ANTLR front page,
> select ANTLR3 C Usage Guide then Using the ANTLR3 C Target, then
> implementing Customized Methods, where you find the following text:
> 
> Implementing Customized Methods
> Unless you wish to create your own tree structures using the built in
> ANTLR AST rewriting notation, you will rarely need to override the default
> implementation of runtime methods. The exception to this will be the
> syntax err reporting method, which is essentially a stub function that you
> will usually want to provide your own implementation for. You should
> consider the built in function displayRecognitionError() as an example of
> where to start as there can be no really useful generic error message
> display

Jim, I hope you have seen that I have spent TWO days jumping over all the
pages ... And of course I have read all this already.

 
> Selecting that shows you the documentation for that function. This is what
> you should override. Start by copying the example version in the runtime,
> then adapt it to your own needs. It is pretty easy.

Yes, I have already copied it and replaced the printing with sprintf() into a
char buffer.
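
For formatting into a buffer without guessing its size, a common C++ idiom is a two-pass vsnprintf: measure first, then write. The sketch below is generic and makes no ANTLR calls; formatMessage is a hypothetical helper name.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: printf-style formatting into a std::string.
std::string formatMessage(const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    va_list argsCopy;
    va_copy(argsCopy, args);            // the list is consumed by each pass

    // Pass 1: ask how many characters are needed (excluding the NUL).
    const int needed = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (needed < 0) {                   // encoding or format error
        va_end(argsCopy);
        return std::string();
    }

    // Pass 2: write into a string of exactly that size.
    std::string out(static_cast<std::size_t>(needed), '\0');
    std::vsnprintf(&out[0], out.size() + 1, fmt, argsCopy);
    va_end(argsCopy);
    return out;
}
```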
 

> Also, searching from the Support page of antlr.org:
> 
> http://markmail.org/search/list:antlr+C+displayRecognitionError

I did search Google, StackOverflow, and the ANTLR list...
I have read EACH letter about "exception" on the list for the last 5 years.
:-)


> Or, as there is no work out there in the world right now, you could just
> hire me to do it all for you :)

I will answer you in detail privately :-)


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33937] [antlr-interest] [v2 to v3][C++/C] throw C++ exception from parser/tree parser.

2011-09-07 Thread Ruslan Zasukhin
Hi Terence,
Hi Jim,

First of all, again: thank you for the great job and product(s).
Please do not take my text below as a complaint, but mainly as an explanation
of where a C++ developer runs into problems with ANTLR3 and why ...
And some suggestions on how this could perhaps be improved.
A long letter, but it should be easy to read :-)


===
So ... in ANTLR2 it was very simple to handle errors out of the box:
it throws an exception, a few lines of code with catch() -> DONE.

Our whole db engine expects exceptions from ANTLR v2/v3 and its wrapper code.
For two days I have been reading, reading, reading ...

I have found 2-3 such questions from C++ developers:
can we throw from our own displayError()?

The answer from Jim was like this:

> On 1/15/09 8:23 PM, "Jim Idle"  wrote:
> 
>> You can probably use them carefully, but as you point out, you have to
>> be careful with memory. The runtime tracks all its normal memory
>> allocations so  as long as you close the 'classes' correctly you should
>> generally be OK. However, you should make sure that throwing exceptions
>> does not bypass the normal rule clean up, such as resetting error and
>> backtracking flags and so on,

Okay, but I'd expect to see more details (5 to 10 times more text and a code
example) in this section:

http://www.antlr.org/api/C/index.html

* What is known TODAY, in 2011?
Can we throw a C++ exception here?
Doesn't this break the logic of the parsers' C code?
A truly working example?

* EXAMPLE?

I opened the folder Examples/C and searched for "exception".
I found it only in the JAVA files used for tests of the parser.
There is no example for C++ exceptions.
There is no example that overrides displayRecognitionError().

* Or look at this code example:
 http://www.antlr.org/api/C/index.html

The only ERROR-related line here is
if (psr->pParser->rec->errorCount > 0)

Then silence ...  And questions come to mind.

** So, if we do not throw() exceptions, then after the tree parser I check
whether there were any errors, and IF there were ... what next???

I have read that the C target builds a LIST of exception objects,
but where is the TEXT and an example of how to navigate that list?


** And if I throw an error from displayRecognitionError(), then
such a check of the counter is useless ...


==
Okay, next ...
There are good, helpful pages on ANTLR2 to ANTLR3 ...   Great!

But these pages are mainly about grammar and Java, and there is zero info to
help existing C++ developers port their ANTLR2 products.
Hmm.



Also, when I look at the default displayRecognitionError() from the .c file,
with its many points that print to stderr, such as:

---
void displayRecognitionError(
    pANTLR3_BASE_RECOGNIZER recognizer,
    pANTLR3_UINT8 * tokenNames )
{
...
 ANTLR3_FPRINTF(stderr, "-end of input-(");



I wonder: why not provide the same function doing sprintf() into a string
buffer, plus TWO very small wrapper functions that take this string and

1) print it to stderr as now;
2) throw it as a C++ exception;

//--
pANTLR3_STRING buildRecognitionError(
    pANTLR3_BASE_RECOGNIZER recognizer,
    pANTLR3_UINT8 * tokenNames )
{
    // ... same logic as displayRecognitionError(), but writing into resStr ...
    return resStr; 
}


//--
void displayRecognitionError_stderr(
    pANTLR3_BASE_RECOGNIZER recognizer,
    pANTLR3_UINT8 * tokenNames )
{
    pANTLR3_STRING res = buildRecognitionError( recognizer, tokenNames );
    ANTLR3_FPRINTF( stderr, "%s", (const char*) res->chars );
}


//--
void displayRecognitionError_throw(
    pANTLR3_BASE_RECOGNIZER recognizer,
    pANTLR3_UINT8 * tokenNames )
{
    pANTLR3_STRING res = buildRecognitionError( recognizer, tokenNames );
    throw SomeException( res );
}


Yes, maybe not a big deal, but it would simplify using ANTLR3 out of the box.
In ANTLR 3.5 I could then read:
just install displayRecognitionError_throw, and your parser
will start throwing C++ exceptions.
A 30-second deal, instead of two days and still being in doubt ...
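
The split suggested above (one function builds the message, two thin wrappers deliver it) can be illustrated in self-contained C++. ParseErrorInfo and the three function names are hypothetical and only stand in for what a real handler would read from the recognizer. Whether throwing from inside the C runtime's callback is safe is exactly the open question of this thread; Jim's quoted caveat about bypassing rule clean-up still applies.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical, simplified view of one recognition error.
struct ParseErrorInfo {
    std::string source;   // stream or file name
    unsigned    line;
    unsigned    column;
    std::string detail;   // human-readable description
};

// Single place that decides how a message looks.
std::string buildErrorMessage(const ParseErrorInfo& e)
{
    return e.source + "(" + std::to_string(e.line) + ":" +
           std::to_string(e.column) + "): " + e.detail;
}

// Wrapper 1: print to stderr, as the default handler does.
void reportToStderr(const ParseErrorInfo& e)
{
    std::fprintf(stderr, "%s\n", buildErrorMessage(e).c_str());
}

// Wrapper 2: deliver the same text as a C++ exception.
[[noreturn]] void reportByThrowing(const ParseErrorInfo& e)
{
    throw std::runtime_error(buildErrorMessage(e));
}
```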


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33932] [antlr-interest] [v2 to v3][C++/C] Fresh list of features for v3.4? // @after

2011-09-07 Thread Ruslan Zasukhin
Hi Terence,
Hi Jim,

Our status is:
* we have used ANTLR 2.7 C++ in our Valentina DB engine for nearly 10 years.
* we started taking gradual steps towards switching to ANTLR3 more than a
year ago ... something has always interrupted us.

Now we have a working grammar and a tree parser grammar,
and all this is integrated into the db engine via #define; so far all good.

Now the task is to provide error reporting.
How EASY this was in ANTLR v2 for us C++ developers...
And I have already spent two days reading, reading, reading
the site, docs, Google, the mailing list, the book, ...

Terence, I do not complain, but IMO, at least for the C target, it is very hard
to find on the SITE or in the docs the answers to major questions.

I believe it is not the correct way to push developers to search Google and
previous letters from the last 3 years. Right?

While reading the letters, I noted, for example, one from 2009 which
said that
@after is not supported in the C target

So I ask myself:
* Hmm, I have NOT seen this info in the docs that I have already read. And
* Hmm, I wonder whether this feature is still not implemented for C?

And maybe I am blind, but I do not see a page on the SITE/WIKI
such as

=
List of Features of Runtime Targets  v3.4
   (and other pages for v3.3, 3.2, 3.1, 3.0)

            Java    C     C#

@after      yes     no    ?
=


You see my point?

I should say that ANTLR3 is somehow harder to use out of the box ...
Maybe at least for the C target ...
Maybe at least compared to what we are used to in v2.7.


A couple of days ago you asked about features for 4.x  :-)
I would like to ask you to add such major and important info to the SITE.


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33849] Re: [antlr-interest] [C] Crashes if NULL name in antlr3StringStreamNew()

2011-08-30 Thread Ruslan Zasukhin
On 8/30/11 4:17 PM, "Ruslan Zasukhin" 
wrote:

> 
> If I call
> 
> input = antlr3StringStreamNew(
> input_string, Encoding, input_len, NULL );
> 
> NULL is name of stream
> Then it crashes inside of
> 
> 
> newStr8(pANTLR3_STRING_FACTORY factory, pANTLR3_UINT8 ptr)
> {
> return factory->newPtr8(factory, ptr,
>  (ANTLR3_UINT32)strlen((const char *)ptr));<<<<<< crash
> }
> 
> 
> Because on mac and linux strlen()  crashes on NULL.

In contrast to the above, the new reuse() method can handle a NULL stream name,
replacing it with the name "-memory-".



antlr38BitReuse(pANTLR3_INPUT_STREAM input, pANTLR3_UINT8 inString, ANTLR3_UINT32 size, pANTLR3_UINT8 name)
{
    input->isAllocated = ANTLR3_FALSE;
    input->data        = inString;
    input->sizeBuf     = size;

    // Now we can set up the file name. As we are reusing the stream, there may already
    // be a string that we can reuse for holding the filename.
    //
    if (input->istream->streamName == NULL)
    {
        input->istream->streamName = input->strFactory->newStr(input->strFactory, name == NULL ? (pANTLR3_UINT8)"-memory-" : name);
        input->fileName            = input->istream->streamName;
    }
    else
    {
        input->istream->streamName->set(input->istream->streamName, (name == NULL ? (const char *)"-memory-" : (const char *)name));
    }

    input->reset(input);
}



-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]





[il-antlr-interest: 33836] [antlr-interest] [C] Crashes if NULL name in antlr3StringStreamNew()

2011-08-30 Thread Ruslan Zasukhin

If I call

input = antlr3StringStreamNew(
input_string, Encoding, input_len, NULL );

where NULL is the name of the stream,
then it crashes inside of


newStr8(pANTLR3_STRING_FACTORY factory, pANTLR3_UINT8 ptr)
{
return factory->newPtr8(factory, ptr,
 (ANTLR3_UINT32)strlen((const char *)ptr));<<<<<< crash
}


because on Mac and Linux strlen() crashes on NULL.
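
Until the runtime guards against this, a caller-side workaround is to substitute a default name before the pointer ever reaches strlen(). A minimal sketch in plain C++; the helper names are hypothetical and the "-memory-" default simply mirrors the one used by the runtime's reuse() method.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: never returns NULL, so it is safe to pass on to code
// that calls strlen() on the stream name.
const char* streamNameOrDefault(const char* name)
{
    return name != nullptr ? name : "-memory-";
}

// Length of the effective name; safe for a NULL argument.
std::size_t safeNameLength(const char* name)
{
    return std::strlen(streamNameOrDefault(name));
}
```

The result of streamNameOrDefault(name) would then be passed as the last argument in place of the raw pointer.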



-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33832] Re: [antlr-interest] reuse() methos in 3.4 C runtime

2011-08-30 Thread Ruslan Zasukhin
On 6/24/11 7:49 PM, "Jim Idle"  wrote:

Hi Jim,

I am now making my third attempt to switch from ANTLR2 to ANTLR3.

Reuse is very important for us, because we parse SQL strings in
DBMS servers ...
I am very happy to read that it should work again in 3.4!

But I have a question after this letter of yours.

You say that the TreeParser cannot be reused. Then:

A) What does the reset() method in the generated TreeParser do?
Can/should we use it? If yes, when and for what?

B) Do you mean that on each loop we must free() the nodes and the TreeParser

    nodes   ->free( nodes );    nodes   = NULL;
    treePsr ->free( treePsr );  treePsr = NULL;

and then create them again

    nodes   = antlr3CommonTreeNodeStreamNewTree(langAST.tree, ANTLR3_SIZE_HINT);
    treePsr = SqlTreeParser_v3New( nodes );


Or can I just do on each loop

    treePsr->reset( treePsr );


Thank you in advance for the explanation.
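
If option B turns out to be the answer, the free/recreate pair can be made hard to forget from C++ with a small deleter that calls the object's own free member, as in nodes->free(nodes) above. This is a sketch under assumptions: SelfFreePtr is a hypothetical helper, and FakeTreeParser is a mock that only imitates the "struct with a free function pointer" shape; no real ANTLR3 types are used.

```cpp
#include <memory>

// Deleter for C-style objects that are released via obj->free(obj).
template <typename T>
struct SelfFreeDeleter {
    void operator()(T* p) const { if (p) p->free(p); }
};

template <typename T>
using SelfFreePtr = std::unique_ptr<T, SelfFreeDeleter<T>>;

// Mock object, for illustration only.
struct FakeTreeParser {
    int*  freedCount;                      // lets the example observe frees
    void (*free)(FakeTreeParser* self);
};

inline void fakeFree(FakeTreeParser* self)
{
    ++*self->freedCount;
    delete self;
}

inline FakeTreeParser* fakeTreeParserNew(int* freedCount)
{
    return new FakeTreeParser{freedCount, &fakeFree};
}

// One fresh tree parser per run; each is freed when it leaves scope,
// including when an exception is thrown while walking the tree.
int parseMany(int runs)
{
    int freed = 0;
    for (int i = 0; i < runs; ++i) {
        SelfFreePtr<FakeTreeParser> treePsr(fakeTreeParserNew(&freed));
        // ... walk the tree with treePsr.get() here ...
    }
    return freed;
}
```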


> Because the documentation is not yet up to date, here is an example of
> reusing the allocated memory in input streams and token streams:
> 
> for (i=0; i 
> 
>     // Run the parser.
>     //
>     psr->start(psr);
> 
>     // --
>     // Now reset everything for the next run.
>     // Order of calls is important.
> 
>     // Input stream can now be reused
>     //
>     input->reuse(input, sourceCode, sourceLen, sourceName);
> 
>     // Reset the common token stream so that it will reuse its resources
>     //
>     tstream->reset(tstream);
> 
>     // Reset the lexer (new function generated by antlr now)
>     //
>     lxr->reset(lxr);
> 
>     // Reset the parser (new function generated by antlr now)
>     //
>     psr->reset(psr);
> }
> 
> Note that tree parsers cannot reuse their allocations but this is rarely an
> issue. The input->reuse() will reuse any memory it has allocated, but
> requires that you handle the reading of the input files (or otherwise supply
> a pointer to them). The input files are assumed to be encoded in the way
> that the original input was created, for instance:
> 
> input   = antlr3FileStreamNew(fname, ANTLR3_ENC_8BIT);
> 
> Then all reused input must be 8 bit encoded.
> 
> Jim

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]





[il-antlr-interest: 33524] Re: [antlr-interest] 3.4 C target release date?

2011-08-07 Thread Ruslan Zasukhin
On 7/29/11 8:05 PM, "Jim Idle"  wrote:

Hi Jim,

Any news on this?


> Yeah the docs were holding me up. I will release it tomorrow.
> 
> Jim
> 
>> -Original Message-
>> From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-
>> boun...@antlr.org] On Behalf Of Justin Murray
>> Sent: Friday, July 29, 2011 7:09 AM
>> To: antlr-interest@antlr.org
>> Subject: [antlr-interest] 3.4 C target release date?
>> 
>> Hi Jim,
>> 
>> I'm not trying to rush you or anything, but I am wondering if you have
>> an estimate for the release of the 3.4 C runtime. I'd like to use it in
>> our next release, and with deadlines approaching it would be helpful to
>> know if I'll have time to fit it in or not.
>> 
>> Thank you!
>> 
>> - Justin

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33423] Re: [antlr-interest] 3.4 C target release date?

2011-07-29 Thread Ruslan Zasukhin
On 7/29/11 5:09 PM, "Justin Murray"  wrote:

> Hi Jim,
> 
> I'm not trying to rush you or anything, but I am wondering if you have
> an estimate for the release of the 3.4 C runtime. I'd like to use it in
> our next release, and with deadlines approaching it would be helpful to
> know if I'll have time to fit it in or not.

Hi Justin,

As far as I saw a few days ago,
the main distribution archive of ANTLR 3.4 contains
"runtime / C - target",

which appears to be the 3.4 release.
Otherwise why would it be present in the RELEASE archive? Yes?


Yes, I have also noticed that on the download page the link to the C target
opens a page where only the 3.4b4 archive exists ...


I was also going to ask about the status of the C target
and this inconsistency.


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 33328] Re: [antlr-interest] ANTLR v3.4 released! -- there is no /lib folder ?

2011-07-25 Thread Ruslan Zasukhin
On 7/21/11 1:16 AM, "Terence Parr"  wrote:

Hi All,

So this ReadMe says:
http://www.antlr.org/README.txt

> --
> 
> How do I install this damn thing?
> 
> Just untar antlr-3.4.tar.gz and you'll get:
> 
> antlr-3.4/BUILD.txt
> antlr-3.4/antlr3-maven-plugin
> antlr-3.4/antlrjar.xml
> antlr-3.4/antlrsources.xml
> antlr-3.4/gunit
> antlr-3.4/gunit-maven-plugin
> antlr-3.4/pom.xml
> antlr-3.4/runtime
> antlr-3.4/tool
> antlr-3.4/lib


But when I download
http://www.antlr.org/download/antlr-3.4.tar.gz

and uncompress it, there is no

> antlr-3.4/lib


Is it missing?


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32831] Re: [antlr-interest] Question: ANTLR and LLVM ... + Clang

2011-06-19 Thread Ruslan Zasukhin
On 6/19/11 5:35 AM, "Kevin J. Cummings"  wrote:

>> This page explains very well how much bigger a C++ front end is
>> than a parser:
>> 
>> http://www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html
>> 
>> 
>> So again, if we have task to proceed C++ sources, we may choose between:
>> 
>> 1)  ANTLR and develop or use some C++ grammar,
>>   then spend time on (all/some) features describe on above page
> 
> If you are planning to write a C++ compiler, yes.  But why write one,
> when a number of freely available ones already exist?  Clang, GCC, etc.

No, no.

We are not going to develop our own compiler.

We need to parse C++ sources and extract various kinds of information from them,
then use that information for different tasks ...

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32826] Re: [antlr-interest] Question: ANTLR and LLVM ... + Clang

2011-06-18 Thread Ruslan Zasukhin
On 6/18/11 7:26 PM, "Douglas Godfrey"  wrote:

Hi Douglas, 

> The SemanticDesigns C++ frontend, like all of their frontend(s) is intended
> for code analysis and transformation, not compiling.
 
> Semantic Designs' tools are based on the old Reasoning Systems Inc.
> tools: Refine and Intervista.
> 
> Semantic Designs' tools parse a source language into an Symbol Table and AST
> with more features than the Antlr AST.

> The tools then take the Symbol Table and AST and either do
> code analysis or reverse compile
> the AST into new source code in the same or a different language.
> The tools no not interface with a compiler backend or machine code generator.

I see. 

Thanks.


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32824] Re: [antlr-interest] Question: ANTLR and LLVM ... + Clang

2011-06-18 Thread Ruslan Zasukhin
On 6/17/11 8:22 PM, "Kevin J. Cummings"  wrote:

Hi Kevin,

Well, I don't know why you think they cannot be compared.

ANTLR is: Parser -> AST -> TreeParser

Clang also contains a parser -- its own, apparently hand-written --
and then has further processing phases after it.

This page explains very well how much more there is to a C++ front end
than the parser:

   http://www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html


So again, if our task is to process C++ sources, we can choose between:

1) taking ANTLR, developing or reusing some C++ grammar,
   then spending time on all (or some) of the features described on the page above;

2) taking a complete C++ front end and ... DONE?
   So far I see two such front ends that look strong enough:
   Clang and Semantic Designs (which, it seems, I cannot try as a demo).


=
> ANTLR is a tool which can help you build compiler front-ends.  If you
> were industrious enough, you could re-write CLang using ANTLR.
> 
> ANTLR is primarily a JAVA tool (you at least need JAVA to run the tool
> to compile your grammar), but can be used to produce other targeted
> languages (C/C++, Python, etc) for your actual front-end.  While the C++
> support is minimal in version 3 (better in version 2.7, but lacking in
> some of the ST support) resulting in much use of C code which can be
> compiled using C++, you could use it to interface directly to the LLVM
> IR API if you wanted to.  But, I think Ter's example is probably the way
> to go, at least until Version 4 starts to grow and we see what kind of
> C++ runtime support will exist for ANTLR v4.
> 
>> When one should prefer Clang vs ANTLR or reverse?
>> Your opinions?
> 
> I think you are asking the wrong question here.  Please compare apples
> to apples, and not to cucumbers.

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32811] Re: [antlr-interest] Question: ANTLR and LLVM ... + Clang

2011-06-17 Thread Ruslan Zasukhin
Hi Guys,

Thank you very much for answers.

Okay, so LLVM is clear now ... it is about compilers: code generation.

But there is also Clang, the C/C++/ObjC front end.

And I think Clang can be compared to ANTLR.
Right?

When should one prefer Clang over ANTLR, or the reverse?
Your opinions?


===
On 6/17/11 12:03 PM, "ante...@freemail.hu"  wrote:

> You can use antlr to parse the text.
> You can use llvm to generate byte or machine code, optimise code.
> So they should work well with each other. They complement each other.
> C version of antlr is recommended as llvm is written in C++.

And 


==
> On 6/17/11 12:52 PM, "Sergiy Dubovik"  wrote:
> * how ANTLR refer to LLVM?

Antlr is: lexer, parser, ast walker. LLVM - low level virtual machine
e.g. can generate and interpret it's bytecode and can generate
optimized cpu instructions also supports JIT. They have nothing in
common but they can be used to create a compiler. Pretty good one i
have to say.

> * If they can collaborate?

Yes they can. However you still need to write some code. Also llvm is
C++ framework. There are some bindings though.

> * should I use or drop ANTLR if we will want start to use LLVM?

llvm doesn't provide what antlr provides.
==






-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32802] [antlr-interest] Any comments about sematicdesign.com and their DMS ?

2011-06-17 Thread Ruslan Zasukhin
Hi All,

Yesterday I came across this site for the first time:
http://www.semanticdesigns.com/


These sound like very cool engines ... although the GUI is very ugly, at least
in TestCover, which I was able to download as a demo.


They look somewhat similar to MPS.

They compare themselves to lexers, parsers, and other tools.
But ... it seems they compare against an old version of ANTLR:
http://www.semanticdesigns.com/products/DMS/DMSComparison.html


I wonder if anybody has any comments about them?
At least privately.


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32801] [antlr-interest] Question: ANTLR and LLVM ...

2011-06-17 Thread Ruslan Zasukhin
Hi All,

I have never looked deeply at LLVM yet,
but I have a quick question.

Can anybody share their view on:

* how ANTLR refer to LLVM?
* If they can collaborate?
* should I use or drop ANTLR if we will want start to use LLVM?

And so on 


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32746] Re: [antlr-interest] Extract all rules/lexems/keywords of a Language.g into XML format?

2011-06-11 Thread Ruslan Zasukhin
On 6/11/11 3:16 AM, "Terence Parr"  wrote:

Hi Terence,

> Hi. Let me be more specific...I am going to serialize entire grammar as an
> augmented transition network into any generated parser or lexer. This will
> improve your recovery and error messages. There will be an API to figure out
> what could've come next.  I won't be using XML.

Okay, this sounds like a feature that will help us build
a smart auto-completion list for the next token ... right?

Very nice.

> As for the GUI widget, I'm providing a simple widget that, using reflection if
> I remember correctly, pulls out the appropriate information from a
> parser/lexer and then manages to automatically syntax highlight and flag
> erroneous syntax. For semantics, the programmer would be on their own. It is a
> cheap way for someone to get an editor for their DSL.

I don't quite follow ... is the widget Java only?
   What will C++ developers do? :-)

Actually, we are thinking not about a GUI widget, but about
an ANTLR feature that produces an XML file from any grammar.

Example:

Suppose we have
cpp.g
sql.g
php.g

We add to the .g file, or pass as an antlr switch,
output=xml

and then
   > antlr  output=xml  cpp.g

will produce cpp.g.xml (or sql.g.xml, php.g.xml),

which contains the language from the .g file expressed in XML format ...
And maybe the reverse task as well: XML back to .g.

Strange task?   :)   I know ...

 
> sometime late this summer I am hoping to have early access
> 
> Ter
> On Jun 10, 2011, at 2:22 AM, Ruslan Zasukhin wrote:
> 
>> On 6/10/11 12:37 AM, "Terence Parr"  wrote:
>> 
>> Hi Terence, 
>> 
>>> i have something like this but am not done.
>> 
>> AST of any grammar into XML ?
>> 
>> Great!
>> 
>> Just keep in mind please that cTAG and GCC-XML products and their tasks.
>> I believe ANTLR is a goold-tool to do same -- even better -- even more.
>> 
>> 
>> This do not presents in the v4 betas yet?
>> Any chance on "early access"?  :-)
>> 
>> 
>>> v4 will have a syntax highlighting  editor as a standard widget.
>> 
>> Can you explain this in more details?
>> 
>> You mean syntax highlighting editor where?
>>  in ANTLRworks? 
>> 
>> Or ANTLR will be able produce something what
>>  any EDITOR app can use later ?

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32726] Re: [antlr-interest] Extract all rules/lexems/keywords of a Language.g into XML format?

2011-06-10 Thread Ruslan Zasukhin
On 6/10/11 12:37 AM, "Terence Parr"  wrote:

Hi Terence, 

> i have something like this but am not done.

The AST of any grammar as XML?

Great!

Please just keep in mind the cTag and GCC-XML products and the tasks they solve.
I believe ANTLR is a good tool to do the same -- even better -- and even more.


Is this not present in the v4 betas yet?
Any chance of "early access"?  :-)


> v4 will have a syntax highlighting  editor as a standard widget.

Can you explain this in more detail?

Where do you mean the syntax highlighting editor will be?
  In ANTLRworks?

Or will ANTLR be able to produce something that
  any EDITOR app can use later?


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32723] [antlr-interest] Extract all rules/lexems/keywords of a Language.g into XML format?

2011-06-09 Thread Ruslan Zasukhin
Hi All,
Hi Terence,

We have run into a task where we would like to be able to extract from
ANY ANTLR grammar

a description of all rules, keywords, lexemes, imaginary tokens ...
and put that information into, e.g., XML format.


Actually, this can be considered a transformation of an ANTLR grammar into XML
form.

I believe the reverse task will also be possible: from that XML back into an
ANTLR .g grammar.

---
We think the following approach is possible:

* you already have a PARSER for any ANTLR grammar.

* it parses a .g grammar and produces an AST ...

* it does not look difficult to walk that AST and output XML in one form or
another ...

* or maybe string templates can be used ...

* actually, you already produce C, Java, and C# output for a grammar.g.
Why not add XML as a general format?
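
The "walk the AST and output XML" step could look roughly like the C sketch below. Everything in it is hypothetical: the Node struct is a stand-in for whatever tree the ANTLR tool really builds for a grammar, and the element and attribute names are invented for illustration only. It serializes into a caller-supplied buffer and escapes attribute values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified grammar-tree node (not an ANTLR type). */
typedef struct Node {
    const char *kind;            /* e.g. "grammar", "rule", "token" */
    const char *name;            /* e.g. "IDENT" */
    const struct Node *children; /* array of children, may be NULL */
    size_t child_count;
} Node;

/* Fixed-size output buffer; text that does not fit is silently truncated. */
typedef struct {
    char *buf;
    size_t cap; /* must be >= 1 */
    size_t len;
} Out;

static void out_str(Out *o, const char *s)
{
    size_t n = strlen(s);
    size_t room = o->cap - o->len - 1; /* keep one byte for the final 0 */
    if (n > room)
        n = room;
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

/* Write an attribute value, escaping the XML special characters. */
static void out_attr(Out *o, const char *s)
{
    for (; *s; ++s) {
        switch (*s) {
        case '&': out_str(o, "&amp;");  break;
        case '<': out_str(o, "&lt;");   break;
        case '>': out_str(o, "&gt;");   break;
        case '"': out_str(o, "&quot;"); break;
        default: {
            char c[2] = { *s, '\0' };
            out_str(o, c);
        }
        }
    }
}

/* Depth-first walk: one XML element per tree node. */
void node_to_xml(const Node *n, Out *o)
{
    out_str(o, "<");
    out_str(o, n->kind);
    out_str(o, " name=\"");
    out_attr(o, n->name);
    out_str(o, "\"");
    if (n->child_count == 0) {
        out_str(o, "/>");
        return;
    }
    out_str(o, ">");
    for (size_t i = 0; i < n->child_count; ++i)
        node_to_xml(&n->children[i], o);
    out_str(o, "</");
    out_str(o, n->kind);
    out_str(o, ">");
}
```

The reverse direction (XML back to .g) would need a real XML parser and is not sketched here.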

What do you think?



--
We have also seen the cTag tools, which can parse many languages
and produce text output, even without XML.
As you may know, this tool is targeted as a helper library for editor apps.

But I think: why can't ANTLR be improved to produce similar output,
and kill cTag as a project? :-)



Another example of a specialized, not widely applicable tool is GCC-XML:
http://www.gccxml.org/HTML/Index.html


IMO, it would be great to extend ANTLR in this way.
This would probably open new roads for its usage.

And for us, the users of ANTLR, this could be as easy as

Option output = XML


What do you think?





-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]







[il-antlr-interest: 32338] Re: [antlr-interest] Is there a safe and easy way to reuse LEXER and PARSER objects on C target?

2011-04-29 Thread Ruslan Zasukhin
On 4/20/11 5:17 PM, "Jim Idle"  wrote:

Hi Jim,

> This is available as the reuse() method on the input stream, lexer, and
> parser. 

What about Tree Parser?

> If used then the lexer will also reuse the tokens from the last
> run and avoid any malloc.

Good.
 
> However, you will need to use the snapshot in perforce to get that. Or
> just wait a few weeks for the next release.

* does this mean that for the last few years, everyone using ANTLR3 has been
unable to reuse
Input/Lexer/Parser/Tree_Parser?

Hard to believe :-)


* where can I get this snapshot? URL?

* is there any special example/test of this in that snapshot?

* is there any special example/test of this in the official archive of
ANTLR3?

* how many weeks, do you think?  Worst case?


* any chance of getting this working with the official release 3.3?

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32267] Re: [antlr-interest] Is there a safe and easy way to reuse LEXER and PARSER objects on C target?

2011-04-20 Thread Ruslan Zasukhin
On 4/20/11 5:17 PM, "Jim Idle"  wrote:

> This is available as the reuse() method on the input stream, lexer, and
> parser. 

And tree parser?

> If used then the lexer will also reuse the tokens from the last
> run and avoid any malloc.
> 
> However, you will need to use the snapshot in perforce to get that.
> Or just wait a few weeks for the next release.

I see, thank you Jim.


One more question. After reading the book and the C-target docs,
I did not find a section that describes how the C target works with memory.

I seem to remember that the C target has its own mempool,
which allows you to destroy all objects (tokens, AST) at once.

Will reuse() destroy the objects that are no longer needed?


Btw, it is great to have its own mempool.
I believe that in v2 I saw tons of destructor calls, one for each item ...
Profiles showed 5-10% spent in destructors ...
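
To illustrate why a pool makes teardown cheap, here is a generic bump-allocator sketch in C. It is NOT the C runtime's actual pool (how that works internally is exactly the open question above). All names are hypothetical. The point is only that "destroy everything" becomes one assignment or one free(), instead of one destructor call per token or tree node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one big block, objects are carved off the front. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Returns 1 on success, 0 if the block could not be allocated. */
int arena_init(Arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = (a->base != NULL) ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Allocate size bytes from the arena, or return NULL if it is full. */
void *arena_alloc(Arena *a, size_t size)
{
    /* Round up so every returned pointer stays suitably aligned. */
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;
    if (rounded < size || rounded > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* "Destroy" every object at once and keep the block for the next parse. */
void arena_reset(Arena *a)
{
    a->used = 0;
}

/* Give the whole block back to the system. */
void arena_free(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

With this shape, reusing a lexer/parser between inputs would amount to calling arena_reset() before each run; whether reuse() in the C target behaves like that is an assumption, not something confirmed in this thread.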


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32259] Re: [antlr-interest] Is there a safe and easy way to reuse LEXER and PARSER objects on C target?

2011-04-20 Thread Ruslan Zasukhin
On 12/11/09 6:17 PM, "Jim Idle"  wrote:

And I hope this has already been fixed during the last 1.5 years?   :-)

I bet that re-creating the lexer/parser objects is much slower.
In v2 we dropped that idea very quickly.
And ANTLR v2 was able to reuse the lexer/parser and tree parser easily enough.


> Ok - that is probably a bug. I guess nobody tried to do that before ;-). For
> now, you will have to recreate the parser each time until I can fix it.
> 
> Jim
> 
> From: Ronghui Yu [mailto:stone...@gmail.com]
> Sent: Friday, December 11, 2009 5:01 AM
> To: Jim Idle
> Cc: antlr-interest
> Subject: Re: [antlr-interest] Is there a safe and easy way to reuse LEXER and
> PARSER objects on C target?
> 
> Here is my pseudo code:
> 
> pLexer->pLexer->rec->reset(pLexer->pLexer->rec);
> pInputStream->data = (pANTLR3_UINT8)GetText().c_str();
> pInputStream->sizeBuf = (ANTLR3_UINT32)GetText().length();
> pInputStream->reset(pInputStream);
> pLexer->pLexer->setCharStream(pLexer->pLexer,pInputStream);
> pTokenStream->free(pTokenStream);
> pTokenStream = antlr3CommonTokenStreamSourceNew(TOKEN_SIZE_HINT,
> TOKENSOURCE(pLexer));
> pTokenStream->discardOffChannelToks(pTokenStream,ANTLR3_FALSE);
> pParser->pParser->setTokenStream(pParser->pParser,pTokenStream->tstream);
> 
> 
> It works most of the time, but occasionally violation access occurs. It
> doesn't work if applied to a grammar file importing another grammar. For
> example, I have a a keywords.g file is imported to the main grammar. When
> setting the token stream on the main grammar, the token string of embedded
> parser does not get updated automatically.
> 
> On Fri, Dec 11, 2009 at 1:11 AM, Jim Idle  wrote:
> To be honest, I would think you can hardly measure the time taken to create
> them, but you have to call the reset() methods and set the character stream
> and the token stream. There were issues with that at one point but I think I
> have fixed them all now. You can see how to reuse the lexer by looking at the
> examples in the examples download. Specifically the Java parser example will
> help here.
> 
> Jim
> 
> From: antlr-interest-boun...@antlr.org
> [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Ronghui Yu
> Sent: Thursday, December 10, 2009 8:21 AM
> To: antlr-interest
> Subject: [antlr-interest] Is there a safe and easy way to reuse LEXER and
> PARSER objects on C target?
> 
> Hi, All,
> 
> On my project, I have a parser for parsing different statements again and
> again. In order to save a little time on initialization, I would like to reuse
> the LEXER and PARSER objects created the first time, something like this:
> 
> if (bInitialized)
> {
> reinitialize();
> }
> else
> {
>initialize();
>bInitialized = true;
> }
> 
> The problem now is how to write reinitialize() safely. I have no idea on which
> fields of LEXER or PARSER objects must be reset to which status. Then my
> current code works most of the time, but it encounters NULL pointer
> occasionally(I am sure the grammar file is good because if I don't reuse the
> LEXER and PARSER objects, everything goes fine).
> 
> Anybody could give me some ideas?
> 
> Thanks in advance.
> 
> --
> ===
> Regards
> Ronghui Yu
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 
> 
> 
> --
> ===
> Regards
> Ronghui Yu
> 
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32239] Re: [antlr-interest] v2->v3 Skip wrapper and inside quotes in LITERAL of SQL // description.

2011-04-18 Thread Ruslan Zasukhin
On 4/18/11 8:25 PM, "Jim Idle"  wrote:

> ???


Okay, let me copy paste here SQL standard   :-)

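<character string literal> ::=
    <quote> [ <character representation>... ] <quote>

<character representation> ::=
    <nonquote character>
  | <quote symbol>

<nonquote character> ::= !! See the Syntax Rules.

<quote symbol> ::= <quote><quote>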

So:

An SQL-standard escape inside a literal must look like this:
literal:'som''e literal'

The common escape with a backslash \ (e.g. C++, Java):
literal:'som\'e literal'



The problem (for me :-) is how to skip one of the quotes INSIDE the literal
using ANTLR v3 ...

In ANTLR v2 this was very easy:

STRING_LITERAL
:   QUOTE!              // << WRAPPER quote... Easy enough in v3
    (   ESCAPE_SEQUENCE
    |   ~('\'' | '\\')
    |   QUOTE QUOTE!    // << can be inside of LITERAL many times
    )*
    QUOTE!              // << WRAPPER quote... Easy enough in v3
;


Just three '!' and the task was solved in v2 ...

Yes, you (Jim) have shown an effective solution for v3 (C) to remove the
WRAPPER quotes.

But the above rule for LITERAL is harder, because it can have quotes INSIDE.
Do you see the problem?



I have also checked a few SQL grammars from the ANTLR site.
E.g., this is the MySQL lexer:

TEXT_STRING:
  ('\'' 
  ( 
options{greedy=true;}: ~('\'' | '\r' | '\n' ) | '\'' '\''
  )* 
  '\'' )

Here the author does not even care about QUOTE QUOTE.

So the lexer will send the parser  Token ( 'aaa''bbb''ccc' )
But it should send                 Token ( aaa'bbb'ccc )


Or am I smoking the wrong stuff???  :-)
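
To pin down the expected result independently of ANTLR, here is a small stand-alone C sketch of the transformation: strip the wrapper quotes and collapse each doubled quote into one, in place. The function name sql_unquote is hypothetical and not part of any runtime; backslash escapes are deliberately not handled, and the input is assumed to be a well-formed, NUL-terminated, single-byte string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite buf in place from the raw literal text
 * (including wrapper quotes, e.g. 'aaa''bbb') to its value (aaa'bbb).
 * Returns the new length, or (size_t)-1 if buf is not a quoted literal. */
size_t sql_unquote(char *buf)
{
    size_t len = strlen(buf);
    if (len < 2 || buf[0] != '\'' || buf[len - 1] != '\'')
        return (size_t)-1;

    size_t out = 0;
    /* Walk only the characters between the two wrapper quotes.
     * out never overtakes in, so the copy is safe in place. */
    for (size_t in = 1; in < len - 1; ++in) {
        buf[out++] = buf[in];
        /* A doubled quote stands for a single quote: skip the second one. */
        if (buf[in] == '\'' && in + 1 < len - 1 && buf[in + 1] == '\'')
            ++in;
    }
    buf[out] = '\0';
    return out;
}
```

Because the result is never longer than the input, no extra buffer is needed; that is the same property the lexer-level solutions in this thread rely on.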


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32237] [antlr-interest] v2->v3 Skip wrapper and inside quotes in LITERAL of SQL // C-target [SOLVED v3]

2011-04-18 Thread Ruslan Zasukhin
Hi Guys,

Below I copy and paste my solution for LITERAL in our SQL grammar.

GOOD:

* everything is done at the LEXER level.
* uses the efficient GETCHARINDEX() + EMIT() approach for most literals.
* only if QUOTE QUOTE was found (a rare case in real life) is the more
complex algorithm used.

BAD:

* I don't know yet whether pTmpStr needs to be freed manually.
* I don't know yet whether this solution will work for UTF16 lexer input.

* I have to access the produced Token object directly to modify ITS copy of
the text.

* I still think the solution is far LESS trivial than the ! of ANTLR v2.
* the solution is very target-specific, IMO.
 IMO: the ideal would be ANTLR's own syntax for controlling the lexer's output.

Can anybody give hints for a better solution? Before offering ideas, please
check the STRING_LITERAL rule below carefully:
QUOTE QUOTE must be allowed **inside** of STRING_LITERAL,
and we should skip one of the two quotes.

Example:
 'aa''bb''cc''dd'   =>   aa'bb'cc'dd

//-
// String literals:

fragment
LETTER   // caseSensitive = false, so we use only small chars.
:'a'..'z'
|   '@'
;

fragment
ESCAPE_SEQUENCE  // Escape for VSQL can be:  \'  \_  \%
:'\\' ( QUOTE | '_' | '%' )
;

STRING_LITERAL
@init
{
    int dquotes_count = 0;
    int theStart = $start;
}
    :   QUOTE       { theStart = GETCHARINDEX(); }
        (   ESCAPE_SEQUENCE
        |   ~('\'' | '\\')
        |   QUOTE QUOTE     { ++dquotes_count; }
        )*
        { $start = theStart; EMIT(); }
        QUOTE
        {
            if( dquotes_count > 0 ) // ONLY if '' was found
            {
                pANTLR3_COMMON_TOKEN pToken = LEXSTATE->token;

                pANTLR3_STRING pTmpStr = pToken->getText( pToken );
                char* pBegin = (char*) pTmpStr->chars;
                char* pStart = pBegin;

                // We make the string smaller in the same buffer.
                while( dquotes_count > 0 )
                {
                    char* pFirstQuote = strchr( pStart, '\'' );
                    if( pFirstQuote == NULL ) // no quotes left to collapse
                        break;

                    if( *(pFirstQuote + 1) != '\'' ) // a lone quote, e.g. from \'
                    {
                        pStart = pFirstQuote + 1; // step over it and keep searching
                        continue;
                    }

                    // Example: 'aa''bb''cc''dd'   =>   aa'bb'cc'dd
                    // CharsOnLeft is counted from the start of the buffer, so
                    // CharsToMove covers the tail including the terminating 0.
                    int CharsOnLeft = (int)(pFirstQuote - pBegin) + 1;
                    int CharsToMove = pTmpStr->len - CharsOnLeft;

                    ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2, CharsToMove );

                    // prepare for a possible next loop:
                    pStart = pFirstQuote + 1;
                    pTmpStr->len--;
                    --dquotes_count;
                }

                pToken->setText( pToken, pTmpStr );
            }
        }
    ;






[il-antlr-interest: 32229] Re: [antlr-interest] v2->v3 Skip chars in Lexer. DELIMITED IDENT [SOLVED v3] :-)

2011-04-17 Thread Ruslan Zasukhin
On 4/17/11 6:32 PM, "Jim Idle"  wrote:

Hi Jim,

Thanks to your pointer to
http://markmail.org/message/izyhuzbooerfw4tu


I was able to solve the DELIMITED IDENT rule with (as I am now sure) maximal
efficiency. Great.

It took me 30 minutes to find the correct C analogs of the macros and make them
compile. Then I spent the next 90 minutes searching for how to force the IDENT
token type instead of DELIMITED.

Now the rule works correctly.  Everything is at the LEXER level.  It all looks efficient.

Again, if you like this code, maybe add it to the FAQ page for future
developers.


//--
The next task is LITERAL :-)
So I will be sending more letters here. Please be patient.


//--
IDENT
:( LETTER | '_' ) ( LETTER | '_' | DIGIT )*
;

DELIMITED// delimited_identifier
@init
{
$type = IDENT;
int theStart = $start;
}
:
(DQUOTE{ theStart = GETCHARINDEX(); }
( ~(DQUOTE) | DQUOTE DQUOTE )+
{ $start = theStart; EMIT(); }
DQUOTE

|BQUOTE{ theStart = GETCHARINDEX(); }
( ~(BQUOTE) | BQUOTE BQUOTE )+
{ $start = theStart; EMIT(); }
BQUOTE

// valentina/oracle extension: [asasas '' " sd "]
|LBRACK{ theStart = GETCHARINDEX(); }
( ~(']') )+
{ $start = theStart; EMIT(); }
    RBRACK
)
;


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 32218] Re: [antlr-interest] v2->v3 Skip chars in Lexer? For C-target [SOLVED 2.5]

2011-04-17 Thread Ruslan Zasukhin
Hi All,

After Jim pointed out a more efficient way to skip the wrapper quotes,
and after some more time, here is a working solution for the archive:

//
IDENT
:( LETTER | '_' ) ( LETTER | '_' | DIGIT )*
;

// RZ 04/17/11: in ANTLR v3 there is no way to skip chars in the lexer. Oops.
//Instead we use the trick suggested by Jim Idle on the ANTLR list:
//  skip the first/last chars of the token at the parser level.
// 
DELIMITED// delimited_identifier
:
(DQUOTE ( ~(DQUOTE) | DQUOTE DQUOTE )+ DQUOTE
|BQUOTE ( ~(BQUOTE) | BQUOTE BQUOTE )+ BQUOTE
|LBRACK ( ~(']') )+ RBRACK
)
;


And at the parser level, we take the Token and apply ++ / -- to its pointers.
The type of the Token is also changed to IDENT with the help of a rewrite.


//
identifier
:IDENT// regular_identifier

|d=DELIMITED // delimited_identifier
{
++$d->start;
--$d->stop;
}
-> ^( IDENT[$d.text->chars] )
;
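
The ++ / -- on $d->start and $d->stop in the rule above can be pictured with a
small standalone C sketch. The Token struct and the two functions are
hypothetical stand-ins, not the ANTLR3 C runtime types; the only assumption
carried over is that a token stores inclusive start/stop pointers into the
original input buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lexer token: it only points into the input. */
typedef struct {
    const char *start;  /* first character of the token text */
    const char *stop;   /* last character of the token text (inclusive) */
} Token;

/* Drop the first and last character (the wrapper quotes) without copying. */
void trim_delimiters(Token *t)
{
    assert(t->stop > t->start);  /* need at least the two delimiters */
    ++t->start;
    --t->stop;
}

/* Number of characters the token currently covers. */
size_t token_length(const Token *t)
{
    return (size_t)(t->stop - t->start + 1);
}
```

Nothing is copied here: for the input `select "my col" from t`, trimming only
moves the two pointers inward so that the token covers `my col`.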




Works... But ...
I am far from sure that this solution is really more efficient, Jim.

Yes, at the lexer level I used ->chars, and you say it is slower ...

But at the parser level, besides the fast ++ / -- operations, we still need to
create a second token IDENT and copy all values from the first ...

sizeof( ANTLR3_COMMON_TOKEN_struct ) is about 160-200 bytes.

So creating it with new and copying about 150 bytes just to skip TWO chars
does not look like such a cheap operation.  Also note that IDENTs are usually
only 5-20 chars long, much less than the 200 bytes of that structure.


So maybe my first solution at the lexer level was not so bad?

And I still have a TODO: skip chars inside of LITERAL at the parser level ...
Here we cannot just do ++ / --



I do not yet see the whole picture of how the lexer works at the low level in C.

I also do not yet see any clear information about UTF encodings in the C target.
I am going to ask about this in future letters.





[il-antlr-interest: 32216] Re: [antlr-interest] v2->v3 Skip chars in Lexer? Terrence?

2011-04-17 Thread Ruslan Zasukhin
On 4/17/11 11:06 AM, "Ruslan Zasukhin" 
wrote:

>> but basically it is easy to strip
>> leading and trailing characters as the tokens carry pointers, so get the
>> start pointer, increment it, get the end point, decrement it, now
>> 
>> Do not use the built in $token.text->chars as this is slow and just for
>> convenience. 

>> The token holds a pointer to the start of the text in the
>> original input stream, which is greatly faster and you don't do anything
>> at all to the token until and if you use it.

>> You know the token type, so can handle it appropriately.

Hmmm,

I have taken a look, and I do not see a way in the C target to access the
token in a lexer rule.

Do you mean that I should take care of these pointers LATER, in the parser?

But then this again does not look like the best solution...
Java developers will remove them in the lexer,
and C developers in the parser?

Some kind of Zoo ...


Please help   :-)

And note that I am a C++ developer with 20 years of experience,
doing my best reading the ANTLR wiki, the book, and the examples,
who has worked with ANTLR v2 for 10 years ...
and I cannot resolve this *trivial* task in *the best way*
for v3 for about 14 hours now.

I wonder how other C developers were able to resolve this problem?

And maybe the docs, FAQs, and examples can be improved in this direction?
Thank you in advance :-)




[il-antlr-interest: 32215] Re: [antlr-interest] v2->v3 Skip chars in Lexer? Terrence?

2011-04-17 Thread Ruslan Zasukhin
On 4/16/11 9:27 PM, "Jim Idle"  wrote:

> It is for performance and has been talked about for 4 years, so we don't
> need to start it again.

> You know the token type, so
> can handle it appropriately. It is a trivial piece of code and you will
> want a generic method/function for getting the string anyway. It takes
> less time to implement it than to worry about ! not being there any more
> :-)

And since it is Sunday, I'd like to add my view.   :-)


*  How easy the solution was in v2:
IDENT:  DQUOTE!  something DQUOTE!

What could be easier? :-)


* The best part is that ! worked the same for Java and for C.
Now in v3, developers must provide different code for each target.
Good?  Nope.


* I very much expected that in v3 I would be able to specify some rewrite rule,
like in the parser. But no, it does not work.  And btw, there seems to be no
mention of this in the docs or the book.

Attention: rewrite syntax does not work in the v3 Lexer.


* Yes, I think I understand that the main technical problem is that a Token
tends to point p1/p2 into the input buffer string to avoid a COPY.  Yes, speed
is my own favorite product feature :-)

But your Token can already hold a COPY if needed, right?
Then for the RARE cases where it is needed, it would be great to have an
automated way.





[il-antlr-interest: 32214] Re: [antlr-interest] v2->v3 Skip chars in Lexer? Terrence?

2011-04-17 Thread Ruslan Zasukhin
On 4/16/11 9:27 PM, "Jim Idle"  wrote:

Hi Jim,

> It is for performance and has been talked about for 4 years, so we don't
> need to start it again.

Okay, but maybe it is a good idea to add a code example to that FAQ page
about these quotes?

http://www.antlr.org/wiki/pages/viewpage.action?pageId=1461

There is no C target example on this page.

>  If we implement ! then you have to build the
> string in to every token and copy it,

Not very clear, but ok.

I have seen in the book that it is possible to use labels in the Lexer:
IDENT:  q1=DQUOTE  something  q2=DQUOTE

But how does this help? The book shows a useless example,
an action with all the labels:
{ $q1, something.text $q2 }

I thought we could do some "rewrite" in the lexer, but no.
So the use of that is not clear.


> but basically it is easy to strip
> leading and trailing characters as the tokens carry pointers, so get the
> start pointer, increment it, get the end point, decrement it, now
> 
> Do not use the built in $token.text->chars as this is slow and just for
> convenience. 

> The token holds a pointer to the start of the text in the
> original input stream, which is greatly faster and you don't do anything
> at all to the token until and if you use it.

So I must check the Token structure of the C target,
find the two start/end pointers there, and correct them.

Ok, clear, thank you, Jim.

> You know the token type, so can handle it appropriately.

Why should I care about the type?

I should correct the pointers at the end of the lexer rule, right?

> It is a trivial piece of code and you will
> want a generic method/function for getting the string anyway. It takes
> less time to implement it than to worry about ! not being there any more
> :-)

The piece of code may be trivial, but it takes hours to learn your C code.
And this is where the problem is, IMO.

This is why I ask you again to add the best of the best example to that FAQ
page. It should take you only 5 minutes, and it will help others.


Problem 2:
Above you describe an efficient solution only for skipping the FIRST/LAST quotes.
Good.   But as you can see, we also need to remove INTERNAL quotes,
and this task requires creating a COPY of the string from the original input.
Right?

STRING_LITERAL
@init
{
int dquotes_count = 0;
}
:QUOTE 
(ESCAPE_SEQUENCE
|~('\'' | '\\')
|QUOTE QUOTE{ ++dquotes_count; }
)* 
QUOTE 

{
// Remove the first and the last chars:
pANTLR3_STRING pQuotedStr = GETTEXT();
pANTLR3_STRING pStr = pQuotedStr->subString( pQuotedStr, 1,
pQuotedStr->len - 1 );

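char* pBase  = (char*) pStr->chars;
char* pStart = pBase;

while( dquotes_count > 0 )
{
char* pFirstQuote = strchr( pStart, '\'' );

if( pFirstQuote == NULL ) // no more quotes at all
break;

if( *(pFirstQuote + 1) != '\'' ) // second quote?
{
// a lone quote (e.g. from \'): search again after it
pStart = pFirstQuote + 1;
continue;
}

// Example: 'aabbcc''def'
// Offsets are taken from the start of the string, so the
// move covers the rest of the text plus the trailing NUL.
int CharsOnLeft = (int)(pFirstQuote - pBase) + 1;
int CharsToMove = pStr->len - CharsOnLeft;

ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2,
CharsToMove );

// prepare for possible next loop:
pStart = pFirstQuote + 1;
pStr->len--;
--dquotes_count;
}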

SETTEXT( pStr );
}
;
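
Jim's idea of a generic function for getting the string, combined with the
doubled-quote count collected in the rule above, might look like the following
sketch. It is plain C and independent of the ANTLR3 runtime; copy_unquoted and
its parameters are hypothetical names. It assumes 8-bit characters and that the
token text has already been narrowed to exclude the wrapper quotes. Backslash
escapes are left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the token text [start, start + len) into dst, collapsing each doubled
 * quote into a single one. doubled is the number of doubled quotes counted by
 * the lexer. dst must have room for len - doubled + 1 bytes.
 * Returns the length of the result, not counting the terminating NUL.
 */
size_t copy_unquoted(char *dst, const char *start, size_t len,
                     char quote, size_t doubled)
{
    size_t out = 0;
    size_t i = 0;

    assert(doubled * 2 <= len);

    /* Scan only while there are still doubled quotes left to collapse. */
    while (i < len && doubled > 0) {
        dst[out++] = start[i];
        if (start[i] == quote && i + 1 < len && start[i + 1] == quote) {
            i += 2;       /* skip the second quote of the pair */
            --doubled;
        } else {
            ++i;
        }
    }

    /* Nothing left to collapse: copy the tail in one call. */
    memcpy(dst + out, start + i, len - i);
    out += len - i;
    dst[out] = '\0';
    return out;
}
```

With a count of zero the function reduces to a single memcpy, so the copy is
paid for only when the text is actually requested, and the character scan only
when the lexer saw something to collapse.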








[il-antlr-interest: 32212] Re: [antlr-interest] v2->v3 Skip chars in Lexer? For C-target [SOLVED]

2011-04-16 Thread Ruslan Zasukhin
On 4/16/11 1:18 PM, "Bart Kiers"  wrote:

Hi All,

Just for the archive, I will show the solution I was able to build so far for
a couple of LEXER rules of our Valentina SQL.

The only thing not yet clear to me is:
must I destroy the temporary strings to avoid leaks?

Also I still wonder whether a more compact, elegant, and efficient
solution exists
from the point of view of a C developer? :-)


//--

// an identifier. Note that testLiterals is set to true!  This means
// that after we match the rule, we look in the literals table to see
// if it's a literal or really an identifier
IDENT
:( LETTER | '_' ) ( LETTER | '_' | DIGIT )*
;

DELIMITED   // delimited_identifier
:
(DQUOTE ( ~(DQUOTE) | DQUOTE DQUOTE )+ DQUOTE
|BQUOTE ( ~(BQUOTE) | BQUOTE BQUOTE )+ BQUOTE

|LBRACK ( ~(']') )+ RBRACK // valentina extension   [asasas '' " sd "]
)
{
// Remove the first and the last chars:
pANTLR3_STRING pQuotedStr = GETTEXT();
pANTLR3_STRING pStr = pQuotedStr->subString( pQuotedStr, 1,
pQuotedStr->len - 1 );

SETTEXT( pStr );
}
{ $type = IDENT; }
;


And this is the second rule, more complex, because there can be quotes inside:

//--

STRING_LITERAL
@init
{
int dquotes_count = 0;
}
:QUOTE 
(ESCAPE_SEQUENCE
|~('\'' | '\\')
|QUOTE QUOTE{ ++dquotes_count; }
)* 
QUOTE 

{
// Remove the first and the last chars:
pANTLR3_STRING pQuotedStr = GETTEXT();
pANTLR3_STRING pStr = pQuotedStr->subString( pQuotedStr, 1,
pQuotedStr->len - 1 );

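char* pBase  = (char*) pStr->chars;
char* pStart = pBase;

while( dquotes_count > 0 )
{
char* pFirstQuote = strchr( pStart, '\'' );

if( pFirstQuote == NULL ) // no more quotes at all
break;

if( *(pFirstQuote + 1) != '\'' ) // second quote?
{
// a lone quote (e.g. from \'): search again after it
pStart = pFirstQuote + 1;
continue;
}

// Example: 'aabbcc''def'
// Offsets are taken from the start of the string, so the
// move covers the rest of the text plus the trailing NUL.
int CharsOnLeft = (int)(pFirstQuote - pBase) + 1;
int CharsToMove = pStr->len - CharsOnLeft;

ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2,
CharsToMove );

// prepare for possible next loop:
pStart = pFirstQuote + 1;
pStr->len--;
--dquotes_count;
}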

SETTEXT( pStr );
}
;






[il-antlr-interest: 32211] Re: [antlr-interest] v2->v3 Skip chars in Lexer? // FAQ only for Java. What about C? :-)

2011-04-16 Thread Ruslan Zasukhin
On 4/16/11 1:18 PM, "Bart Kiers"  wrote:

I have found this FAQ
http://www.antlr.org/wiki/pages/viewpage.action?pageId=1461

But there is only Java here ...

{setText(getText().substring(1, getText().length()-1));} ;


Now I am struggling to find any C API analogs of
getText() / setText() for the LEXER.



> How to remove that quotes in v3?  :-)
> 
> 
> Here's a way:
> 
> DELIMITED
>   @init {
>     String q = null;
>   }
>   @after {
>     String text = getText();
>     // remove the first and last quote, replace all 2 quotes with a single
> quote 
>     setText(text.substring(1, text.length()-1).replace(q+q, q));
>   }
>   :  ( DQUOTE (~DQUOTE | DQUOTE DQUOTE)+ DQUOTE {q = $DQUOTE.text;}
>      | BQUOTE (~BQUOTE | BQUOTE BQUOTE)+ BQUOTE {q = $BQUOTE.text;}
>      ) { $type = IDENT; }
>   ;
> 
> or create your own token that handles the replacements
> internally: http://www.antlr.org/wiki/pages/viewpage.action?pageId=1844




[il-antlr-interest: 32210] Re: [antlr-interest] v2->v3 Skip chars in Lexer? Terrence?

2011-04-16 Thread Ruslan Zasukhin
On 4/16/11 1:18 PM, "Bart Kiers"  wrote:

Thank you, Bart.

And I forgot to mention that we are using the C target.
So there are no nice string classes ...

And even with string classes, don't you think, guys, that this NEW WAY
of dealing with wrapper quotes does not look like the best of the best?

We must now work with strings?
Remove the first/last chars?
Remove inside chars?
This means copies that were not required before ...

I have heard that the ANTLR3 LEXER is going to be much faster than the v2 Lexer.
We used FLEX instead of the v2 Lexer because of that.

And now ... we must do manual work with strings ??

Somehow not the best of the best?
Maybe it is a good idea to BRING BACK
that simple way to skip those chars right in the lexer?


> On Sat, Apr 16, 2011 at 12:06 PM, Ruslan Zasukhin
>  wrote:
> ...
> 
> How to remove that quotes in v3?  :-)
> 
> 
> Here's a way:
> 
> DELIMITED
>   @init {
>     String q = null;
>   }
>   @after {
>     String text = getText();
>     // remove the first and last quote, replace all 2 quotes with a single
> quote 
>     setText(text.substring(1, text.length()-1).replace(q+q, q));
>   }
>   :  ( DQUOTE (~DQUOTE | DQUOTE DQUOTE)+ DQUOTE {q = $DQUOTE.text;}
>      | BQUOTE (~BQUOTE | BQUOTE BQUOTE)+ BQUOTE {q = $BQUOTE.text;}
>      ) { $type = IDENT; }
>   ;
> 
> or create your own token that handles the replacements
> internally: http://www.antlr.org/wiki/pages/viewpage.action?pageId=1844
> 
> Regards,
> 
> Bart. 




[il-antlr-interest: 32208] [antlr-interest] v2->v3 Skip chars in Lexer?

2011-04-16 Thread Ruslan Zasukhin
Hi All,

In our old v2 grammar we had this in the Lexer:

DELIMITED// delimited_identifier
:
(DQUOTE!  ( ~(DQUOTE) | DQUOTE DQUOTE! )+ DQUOTE!
|BQUOTE!  ( ~(BQUOTE)  | BQUOTE BQUOTE! )+ BQUOTE!
)
{ $type = IDENT; }
;


I.e. we skipped the quotes around IDENT, and in the case of "" inside we
skipped the second one.

I have spent a few hours trying to find an analog for v3, with no success.


All the v3 grammars I have checked DO NOT ignore any symbols.
Why?


How to remove those quotes in v3?  :-)





[il-antlr-interest: 31915] Re: [antlr-interest] Q: move from v2 to v3 parser grammar. Rewrite tree rule

2011-03-23 Thread Ruslan Zasukhin
On 3/23/11 4:37 PM, "Kevin J. Cummings"  wrote:

>>> You can do it the same way in v3, but when you generate the code, ANTLR
>>> will make up token names and you won't be able to write a good error
>>> display routine/handler because you won't know the tokens. You will just
>>> have T23 or something like that, and then you can't do anything
>>> interesting. So, you don't HAVE to, but it is neater when you come to do
>>> certain things.
>> 
>> Aha, I see.
>> Yes we have see that T23 a lots in generated code ..
>> 
>> 
>> Interesting ... In v2 names of tokens was shown correctly
>> Right?
> 
> IIRC, the names of tokens literals in V2 had a "Literal_" prefix in
> their names

Yes, in the C++ code ... yes.

But in the error messages the token names were quite correct, it seems.
For example:
expected "from"
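
The kind of error display routine Jim mentions can be sketched in plain C,
assuming each keyword has its own named lexer rule and therefore a stable
token type. The enum values, the token_display table and format_expected are
hypothetical names for illustration, not what ANTLR generates.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token types, one per named lexer rule. */
enum { TOK_SELECT, TOK_FROM, TOK_JOIN, TOK_COUNT };

/* Display names indexed by token type. */
static const char *const token_display[TOK_COUNT] = {
    "select", "from", "join"
};

/* Write a message such as: expected "from". Returns the message length. */
int format_expected(char *buf, size_t size, int expected_type)
{
    assert(expected_type >= 0 && expected_type < TOK_COUNT);
    return snprintf(buf, size, "expected \"%s\"",
                    token_display[expected_type]);
}
```

With generated names such as T23 there is nothing meaningful to put in such a
table, which is the drawback Jim describes.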






[il-antlr-interest: 31908] Re: [antlr-interest] Q: move from v2 to v3 parser grammar. Rewrite tree rule

2011-03-23 Thread Ruslan Zasukhin
On 3/23/11 1:54 AM, "Jim Idle"  wrote:

Hi Jim,

> You can do it the same way in v3, but when you generate the code, ANTLR
> will make up token names and you won't be able to write a good error
> display routine/handler because you won't know the tokens. You will just
> have T23 or something like that, and then you can't do anything
> interesting. So, you don't HAVE to, but it is neater when you come to do
> certain things.

Aha, I see.
Yes, we have seen that T23 a lot in the generated code ..


Interesting ... In v2 the names of tokens were shown correctly.
Right?


In v3 there were probably some changes in the architecture
which prevent this nice automatic feature ..


Hmm. Actually it is not good when a newer version loses nice features.
:-)

Again, thank you very much, Jim, for the pointers.



==
Before clicking the SEND button, a strange [and OPTIONAL] question came to mind.

v3 was designed to support the old AST operators ^ !
and the new rewrite rules (ok for the new ones).

But why was the feature  ## = (IMAGE_TOKEN, ##)  thrown out?
The picture is broken:
the two operators by themselves cannot handle all tasks ..


If it had not been touched, porting would be much easier, I think.

Or do I not see the big idea behind this step?





[il-antlr-interest: 31893] Re: [antlr-interest] Q: move from v2 to v3 parser grammar. Rewrite tree rule

2011-03-22 Thread Ruslan Zasukhin
On 3/22/11 11:27 PM, "Jim Idle"  wrote:

>>> However, using lower case literals in your parser directly is not a
>>> good idea.  Use real tokens so that you error messages are better
>> 
>> Simple example, please?
> 
> Instead of:
> 
> rule : 'join' somerule;
> 
> Use:
> 
> rule : JOIN somerule;
> 
> // Lexer rule to match:
> //
> JOIN : 'join';


Clear.

But this is exactly what was NOT needed in ANTLR v2.
And ANTLR was proud that we can write tokens directly in the grammar.

Collecting all these tokens into the Lexer is additional manual work. No?
Similar to  Lexer :)


So it is not clear to me why this is better now in v3?




==
> And for case insensitivity I specify the token specs all in UPPER rather
> than lower and then override the input stream as per:
> 
> http://www.antlr.org/wiki/pages/viewpage.action?pageId=1782
> 
> Although someone has added instructions for generating the slowest case
> insensitive lexers in the world with individual letter rules. Use the
> input stream override method in general.

Yes, we have seen this page ...

> Support for case insenstive matching is built in to the C target input
> streams. To use it, you must make a method call before using the input stream
> as in the example below and specify all your keyword/lexer tokens in UPPER
> CASE only

Is JOIN above that UPPER CASE example?

I ask because 'join' above is still in lower case ...
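
As I read Jim's note, the input stream override folds case only in the
lookahead that the lexer matches against, while the buffer, and so the token
text, keeps its original case. Under that scheme the literal in the lexer rule
itself would be spelled in upper case, e.g. JOIN : 'JOIN'; . A minimal C sketch
of the idea follows; Input, la_upper and match_keyword are hypothetical names,
not the C runtime's API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *data;  /* original SQL text, never modified */
    size_t      size;
    size_t      pos;   /* index of the next character to read */
} Input;

/* Look ahead k characters (k >= 1), upper-cased; returns -1 past the end. */
int la_upper(const Input *in, size_t k)
{
    assert(k >= 1);
    size_t idx = in->pos + k - 1;
    if (idx >= in->size)
        return -1;
    return toupper((unsigned char)in->data[idx]);
}

/* Try to match an UPPER CASE keyword at the current position. */
int match_keyword(Input *in, const char *keyword)
{
    size_t n = strlen(keyword);
    for (size_t k = 1; k <= n; ++k) {
        if (la_upper(in, k) != (unsigned char)keyword[k - 1])
            return 0;  /* mismatch: nothing is consumed */
    }
    in->pos += n;      /* consume the keyword */
    return 1;
}
```

Given the input `Join t2`, matching against "JOIN" succeeds and the text in the
buffer still reads `Join`.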






[il-antlr-interest: 31890] Re: [antlr-interest] Q: move from v2 to v3 parser grammar. Rewrite tree rule

2011-03-22 Thread Ruslan Zasukhin
On 3/22/11 8:59 PM, "Jim Idle"  wrote:

Hi Jim,

Thank you for answer.

> That is telling you that err, you can't use rewrite syntax AND an operator
> ;). Which one is it supposed to use?
> 
> So, remove any ^ and ! operators from the rule and use rewrite rules only.

Yes, this is clear now :)

> However, is that what you want to rewrite it as. I think you are using
> query_expression as that was what it looked like in v2. You might be
> better off abstracting in to two rules:
> 
> queryExpression
> : unionExpressions -> ^(QUERY_EXPRESSION unionExpressions) ;
> 
> unionExpressions
>  : query_term (( "union"^ | "except"^ ) "all"? query_term)* ;

So really 2 rules should be used, as in my first letter.
Okay, then we will make such changes in our grammar from v2.
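
To make the shape of the resulting tree concrete, here is a small C sketch of
what the two rules build for an input like q1 union q2 except q3: each ^
operator becomes the new root over what was built so far, and the rewrite then
wraps the whole thing under the imaginary QUERY_EXPRESSION node. Node and the
helper functions are hypothetical, not the ANTLR tree API, and the sketch never
frees its nodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 4

typedef struct Node {
    const char  *text;
    struct Node *child[MAX_CHILDREN];
    int          nchild;
} Node;

Node *node_new(const char *text)
{
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->text = text;
    return n;
}

void node_add(Node *parent, Node *c)
{
    assert(parent->nchild < MAX_CHILDREN);
    parent->child[parent->nchild++] = c;
}

/* Effect of op^ : the operator becomes the root over both operands. */
Node *make_root(Node *op, Node *left, Node *right)
{
    node_add(op, left);
    node_add(op, right);
    return op;
}

/* Effect of -> ^(QUERY_EXPRESSION ...): wrap under an imaginary node. */
Node *wrap(const char *imaginary, Node *subtree)
{
    Node *root = node_new(imaginary);
    node_add(root, subtree);
    return root;
}

/* Append the tree in LISP form to buf (assumed to be large enough). */
void node_print(const Node *n, char *buf)
{
    if (n->nchild == 0) {
        strcat(buf, n->text);
        return;
    }
    strcat(buf, "(");
    strcat(buf, n->text);
    for (int i = 0; i < n->nchild; ++i) {
        strcat(buf, " ");
        node_print(n->child[i], buf);
    }
    strcat(buf, ")");
}
```

Building union over q1 and q2, then except over that result and q3, then
wrapping, prints as (QUERY_EXPRESSION (except (union q1 q2) q3)).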

> However, using lower case literals in your parser directly is not a good
> idea.  Use real tokens so that you error messages are better

Simple example, please?


> and remember
> that SQL is generally case insensitive so you will need a [trivial] custom
> input stream.

Of course we remember this :)

And once the grammar starts to breathe, we will still work on
* case-insensitivity of the SQL text
* UTF-16 for input  -- to clarify ..






[il-antlr-interest: 31883] Re: [antlr-interest] Q: move from v2 to v3 parser grammar. Rewrite tree rule

2011-03-22 Thread Ruslan Zasukhin
On 3/22/11 8:11 PM, "Ruslan Zasukhin" 
wrote:

If we remove the ^ from  ( "union" | "except" )^

query_expression
     :    query_term (( "union" | "except" ) "all"? query_term)*
  -> ^(QUERY_EXPRESSION $query_expression)

 then everything looks correct, but the tree is empty where $query_expression should be:

TREE: (SQL_STATEMENT QUERY_EXPRESSION)

But it should look like

TREE: (SQL_STATEMENT (QUERY_EXPRESSION (select (SELECT_LIST
(SELECT_ELEM_LIST ) (from (NON_JOIN_TABLE t1)


> On 3/20/11 1:11 PM, "Matt Fowles"  wrote:
> 
> Hi Matt,
> 
>> Ruslan,
> 
>> Try:
>> 
>> query_expression
>>     :    query_term (( "union" | "except" )^ "all"? query_term)*
>>  -> ^(QUERY_EXPRESSION $query_expression)
>>     ;
> 
> Well,  $  not helps.   Still same
>error 165 uses rewrite syntax and also an ast operator
> 
> As I understand, 
> ->   is re-write syntax
> ^is AST operator ...
> 
> 
>  
>> Matt
>> 
>> On Sun, Mar 20, 2011 at 10:40 AM, Ruslan Zasukhin
>>  wrote:
>>> Hi All,
>>> 
>>> In v2 grammar we have rule as
>>> 
>>> ===
>>> query_expression
>>>    :    query_term (( "union"^ | "except"^ ) ( "all" )? query_term)*
>>>        {    ## = #([QUERY_EXPRESSION,"QUERY_EXPRESSION"], ##);    }
>>>    ;
>>> ===
>>> 
>>> 
>>> We try change it to v3
>>> 
>>> ===
>>> query_expression
>>>    :    query_term (( "union"^ | "except"^ ) ( "all" )? query_term)*
>>>            ->(QUERY_EXPRESSION   )
>>>    ;
>>> ===
>>> 
>>> Ops, we cannot specify top node, because it can be  union OR except.
>>> 
>>> 
>>> For now the only way we have found is:
>>> ===
>>> query_expression
>>>    :    query_expression2  ->(QUERY_EXPRESSION  query_expression2)
>>>    ;
>>> 
>>> 
>>> query_expression2
>>>    :    query_term (( "union"^ | "except"^ ) ( "all" )? query_term)*
>>>    ;
>>> ===
>>> 
>>> 
>>> 
>>> Question is. May be exists more elegant way for v3
>>> Without additional rule?
> 




[il-antlr-interest: 31879] Re: [antlr-interest] Q: move from v2 to v3 parser grammar. Rewrite tree rule

2011-03-22 Thread Ruslan Zasukhin
On 3/20/11 1:11 PM, "Matt Fowles"  wrote:

Hi Matt,

> Ruslan,

> Try:
> 
> query_expression
>     :    query_term (( "union" | "except" )^ "all"? query_term)*
>  -> ^(QUERY_EXPRESSION $query_expression)
>     ;

Well, the $ does not help.   Still the same
   error 165 uses rewrite syntax and also an ast operator

As I understand it,
->   is rewrite syntax
^    is an AST operator ...


 
> Matt
> 
> On Sun, Mar 20, 2011 at 10:40 AM, Ruslan Zasukhin
>  wrote:
>> Hi All,
>> 
>> In v2 grammar we have rule as
>> 
>> ===
>> query_expression
>>    :    query_term (( "union"^ | "except"^ ) ( "all" )? query_term)*
>>        {    ## = #([QUERY_EXPRESSION,"QUERY_EXPRESSION"], ##);    }
>>    ;
>> ===
>> 
>> 
>> We try change it to v3
>> 
>> ===
>> query_expression
>>    :    query_term (( "union"^ | "except"^ ) ( "all" )? query_term)*
>>            ->(QUERY_EXPRESSION   )
>>    ;
>> ===
>> 
>> Ops, we cannot specify top node, because it can be  union OR except.
>> 
>> 
>> For now the only way we have found is:
>> ===
>> query_expression
>>    :    query_expression2  ->(QUERY_EXPRESSION  query_expression2)
>>    ;
>> 
>> 
>> query_expression2
>>    :    query_term (( "union"^ | "except"^ ) ( "all" )? query_term)*
>>    ;
>> ===
>> 
>> 
>> 
>> Question is. May be exists more elegant way for v3
>> Without additional rule?





[il-antlr-interest: 31272] [antlr-interest] Feature for ANTLR Works

2011-01-29 Thread Ruslan Zasukhin
Hi Jean,

Suppose the user has typed a rule such as

Constant
:   type expression
;

And there is no yet rules type and expression.

Now user need type 

type
:
;


expression
:
;


Maybe it is possible to add auto-generation of such missing rules on demand?

But the question is where to insert them. In a big grammar, I do not want to
scroll to that new place yet, and without scrolling, how do I find them
quickly ...

So maybe it is better to generate them into the clipboard on e.g.
  cmd + option + click  on such a missing rule.

Then I can jump to the place I want in the grammar and PASTE with cmd + v







[il-antlr-interest: 29011] Re: [antlr-interest] ANTLR v4 progress // switch fro v2.7.2

2010-05-26 Thread Ruslan Zasukhin
On 27/5/10 2:18 AM, "Terence Parr"  wrote:

Hi Terence,

> Just passing along an example HTML subset lexer/parser using ANTLR v4; thanks
> to  debugging and moral support from Oliver Zeigermann, we got the code
> generation and runtime support working sufficiently to use the following
> grammars.   generate some really nice code indeed. You will note that, except
> for the enhancement of the lexer modes, the grammars are backward compatible
> with v3 :)

Congratulations on the progress. :)

My question is this. Currently we still use ANTLR 2.7.2.
There has been no time to jump to v3 yet.

What would your advice be:
jump to v3 in the coming months, so that a later move to v4
goes smoothly too? Or just wait for v4 to avoid too many big migrations?


