Hi Jim,

I have spent two days chasing this, and I am now ready to describe what I see and ask for your help. It seems there is a bug or leak in the reuse() area. Or perhaps I am not using it correctly, but I am doing what you described in a letter three months ago.
So ... Long story :-)

* I have a simple bench that does 100K INSERT commands.
  The v2 parser does this in 19 seconds.
  The v3 parser without reuse does this in 24 seconds.

OF COURSE we should expect a speedup if we reuse the lexer/parser. So I designed the code so that I can easily switch between these two modes. When I switch to reuse, I get comparable speed, but with 2 GB of RAM eaten.

=====================================

* Using Instruments in Apple Xcode 4.2, I can see what is going on. These are not actually leaks; the parser just keeps allocating ANTLR_STRING objects in the parser and tree-parser rules that use $c.text.

=====================================

FOR EXAMPLE:

* I had this parser rule:

    hex_string_literal
        : s = HEX_NUMBER -> CONST_STR_HEX[$s.text->chars]
        ;

ZERO code of my own here. Right? And I see that $s.text expands in the C code to getText(), which allocates and allocates ... So, as I understand it, these strings are never reused.

=====================================

BTW, when I saw that getText() is used, and remembered that you told me to avoid it, I jumped into the sources and came to an idea: why create a new token here and call getText() at all? Maybe I can just change the token type, as follows:

    hex_string_literal
        : s = HEX_NUMBER { $s->setType( $s, CONST_STR_HEX ); }
        ;

And it seems this works fine .... I have corrected a few rules in the parser this way ....

But the Tree Parser still has, for example, this:

    general_literal returns [ENode_Const_Ptr res]
        : cd=CONST_DATE { res = make_enode_date( GET_FBL_STRING( $cd.text ) ); }
        | ct=CONST_TIME { res = make_enode_time( GET_FBL_STRING( $ct.text ) ); }
        | s=const_str   { res = make_enode_str ( GET_FBL_STRING( $s.text ) ); }
        ;

Each of these $c.text references calls getText(), which makes a COPY of the string buffer. Then I convert that copy into our own FBL_String ...

PROBLEM 1: the ANTLR_STRINGs produced by getText() are never reused, as far as I can see.

PROBLEM 2: also related to speed: how can we avoid making a copy of the string here?
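To make the question in PROBLEM 2 concrete, here is a minimal sketch of the zero-copy idea. It does NOT use the ANTLR runtime: MockToken is a hypothetical stand-in, and it only assumes that a token records pointers to the first and last characters of its text inside the input buffer (whether the real token exposes this in the same way would need to be checked against the runtime headers). The names TextSpan, token_text_span and to_owned are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// HYPOTHETICAL stand-in for a lexer token. Assumption: the token keeps
// pointers to the first and the last (inclusive) character of its text
// inside the original input buffer.
struct MockToken {
    const char* start;  // first character of the token text
    const char* stop;   // last character of the token text (inclusive)
};

// Non-owning description of the token text: no allocation, no copy.
struct TextSpan {
    const char* data;
    std::size_t length;
};

// Build a span that points straight into the input buffer.
inline TextSpan token_text_span(const MockToken& tok) {
    TextSpan span = { nullptr, 0 };
    if (tok.start != nullptr && tok.stop != nullptr && tok.stop >= tok.start) {
        span.data = tok.start;
        span.length = static_cast<std::size_t>(tok.stop - tok.start) + 1;
    }
    return span;
}

// The single copy that is still needed when the text must outlive the
// input buffer (std::string stands in for the application string type).
inline std::string to_owned(const TextSpan& span) {
    assert(span.length == 0 || span.data != nullptr);
    if (span.length == 0) {
        return std::string();
    }
    return std::string(span.data, span.length);
}
```

With something like this, the text is copied once, directly into the application's own string type, instead of once into an intermediate string object and then again. The span is only valid while the input buffer is alive and unchanged, so it must be consumed before the input is reused for the next statement.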
In the sources I see code such as:

    return ((pANTLR3_COMMON_TREE)(tree->super))->token->getText(
               ((pANTLR3_COMMON_TREE)(tree->super))->token );

Maybe something can be optimized or hacked here? For example, maybe I can write my own function that checks whether the token holds a char* or an ANTLR_STRING and chooses the way accordingly ... But what syntax gets me to the token in the .g file? I can write my own macro, of course ... I just want some feedback on whether this would be a good idea for everyone.

=====================================

And this is how I try to reuse the Lexer/Parser and NOT the TreeParser. It all follows your letter, Jim:

    void SqlParser_v3::ResuseParserObjects(
        const char* inTextToParse,
        vuint32     inLength )
    {
        // -------------------------------
        // TREE PARSER cannot be reused. Destroy it.
        //
        if( mpTreeParser )
        {
            mpTreeParser->free( mpTreeParser );
            mpTreeParser = NULL;
        }

        if( mpNodes )
        {
            mpNodes->free( mpNodes );
            mpNodes = NULL;
        }

        // -------------------------------
        // Reuse other objects
        //
        mpInput->reuse(
            mpInput,
            (pANTLR3_UINT8) inTextToParse,
            (ANTLR3_UINT32) inLength,
            (pANTLR3_UINT8) "VSQL" );

        mpTokenStream->reset( mpTokenStream );
        mpLexer      ->reset( mpLexer );
        mpParser     ->reset( mpParser );

        ResetOwnData( mpParser );
    }

--
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]