RE: Extracting tokens from an expression and matching an object against that expression without parsing twice

Davide Vecchi Mon, 17 Nov 2014 00:48:00 -0800

Thanks for your inputs.

I'm probably showing my technological age here, but I admit that, as a matter
of principle, I tend to avoid repeating a complex operation when it's known in
advance that the second run will produce exactly the same result as the first.
When I catch myself repeating one, I always feel that my design is not OK.

However, in this case I am quite sure I need to get rid of the double parsing,
although I have not rigorously demonstrated that it is the cause of the
slowdown. It's more of an educated guess (in my opinion), reinforced by the
fact that the method Expression.fromString(String) has a comment saying
"TODO: cache expression strings, since this operation is pretty slow" (I'm
using version 3.0.2). So it looks like the Cayenne developers also had reason
to worry, to some extent, about optimization in this area.
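
The caching idea in that TODO could be sketched roughly as follows. This is
only an illustration, not Cayenne code: ParseCache and ParseCacheDemo are
hypothetical names, the parser is passed in as a function so that the example
runs without Cayenne on the classpath, and Java 8 or later is assumed for
computeIfAbsent. In the real application the function would presumably be
Expression.fromString, and the cached Expression would have to be treated as
read-only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: parses each distinct string once and reuses the result. */
final class ParseCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> parser;

    ParseCache(Function<String, T> parser) {
        this.parser = parser;
    }

    /** Parses the string on first use and returns the cached result afterwards. */
    T get(String source) {
        return cache.computeIfAbsent(source, parser);
    }
}

final class ParseCacheDemo {
    public static void main(String[] args) {
        AtomicInteger parseCount = new AtomicInteger();
        // Stand-in parser (assumption); with Cayenne this would be
        // something like Expression::fromString.
        ParseCache<String> cache = new ParseCache<>(s -> {
            parseCount.incrementAndGet();
            return s.trim();
        });
        cache.get("myField=0");
        cache.get("myField=0"); // served from the cache, no second parse
        System.out.println("parses performed: " + parseCount.get());
    }
}
```

A cache like this only helps when the same expression string recurs; it does
not by itself remove the separate tokenizing pass.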

I just used JVisualVM to profile the execution, and two of the methods where by
far the most time is spent are Expression.fromString(String) and
ExpressionParser.getNextToken(). Since I have to cut down the processing time,
I have to focus on them first.

The situation here is that I modified a preexisting application which was doing
some basic parsing and then using the resulting tokens to match the expression
against objects. That parsing is basic in that it can only handle simple
expressions; for example, it doesn't support grouping with parentheses.

My changes consisted of removing that parsing code from the application and 
replacing it with calls to Cayenne, because we need real parsing. Of course the 
parsing done by Cayenne is way more powerful and that might be the real and 
fair reason why it takes longer, but even if this is the case it's important 
for me not to do that parsing twice.

It's not easy to explain properly why I need the tokens; the general reason is
that the preexisting application, written long ago by several other people, is
designed to use them, and changing its design would be too big an undertaking.
Since all that needs to be improved is the parsing and matching, I thought I'd
just use a powerful tool to replace only those parts.

I will see if I can use Andrus' pointers to extract the tokens from the 
Expression instance.
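
A rough sketch of what that extraction might look like, under heavy
assumptions: ExprNode and TokenExtractor below are hypothetical stand-ins
written so the example runs on its own, not Cayenne classes. Only the
getOperand(int) accessor is taken from Andrus' message; the operator field and
getOperandCount() are assumptions about how a tree node could be inspected.
The walk emits operands and operators in infix order, so it does not
reproduce parentheses or unary operators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a parsed expression node; not a Cayenne class. */
final class ExprNode {
    final String operator;        // e.g. "=", "and"
    final List<Object> operands;  // child ExprNodes or leaf values

    ExprNode(String operator, Object... operands) {
        this.operator = operator;
        this.operands = Arrays.asList(operands);
    }

    Object getOperand(int index) { return operands.get(index); }
    int getOperandCount() { return operands.size(); }
}

final class TokenExtractor {
    /** Flattens the tree in infix order: operand, operator, operand, ... */
    static List<String> tokens(ExprNode node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Object operand, List<String> out) {
        if (!(operand instanceof ExprNode)) {
            out.add(String.valueOf(operand)); // leaf: property name or literal
            return;
        }
        ExprNode node = (ExprNode) operand;
        for (int i = 0; i < node.getOperandCount(); i++) {
            if (i > 0) {
                out.add(node.operator); // operator goes between operands
            }
            collect(node.getOperand(i), out);
        }
    }

    public static void main(String[] args) {
        ExprNode tree = new ExprNode("and",
                new ExprNode("=", "myField", 0),
                new ExprNode(">", "otherField", 5));
        System.out.println(tokens(tree));
    }
}
```

With this approach the string is parsed once into a tree, the tree is used for
matching, and the token list is derived from the same tree instead of from a
second pass over the string.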

-----Original Message-----
From: Andrus Adamchik [mailto:[email protected]] 
Sent: Sunday, November 16, 2014 14:57
To: [email protected]
Subject: Re: Extracting tokens from an expression and matching an object 
against that expression without parsing twice

I second John's assessment. 

BTW, what are the tokens for? Do you actually need to have access to the
lexical structure of the String? After all, a parsed Expression object is
itself a tree and gives you access to its own structure, either directly
('getOperand(int)') or via the 'traverse' and 'transform' methods.

Andrus

> On Nov 14, 2014, at 9:54 PM, John Huss <[email protected]> wrote:
> 
> This looks like a serious micro optimization.  Is the performance for 
> this really that critical?  Have you demonstrated that this is your 
> application's crucial hot spot?
> 
> On Fri, Nov 14, 2014 at 7:35 AM, Davide Vecchi <[email protected]> wrote:
> 
>> Hi all,
>> 
>> I have an expression in a string, and I use Cayenne to parse the 
>> expression into tokens, which are needed for a specific purpose.
>> 
>> However in addition to having the tokens I also need to evaluate an 
>> object against that expression, to see if that object matches the expression.
>> 
>> My problem is that the way I'm doing it causes the parsing to be done 
>> twice on the same expression, and I would like to avoid parsing the 
>> same expression twice.
>> 
>> I'm creating the tokens like this:
>> 
>> -----------------------------------
>> String where = "myField=0";
>> 
>> Reader reader = new StringReader(where);
>> 
>> ExpressionParser parser = new ExpressionParser(reader);
>> 
>> List<Token> tokens = new ArrayList<>();
>> 
>> Token token = parser.getNextToken();
>> 
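>> // A JavaCC-generated parser signals end of input with an EOF token
>> // (kind 0) rather than null, so the loop tests the token kind.
>> while (token.kind != 0) {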
>> 
>>     tokens.add(token);
>> 
>>     token = parser.getNextToken();
>> }
>> -----------------------------------
>> 
>> I'm matching the object like this:
>> 
>> -----------------------------------
>> String where = "myField=0";
>> 
>> Expression expression = Expression.fromString(where);
>> 
>> boolean matches = expression.match(object);
>> -----------------------------------
>> 
>> The call to Expression.fromString made in the object matching 
>> operation performs a parsing, but the parsing of the same expression 
>> had already been done in the token creation operation.
>> 
>> Is there a way to redesign this process in order to get the tokens 
>> and also match an object against the expression without parsing the 
>> same expression twice?
>> 
>> For example, I believe that the call to Expression.fromString must 
>> have created the tokens, because it has parsed the string. So I 
>> thought I could reverse the order and do the object matching first, 
>> keep the Expression instance created in that process and use it to 
>> extract the tokens. But I can't see how to extract the tokens from an 
>> Expression instance instead of from an ExpressionParser instance as I'm 
>> currently doing.
>> 
>> Or another possibility could be that I keep creating the tokens 
>> first, and then I match my object against them, instead of against 
>> the string expression that generated those tokens. But I can't see 
>> how to match an object against tokens.
>> 
>> So I'm looking for some ideas.
>> 
>> Thanks in advance.
>> 
>> Davide Vecchi
>>
