So this regexp is used to create tokens for C:
http://src.opensolaris.org/source/xref/opengrok/trunk/src/org/opensolaris/opengrok/analysis/c/CSymbolTokenizer.lex#56

The first part of it will match x,
the second part will match 216000000.

From my testing, searching for x2160* should get you something. It
seems the leading 0 is being ignored, almost as if JFlex merges those
two parts, so an identifier must always begin with a letter
(which seems to be correct according to http://jflex.de/manual.pdf,
section 4.2.12, and then section 4.3.1, where the whitespace is
ignored!).
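
To illustrate the effect, here is a rough sketch in Java. The identifier
pattern below (a letter or underscore, then letters, digits or
underscores) is an assumption for illustration and is not copied from
CSymbolTokenizer.lex; the class name is hypothetical. Matcher.find()
stands in for the JFlex scanner and simply skips characters that cannot
start a match, such as the leading 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierTokenizerSketch {
    // ASSUMPTION: an identifier-style rule for illustration only,
    // not the actual rule from CSymbolTokenizer.lex.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    // Collects every identifier-shaped token; characters that cannot
    // start an identifier (like a leading digit) are skipped by find().
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [addr, x216000000]: the leading 0 never makes it into a token
        System.out.println(tokenize("addr = 0x216000000;"));
    }
}
```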

If you think it's a bug and you log it, we'll try to take a look.
Alternatively, since you have a pointer to the code, you can try to
figure it out yourself and send us a patch.
A good tool for checking the generated Lucene index is Luke: http://www.getopt.org/luke/

From my point of view, one could add a hex (decimal, ...) number
recognizer to the analyzer (the same could probably be used for Java).
Also, section 4.3.1 seems to have some nice examples that could
improve the analyzers OpenGrok uses... mmmmmh...
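
A hex recognizer could look roughly like the sketch below, again in
plain java.util.regex rather than JFlex, with a hypothetical class name
and illustrative patterns (not taken from OpenGrok). The hex alternative
is tried before the identifier rule, so 0x216000000 stays together as
one token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexAwareTokenizerSketch {
    // ASSUMPTION: hypothetical rules for illustration. The hex literal is
    // listed first so it wins over the decimal and identifier alternatives.
    private static final Pattern TOKEN = Pattern.compile(
            "0[xX][0-9a-fA-F]+"            // hexadecimal literal
            + "|[0-9]+"                    // decimal literal
            + "|[a-zA-Z_][a-zA-Z0-9_]*");  // identifier

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [addr, 0x216000000]
        System.out.println(tokenize("addr = 0x216000000;"));
    }
}
```

Note that JFlex picks the longest match (breaking ties by rule order),
while a java.util.regex alternation takes the first alternative that
matches at a given position, which is why the hex pattern comes first
here. With a rule like this the literal would be indexed as 0x216000000,
so presumably a search like 0x2160* would hit it rather than x2160*.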

--
L

Jim R. Wilson wrote:
> I see - in that case, I'm not sure what the desired behavior is.
> Maybe one of the developers on the list can explain (I'm just a fellow
> enthusiast).  Good luck!
>
> -- Jim
>
> On Thu, Sep 10, 2009 at 12:26 PM, Zach Carter <z.carter at f5.com> wrote:
>   
>> On Thursday 10 September 2009 09:21:45 Jim R. Wilson wrote:
>>     
>>> Does 2610* match anything?
>>>       
>> Yes, it matches a lot of things, I get 248 valid hits in my source tree, but
>> it doesn't match 0x216000000.
>>
>> -Zach
>>     
> _______________________________________________
> opengrok-discuss mailing list
> opengrok-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/opengrok-discuss
>   