So this regexp is used to create the tokens for C: http://src.opensolaris.org/source/xref/opengrok/trunk/src/org/opensolaris/opengrok/analysis/c/CSymbolTokenizer.lex#56
For 0x216000000, the first part of it will match the x and the second part will match what follows it. From my testing, searching for x2160* should get you something. It seems the first 0 is getting ignored, almost as if JFlex is merging those two, so an identifier must always begin with some letter (which seems to be correct according to http://jflex.de/manual.pdf section 4.2.12, and then section 4.3.1, where the whitespace is being ignored!).

If you think it's a bug and you log it, we'll try to take a look. Alternatively, since you have a pointer to the code, you can try to figure it out yourself and send us a patch. A good checker for a generated Lucene db is Luke: http://www.getopt.org/luke/

From my point of view, one could add a hex (dec, ...) number recognizer to the analyzer (the same could be used for Java, I guess). It also seems that section 4.3.1 has some nice examples, which could improve the analyzers OpenGrok uses ... mmmmmh ...

--
L

Jim R. Wilson wrote:
> I see - in that case, I'm not sure what the desired behavior is.
> Maybe one of the developers on the list can explain (I'm just a fellow
> enthusiast). Good luck!
>
> -- Jim
>
> On Thu, Sep 10, 2009 at 12:26 PM, Zach Carter <z.carter at f5.com> wrote:
>
>> On Thursday 10 September 2009 09:21:45 Jim R. Wilson wrote:
>>
>>> Does 2610* match anything?
>>>
>> Yes, it matches a lot of things, I get 248 valid hits in my source tree, but
>> it doesn't match 0x216000000.
>>
>> -Zach
>>
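
For illustration, here is a minimal Java sketch of the tokenization behaviour discussed in this thread. It is not the OpenGrok code: it assumes the identifier rule is roughly "a letter or underscore followed by letters, digits or underscores" and imitates the lexer with java.util.regex, skipping every character that no rule matches. The class name HexTokenSketch and the combined hex-or-identifier rule are hypothetical, only meant to show what a hex number recognizer could change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexTokenSketch {
    // Assumed identifier rule: must start with a letter or underscore.
    static final Pattern IDENTIFIER =
            Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    // Hypothetical extension: try a hex literal first, then an identifier.
    static final Pattern HEX_OR_IDENTIFIER =
            Pattern.compile("0[xX][0-9a-fA-F]+|[a-zA-Z_][a-zA-Z0-9_]*");

    // Collect every match of the rule; unmatched characters are skipped,
    // which roughly mimics a lexer that ignores everything else.
    static List<String> tokens(Pattern rule, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = rule.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "addr = 0x216000000;";
        // Identifier rule only: the leading 0 is dropped.
        System.out.println(tokens(IDENTIFIER, src));        // [addr, x216000000]
        // With the hex rule: the literal is kept whole.
        System.out.println(tokens(HEX_OR_IDENTIFIER, src)); // [addr, 0x216000000]
    }
}
```

Under these assumptions the index would hold x216000000 rather than 0x216000000, which would explain why a search for x2160* finds it while a search starting with 0x does not.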