Re: Tokenizing problem with numbers in query

2010-01-07 Thread Bernd Brod
Hi,


  Did you restart Tomcat and re-index your collection?

Yes


 Do you want to search inside alphanumeric strings, or are you interested
 only in prefix queries? Can you give us more examples, such as target
 documents and queries?


Searching inside strings would be required, yes. I would already be glad if
the above example worked.

Bernd


Re: Tokenizing problem with numbers in query

2010-01-05 Thread Bernd Brod
Thanks to both of you for the quick answers,

analysis.jsp shows that the WordDelimiterFilterFactory is performing the
split.

I have been experimenting with the delimiter settings for the last two days
but am still unable to obtain the desired result.

I tried removing solr.WordDelimiterFilterFactory entirely from both query
and text, which effectively crippled the search: I got nearly no results
for anything. Removing it only from query did not make the target document
show up either.


The target document looks like this:

bla /asdf5qwertz500ddd


Searching for /asdf5qwertz (also with a trailing wildcard, with or without
the leading slash) won't show the document. It also won't get highlighted
in analysis.jsp.

I tried setting splitOnNumerics to 0 (no change) as well as changing
generateNumberParts to 0 - the query is still being split at the number.

Any suggestions?

Bernd
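
For readers following the thread, here is a rough Python sketch of the kind of splitting described above. It is only an approximation written for illustration, not Solr's actual WordDelimiterFilter code: the function name split_word_delimiter and its split_on_numerics flag are made up here, the flag loosely mirroring the splitOnNumerics option mentioned above.

```python
import re

def split_word_delimiter(token, split_on_numerics=True):
    """Approximate word-delimiter splitting (illustration only).

    1. Split on runs of non-alphanumeric characters such as '/'.
    2. If split_on_numerics is true, also split at every
       letter/digit transition.
    """
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        if not chunk:
            continue  # skip empty pieces left by leading/trailing delimiters
        if split_on_numerics:
            # Alternating runs of letters and digits become separate tokens.
            parts.extend(re.findall(r"[A-Za-z]+|[0-9]+", chunk))
        else:
            parts.append(chunk)
    return parts

print(split_word_delimiter("asdf5qwerty"))
# ['asdf', '5', 'qwerty']
print(split_word_delimiter("/asdf5qwertz500ddd"))
# ['asdf', '5', 'qwertz', '500', 'ddd']
print(split_word_delimiter("/asdf5qwertz500ddd", split_on_numerics=False))
# ['asdf5qwertz500ddd']
```

If the index was built with the splitting variant, the stored terms are the short pieces, so a changed setting can only take effect after the documents have been re-indexed.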



On Sun, Jan 3, 2010 at 6:27 PM, Erick Erickson erickerick...@gmail.com wrote:

 This is an *extremely* useful page for figuring out what various
 tokenizers/filters are doing. The javadocs for the classes
 referenced can also provide some additional details

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

 Erick

 On Sun, Jan 3, 2010 at 11:26 AM, Bernd Brod bernd.b...@gmail.com wrote:

  Hello,
 
  When searching for a string such as asdf5qwerty, Solr tokenizes it into
  asdf, 5, qwerty and displays documents matching any of these tokens.

  How can I stop this behaviour and make it search just for plain
  asdf5qwerty?
 
  thanks in advance.
  Bernd
 



Re: Tokenizing problem with numbers in query

2010-01-05 Thread Bernd Brod
Hi,

On Tue, Jan 5, 2010 at 5:17 PM, Erick Erickson erickerick...@gmail.com wrote:

 We need to back up, this is looking like an XY problem. That is,
 you're asking for specifics when what would probably be more
 helpful is for you to describe *what* the problem you're trying
 to solve is rather than *how* to make a specific behavior
 happen. Although re-reading your original e-mail does give a
 clue <G>

 If, for instance, you really really want the string indexed and searched
 literally (if, for instance, it's a part number), you want to use something
 like WhitespaceTokenizerFactory, perhaps lowercasing too, rather
 than fiddle around with KeywordTokenizerFactory. If you want some
 other behavior, please explain it in more detail <G>...
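
To make the whitespace-plus-lowercase idea concrete, here is a small Python sketch. It is an illustration under simple assumptions, not Solr code: the helper names are hypothetical, and the sample text is the target document quoted elsewhere in this thread.

```python
def whitespace_lowercase_tokens(text):
    """Split on whitespace only and lowercase each token."""
    return text.lower().split()

def exact_match(query, doc):
    """True if the whole query equals one indexed token."""
    return query.lower() in whitespace_lowercase_tokens(doc)

def prefix_match(query, doc):
    """True if some indexed token starts with the query (like query*)."""
    q = query.lower()
    return any(tok.startswith(q) for tok in whitespace_lowercase_tokens(doc))

DOC = "bla /asdf5qwertz500ddd"

print(whitespace_lowercase_tokens(DOC))        # ['bla', '/asdf5qwertz500ddd']
print(exact_match("/asdf5qwertz500ddd", DOC))  # True
print(exact_match("5", DOC))                   # False
print(exact_match("/asdf5qwertz", DOC))        # False
print(prefix_match("/asdf5qwertz", DOC))       # True
print(prefix_match("asdf5qwertz", DOC))        # False
```

With this kind of analysis the string stays one token, so a lone 5 no longer matches. The trade-off is that only the full token or a prefix of it matches, and the leading slash is part of the token.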


I am indexing files that also include traffic captures (so there can be
pretty much anything inside). When looking for a long alphanumeric string I
would have expected fewer results than when searching with a short one. But
because of all the tokenizing it returns more (useless) results. This is
very disappointing because I could find these documents easily with grep.
What's even more disappointing: disabling the WordDelimiterFilterFactory
(for query and/or text) just results in 0 hits on my document. I'm not
quite sure what to do.

Ideally I would like to be able to search for strings such as a1a1a1a1a1a1a1
without matching a single a and/or 1.

Bernd
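
The difference between the two behaviours described above can be sketched in a few lines of Python. This is an illustration only: the three sample documents are invented, the tokenizer is a crude stand-in for letter/digit splitting, and the OR-style matching is an assumption about how the split query terms are combined.

```python
import re

def tokens(text):
    """Crude stand-in for the splitting: runs of letters or digits, lowercased."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())

def token_or_match(query, doc):
    """Match if ANY query token occurs as a token in the document."""
    doc_tokens = set(tokens(doc))
    return any(tok in doc_tokens for tok in tokens(query))

def substring_match(query, doc):
    """grep-like behaviour: the query must appear as one contiguous string."""
    return query.lower() in doc.lower()

# Invented sample documents, for illustration only.
DOCS = [
    "trace a1a1a1a1a1a1a1 end",
    "plan a and plan b",
    "packet 1 of 3",
]
QUERY = "a1a1a1a1a1a1a1"

print([d for d in DOCS if token_or_match(QUERY, d)])   # all three documents
print([d for d in DOCS if substring_match(QUERY, d)])  # only the first document
```

The long query breaks into many copies of a and 1, so every document containing either token matches, which is why a longer query can return more results than a shorter one.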


Tokenizing problem with numbers in query

2010-01-03 Thread Bernd Brod
Hello,

When searching for a string such as asdf5qwerty, Solr tokenizes it into
asdf, 5, qwerty and displays documents matching any of these tokens.

How can I stop this behaviour and make it search just for plain
asdf5qwerty?

thanks in advance.
Bernd