Re: Which Tokeniser (and/or filter)

2012-02-09 Thread Erick Erickson
@lucene.apache.org Subject: Re: Which Tokeniser (and/or filter) Date: Tue, 7 Feb 2012 15:02:36 -0800 (PST) : This all seems a bit too much work for such a real-world scenario? You haven't really told us what your scenario is. You said you want to split tokens on whitespace, full-stop (aka: period

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Rob Brown
- From: Chris Hostetter hossman_luc...@fucit.org Reply-to: solr-user@lucene.apache.org To: solr-user@lucene.apache.org Subject: Re: Which Tokeniser (and/or filter) Date: Tue, 7 Feb 2012 15:02:36 -0800 (PST) : This all seems a bit too much work for such a real-world scenario? You haven't really told

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Erick Erickson
://www.intelcompute.com -Original Message- From: Chris Hostetter hossman_luc...@fucit.org Reply-to: solr-user@lucene.apache.org To: solr-user@lucene.apache.org Subject: Re: Which Tokeniser (and/or filter) Date: Tue, 7 Feb 2012 15:02:36 -0800 (PST) : This all seems a bit too much work for such a real

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Robert Brown
., asp.net, .net, net. Cheers, Rob -- IntelCompute Web Design and Online Marketing http://www.intelcompute.com -Original Message- From: Chris Hostetter hossman_luc...@fucit.org Reply-to: solr-user@lucene.apache.org To: solr-user@lucene.apache.org Subject: Re: Which Tokeniser

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Erick Erickson
@lucene.apache.org To: solr-user@lucene.apache.org Subject: Re: Which Tokeniser (and/or filter) Date: Tue, 7 Feb 2012 15:02:36 -0800 (PST) : This all seems a bit too much work for such a real-world scenario? You haven't really told us what your scenario is. You said you want to split tokens

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Robert Brown
...@fucit.org Reply-to: solr-user@lucene.apache.org To: solr-user@lucene.apache.org Subject: Re: Which Tokeniser (and/or filter) Date: Tue, 7 Feb 2012 15:02:36 -0800 (PST) : This all seems a bit too much work for such a real-world scenario? You haven't really told us what your scenario

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Robert Brown
I'm still finding matches across newlines.

index...
i am fluent
german racing

search...
fluent german

Any suggestions? I've currently got this in wdftypes.txt for WordDelimiterFilterFactory:

\u000A = ALPHANUM
\u000B = ALPHANUM
\u000C = ALPHANUM
\u000D = ALPHANUM
# \u000D\u000A = ALPHA

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Ahmet Arslan
I'm still finding matches across newlines index... i am fluent german racing search... fluent german Any suggestions? You can use a multiValued field for this. Split your document on newlines at the client side. <arr>i am fluent</arr> <arr>german racing</arr>
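A minimal sketch of that client-side split, assuming a hypothetical multiValued field called `text`, an `id` field, and Solr's XML update message format. The function name is illustrative only and is not part of any Solr client library.

```python
from xml.sax.saxutils import escape


def lines_to_solr_doc(doc_id, text, field="text"):
    """Build a Solr XML <add> message with one field value per non-empty line.

    The field names "id" and "text" are assumptions for illustration.
    """
    # One value per line; blank lines are dropped.
    values = [line.strip() for line in text.splitlines() if line.strip()]
    fields = ['<field name="id">%s</field>' % escape(str(doc_id))]
    # Repeating the same field name is how a multiValued field is populated.
    fields += ['<field name="%s">%s</field>' % (field, escape(v)) for v in values]
    return "<add><doc>%s</doc></add>" % "".join(fields)


if __name__ == "__main__":
    print(lines_to_solr_doc(1, "i am fluent\ngerman racing"))
```

Each line then becomes its own value, so a phrase query can be kept from spanning two lines once the field type defines a position increment gap.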

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Robert Brown
This all seems a bit too much work for such a real-world scenario? --- IntelCompute Web Design Local Online Marketing http://www.intelcompute.com On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan iori...@yahoo.com wrote: I'm still finding matches across newlines index... i am

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Erick Erickson
Well, this is a common approach. Someone has to split up the input as sentences (whatever they are). Putting them in multi-valued fields is trivial. Then you confine things to within sentences, then you start searching phrases with a slop less than your incrementGap... Best Erick On Tue, Feb 7,
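To see why the slop has to stay below the increment gap, here is a rough Python simulation, not Solr or Lucene code. It assumes a gap of 100 and a simplified in-order notion of slop (real sloppy phrase matching also tolerates reordering); both function names are hypothetical.

```python
def index_positions(values, gap=100):
    """Assign token positions across the values of a multi-valued field.

    Simplified model: the first token of every value after the first is
    pushed `gap` extra positions away from the previous value's last token.
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the position increment gap between values
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def phrase_matches(positions, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 < b - a <= slop + 1
        for a in positions.get(first, [])
        for b in positions.get(second, [])
    )


if __name__ == "__main__":
    pos = index_positions(["i am fluent", "german racing"], gap=100)
    print(phrase_matches(pos, "fluent", "german", slop=0))   # across values
    print(phrase_matches(pos, "german", "racing", slop=0))   # within a value
```

In this model "fluent german" only matches once the slop reaches the gap, which is the reason for keeping the slop smaller than the gap.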

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Erik Hatcher
A custom tokenizer/tokenfilter could set the position increment when a newline comes through as well. Erik On Feb 7, 2012, at 15:28, Erick Erickson erickerick...@gmail.com wrote: Well, this is a common approach. Someone has to split up the input as sentences (whatever they are). Putting
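The idea can be sketched outside Lucene as a plain Python generator. This is only a simulation under assumed values (whitespace tokenisation, an extra increment of 100 per newline run); a real implementation would be a Java tokenizer or token filter that adjusts the position increment of the token following the newline.

```python
import re


def tokens_with_positions(text, newline_gap=100):
    """Yield (token, position) pairs for whitespace-separated tokens.

    The first token after one or more newlines gets `newline_gap` extra
    positions, so phrases cannot match across the line break unless the
    query slop is at least that large.
    """
    pos = -1
    pending_gap = 0
    # Match either a run of newlines or a run of non-whitespace characters.
    for piece in re.findall(r"\n+|[^\s]+", text):
        if piece.startswith("\n"):
            pending_gap = newline_gap
            continue
        pos += 1 + pending_gap
        pending_gap = 0
        yield piece, pos


if __name__ == "__main__":
    for token, position in tokens_with_positions("i am fluent\ngerman racing"):
        print(position, token)
```

Unlike the multiValued approach, this keeps the document in a single field value and leaves the newline handling to analysis.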

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Chris Hostetter
: This all seems a bit too much work for such a real-world scenario? You haven't really told us what your scenario is. You said you want to split tokens on whitespace, full-stop (aka: period) and comma only, but then in response to some suggestions you added comments other things that you

Which Tokeniser (and/or filter)

2012-02-06 Thread Robert Brown
Hi, I need to tokenise on whitespace, full-stop, and comma ONLY. Currently using solr.WhitespaceTokenizerFactory with WordDelimiterFilterFactory but this is also splitting on , /, new-line, etc. It seems such a simple setup; what am I doing wrong? What do you use for such normal

Re: Which Tokeniser (and/or filter)

2012-02-06 Thread Ahmet Arslan
I need to tokenise on whitespace, full-stop, and comma ONLY. Currently using solr.WhitespaceTokenizerFactory with WordDelimiterFilterFactory but this is also splitting on , /, new-line, etc. WDF is customizable via the types=wdftypes.txt parameter.

Re: Which Tokeniser (and/or filter)

2012-02-06 Thread Robert Brown
My fear is what will then happen with highlighting if I use re-mapping? On Mon, 6 Feb 2012 03:33:03 -0800 (PST), Ahmet Arslan iori...@yahoo.com wrote: I need to tokenise on whitespace, full-stop, and comma ONLY. Currently using solr.WhitespaceTokenizerFactory with

Re: Which Tokeniser (and/or filter)

2012-02-06 Thread Ahmet Arslan
My fear is what will then happen with highlighting if I use re-mapping? What do you mean by re-mapping?

Re: Which Tokeniser (and/or filter)

2012-02-06 Thread Robert Brown
mapping dots to spaces. I don't think that's workable anyway since .net would cause issues. Trying out the wdftypes now... --- IntelCompute Web Design Local Online Marketing http://www.intelcompute.com On Mon, 6 Feb 2012 04:10:18 -0800 (PST), Ahmet Arslan iori...@yahoo.com wrote: My fear