From: Chris Hostetter hossman_luc...@fucit.org
Reply-to: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: Which Tokeniser (and/or filter)
Date: Tue, 7 Feb 2012 15:02:36 -0800 (PST)

: This all seems a bit too much work for such a real-world scenario?

You haven't really told us what your scenario is. You said you want to
split tokens on whitespace, full-stop (aka: period) and comma only, but
then in response to some suggestions you added comments about other
things that you ...
., asp.net, .net, net.
Cheers,
Rob
--
IntelCompute
Web Design and Online Marketing
http://www.intelcompute.com
I'm still finding matches across newlines.

index...
i am fluent
german racing

search...
fluent german

Any suggestions? I've currently got this in wdftypes.txt for
WordDelimiterFilterFactory:

\u000A = ALPHANUM
\u000B = ALPHANUM
\u000C = ALPHANUM
\u000D = ALPHANUM
# \u000D\u000A = ALPHA
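For illustration only (a toy model, not Solr code): a plain whitespace tokenizer treats the newline like any other whitespace, so the tokens on either side of it get consecutive positions and an exact phrase query for "fluent german" matches:

```python
import re

def positions(text):
    # Toy whitespace tokenizer: split on ANY whitespace run, including
    # newlines, and assign consecutive positions to the tokens.
    return {tok: pos for pos, tok in enumerate(re.split(r"\s+", text.strip()))}

doc = "i am fluent\ngerman racing"
idx = positions(doc)
# "fluent" and "german" end up adjacent, so the phrase "fluent german" matches.
print(idx["german"] - idx["fluent"])  # 1
```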
You can use a multiValued field for this. Split your document on
newlines at the client side:

<arr>i am fluent</arr>
<arr>german racing</arr>
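A minimal sketch of that client-side split (the field name and document shape are illustrative, not from the thread):

```python
# Split the document on newlines so each line becomes one value of a
# multiValued field; the field's positionIncrementGap then keeps phrase
# queries from matching across lines.
doc = "i am fluent\ngerman racing"

solr_doc = {
    "id": "doc1",  # hypothetical id
    "body_lines": [line for line in doc.splitlines() if line.strip()],
}
print(solr_doc["body_lines"])  # ['i am fluent', 'german racing']
```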
This all seems a bit too much work for such a real-world scenario?
---
IntelCompute
Web Design Local Online Marketing
http://www.intelcompute.com
On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan
iori...@yahoo.com wrote:
I'm still finding matches across newlines ...
Well, this is a common approach. Someone has to split up the
input as sentences (whatever they are). Putting them in multi-valued
fields is trivial.
Then you confine things to within sentences, then you start searching
phrases with a slop less than your incrementGap...
Best
Erick
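Erick's point can be shown with a toy position model (GAP is illustrative; in Solr the gap comes from positionIncrementGap on the field type):

```python
GAP = 100  # assumed positionIncrementGap

def index_values(values):
    # Each value continues the position numbering, but a large gap is
    # inserted between values, as a multiValued field does.
    positions = {}
    pos = 0
    for value in values:
        for term in value.split():
            positions.setdefault(term, []).append(pos)
            pos += 1
        pos += GAP
    return positions

def phrase_matches(positions, first, second, slop=0):
    # A two-term phrase matches when the terms appear in order within
    # 1 + slop positions of each other.
    return any(0 < pb - pa <= 1 + slop
               for pa in positions.get(first, [])
               for pb in positions.get(second, []))

idx = index_values(["i am fluent", "german racing"])
print(phrase_matches(idx, "i", "am"))           # True: adjacent within one value
print(phrase_matches(idx, "fluent", "german"))  # False: the gap exceeds the slop
```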
A custom tokenizer/tokenfilter could set the position increment when a newline
comes through as well.
Erik
On Feb 7, 2012, at 15:28, Erick Erickson erickerick...@gmail.com wrote:
Well, this is a common approach. Someone has to split up the input as
sentences ...
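Outside of Lucene, the bookkeeping Erik describes can be sketched like this (a real implementation would be a custom TokenFilter adjusting the position increment attribute; everything here, including NEWLINE_GAP, is illustrative):

```python
import re

NEWLINE_GAP = 100  # assumed extra position increment after a newline

def analyze(text):
    # Emit (position, token) pairs; whenever the whitespace before a
    # token contains a newline, add a large position increment, which
    # is what a custom token filter could do inside the analyzer chain.
    out = []
    pos = -1
    prev_end = 0
    for m in re.finditer(r"\S+", text):
        increment = 1
        if "\n" in text[prev_end:m.start()]:
            increment += NEWLINE_GAP
        pos += increment
        out.append((pos, m.group()))
        prev_end = m.end()
    return out

print(analyze("i am fluent\ngerman racing"))
# [(0, 'i'), (1, 'am'), (2, 'fluent'), (103, 'german'), (104, 'racing')]
```

With "german" pushed out to position 103, a phrase query with a small slop no longer matches "fluent german" across the newline.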
Hi,
I need to tokenise on whitespace, full-stop, and comma ONLY.
Currently using solr.WhitespaceTokenizerFactory with
WordDelimiterFilterFactory but this is also splitting on , /,
new-line, etc.
It seems such a simple setup, what am I doing wrong? what do you use
for such normal ...
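For reference, the requested behaviour (split on whitespace, full-stop, and comma ONLY) can be sketched outside Solr as a toy regex tokenizer (illustrative only; in Solr this would be an analyzer chain):

```python
import re

def tokenize(text):
    # Split on runs of whitespace, full-stops, and commas ONLY;
    # everything else (hyphens, slashes, etc.) stays inside a token.
    return [t for t in re.split(r"[\s.,]+", text) if t]

print(tokenize("multi-word, foo/bar. baz"))
# ['multi-word', 'foo/bar', 'baz']
print(tokenize("asp.net"))
# ['asp', 'net'] -- the complication raised later in the thread
```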
I need to tokenise on whitespace, full-stop, and comma ONLY. ...
WDF (WordDelimiterFilterFactory) is customizable via the
types=wdftypes.txt parameter.
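As a sketch only (following the pattern of Solr's example wdftypes.txt, which writes lines as char => TYPE; verify the exact separator syntax against your Solr version), characters that should NOT cause a split can be re-typed as alphanumeric:

```
- => ALPHANUM
/ => ALPHANUM
```

With hyphen and slash typed as alphanumeric, WordDelimiterFilterFactory treats them as part of the token rather than as delimiters.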
My fear is what will then happen with highlighting if I use re-mapping?
On Mon, 6 Feb 2012 03:33:03 -0800 (PST), Ahmet Arslan
iori...@yahoo.com wrote:
I need to tokenise on whitespace, full-stop, and comma ONLY. ...
My fear is what will then happen with
highlighting if I use re-mapping?
What do you mean by re-mapping?
Mapping dots to spaces. I don't think that's workable anyway, since
.net would cause issues.

Trying out the wdftypes now...
---
IntelCompute
Web Design Local Online Marketing
http://www.intelcompute.com
On Mon, 6 Feb 2012 04:10:18 -0800 (PST), Ahmet Arslan
iori...@yahoo.com wrote:
My fear is what will then happen with highlighting if I use re-mapping? ...