TermAnalyzer#tokenStream(final String fieldName, final Reader reader)
--
TokenStream t = new WhitespaceAnalyzer( Version.LUCENE_31 ).tokenStream(
fieldName, cf);
t = new StopFilter( Version.LUCENE_31, t,
Out of curiosity, what is the problem you are trying to solve?
I am trying to provide suggestions for search terms/words, as Google does.
When the user starts typing the search term, I look up my TermIndex to provide
possible search terms which fit the characters provided...
Thx
Clemens
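A minimal sketch of the prefix lookup described above, assuming the candidate terms fit in memory. The class name PrefixSuggester and its methods are hypothetical stand-ins for the "TermIndex" mentioned in the message, not part of Lucene; a sorted set is used so that all terms sharing a prefix sit next to each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for a term index used for suggestions. */
class PrefixSuggester {
    // Sorted set: all terms sharing a prefix are contiguous in iteration order.
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term.toLowerCase(Locale.ROOT));
    }

    /** Returns up to max terms starting with the given prefix, in sorted order. */
    List<String> suggest(String prefix, int max) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        // tailSet(p, true) starts at the first term >= prefix.
        for (String t : terms.tailSet(p, true)) {
            if (!t.startsWith(p) || out.size() >= max) {
                break; // left the prefix range, or collected enough
            }
            out.add(t);
        }
        return out;
    }
}
```

Calling suggest with the characters typed so far returns the matching terms in alphabetical order; ranking by popularity would need extra data that this sketch does not model.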
Hi,
I've been researching clustering with Lucene. Here is what
I've found so far,
1) Lucene clustering with Carrot2 -
http://download.carrot2.org/head/manual/#section.getting-started.lucene
- but this seems suitable only for smaller indexes (a few hundred
documents) -
Can you shed some more light on what you're trying to achieve (what is
the purpose of clustering -- are clusters to be utilized for front-end
user interface, further data mining analysis, etc.)?
With the sizes you report Carrot2 won't work for you, I'm afraid, but
Mahout may. Still, there's
This is the code in IndexReader.close():
public final synchronized void close() throws IOException {
if (!closed) {
decRef();
closed = true;
}
}
What strikes me as odd is that the “closed” variable is set to true regardless of
whether the index was actually closed using
Hi,
I have created my own custom analyzer, which uses JFlex, so that searches for
c#, .net, c++ etc. work.
When I search for c#, .net and c++, QueryParser parses .net to .net and
C++ to C++, so those work fine. But C# is parsed to C, which
causes trouble for me.
Also tried to use
Hi Ranjit,
I suspect the problem is not QueryParser, since the TERM definition includes
the '#' character (from
http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/src/java/org/apache/lucene/queryParser/QueryParser.jj?view=markup#l1136):
| <#_TERM_START_CHAR: ( ~[ " ", "\t", "\n", "\r",
Please give some more detailed information.
2011-04-26
haichengyl
From: Ranjit Kumar
Sent: 2011-04-26 21:55:04
To: java-user-h...@lucene.apache.org; java-user@lucene.apache.org
Cc:
Subject: lucene 3.0.3 | QueryParser | MultiFieldQueryParser
Hi,
I have created my own custom analyzer and uses jFlex
I hope you can send some details about it.
2011-04-26
haichengyl
From: Ranjit Kumar
Sent: 2011-04-26 21:55:04
To: java-user-h...@lucene.apache.org; java-user@lucene.apache.org
Cc:
Subject: lucene 3.0.3 | QueryParser | MultiFieldQueryParser
Hi,
I have created my own custom analyzer and uses
The code is tricky, but it's intentional.
We always set closed to true to guard against double close, i.e., it's
fine to double-close an IndexReader; doing so will not steal
references from other places that have incRef'd the reader.
Can you pass closeSubReaders=false when you create your
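The double-close guard described above can be illustrated with a small stand-alone sketch. RefCountedResource is a hypothetical class written for this illustration, not Lucene's IndexReader; it only mirrors the pattern in the quoted close() method, where close() gives up the caller's own reference at most once.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted resource mirroring the close() pattern. */
class RefCountedResource {
    // The creator holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private boolean closed = false;
    private boolean released = false;

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        // The underlying resource is released only when the last reference goes.
        if (refCount.decrementAndGet() == 0) {
            released = true;
        }
    }

    /** Drops the caller's reference once; later calls are no-ops. */
    synchronized void close() {
        if (!closed) {
            decRef();
            closed = true; // set even if other holders keep the resource alive
        }
    }

    int getRefCount() {
        return refCount.get();
    }

    boolean isReleased() {
        return released;
    }
}
```

With one extra incRef() from another holder, calling close() twice lowers the count only once, so the other holder's reference survives until it calls decRef() itself.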
Thanks Dawid for the reply. Here is what we are trying to do,
1) We index around 20 fields, and we want a grouping option
for five of them. For example, a user can search on the name of a city and we
should have the option to group by the products available in that city (and
vice versa).
2) We also
Hi,
Currently when I type in Arcos Bioscience in my Lucene search, it returns all
those documents with either Arcos or Bioscience at the top of the search
results and the actual document containing Arcos Bioscience somewhere in the
middle/bottom.
The desired behavior is to rank those
Hi Deepak,
Would something like this work in your case?
"Arcos Bioscience"^2.0 Arcos Bioscience
i.e., a BooleanQuery with the full phrase boosted, OR'd with a query on
each word?
-sujit
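As a rough sketch of that suggestion, the query string can be assembled from the user's input before it is handed to a query parser. BoostedPhraseQuery.build is a hypothetical helper, not a Lucene API, and it assumes the input consists of plain words with no query-syntax characters that would need escaping.

```java
/** Hypothetical helper that builds a boosted-phrase-plus-terms query string. */
class BoostedPhraseQuery {
    /**
     * Returns the input as a quoted, boosted phrase followed by the bare words,
     * so exact phrase matches can score above single-word matches.
     * Assumes plain words only (no quotes or other query syntax in the input).
     */
    static String build(String userInput, float boost) {
        String words = userInput.trim();
        return "\"" + words + "\"^" + boost + " " + words;
    }
}
```

The resulting string relies on the parser's default OR operator to combine the phrase clause with the individual word clauses.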
On Tue, 2011-04-26 at 14:46 -0400, Deepak Konidena wrote:
Hi,
Currently when I type in Arcos Bioscience in
1) We index around 20 fields, and we want a grouping option
for five of them. For example, a user can search on the name of a city and we
should have the option to group by the products available in that city (and
vice versa).
Are these fields strictly defined or free text? Because if they are
You can also specify a large slop in your phrase (e.g.
"arcos biosciences"~500), which will take distance into
account when scoring, although it may not be enough
to rank the document where you want. Sujit's comment
is probably a better place to start.
Best
Erick
On Tue, Apr 26, 2011 at 2:59 PM,
Hello everybody,
As far as I know Lucene processes documents DAAT. Depending on the query
either the intersection or union is calculated. For the intersection only
documents occurring in all posting lists are scored. In the union case every
document is scored which makes it a more expensive
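To make the intersection and union cases concrete, here is a simplified sketch over two posting lists held as sorted arrays of document IDs. The PostingLists class is hypothetical and ignores scoring, skipping and Lucene's actual data structures; it only shows which documents each mode would visit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical document-at-a-time merge of two sorted posting lists. */
class PostingLists {
    /** Documents present in both lists (conjunctive query). */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                 // advance the list that is behind
            } else if (b[j] < a[i]) {
                j++;
            } else {
                out.add(a[i]);       // same doc ID in both lists
                i++;
                j++;
            }
        }
        return out;
    }

    /** Documents present in either list (disjunctive query). */
    static List<Integer> union(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j >= b.length || (i < a.length && a[i] < b[j])) {
                out.add(a[i++]);
            } else if (i >= a.length || b[j] < a[i]) {
                out.add(b[j++]);
            } else {
                out.add(a[i]);       // present in both, emit once
                i++;
                j++;
            }
        }
        return out;
    }
}
```

The union visits every document ID from both lists, while the intersection emits only the shared ones, which is where the difference in the number of scored documents comes from.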
Thanks Dawid. I was trying to give an example, but this is not
exactly our text. Our fields include things like user name, IP
Address, Application Name, Port 3, Byte Count - all network-related
stuff. So, if a user searches on a certain IP address, then we
would need to group the results by user,