AW: Umlaute getting lost

2011-04-26 Thread Clemens Wyss
TermAnalyzer# tokenStream ( final String fieldName, final Reader reader ) -- TokenStream t = new WhitespaceAnalyzer( Version.LUCENE_31 ).tokenStream( fieldName, cf); t = new StopFilter( Version.LUCENE_31, t,

AW: Umlaute getting lost

2011-04-26 Thread Clemens Wyss
Out of curiosity, what is the problem you are trying to solve? I am trying to provide suggestions for search terms/words, as Google does. When the user starts typing the search term, I look up my TermIndex to provide possible search terms that match the characters typed so far... Thx Clemens
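The lookup described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the poster's code: PrefixSuggester is a hypothetical in-memory stand-in for the "TermIndex", and it lowercases terms without any other normalization, so umlauts are kept as they are.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the "TermIndex" mentioned above. */
final class PrefixSuggester {
    // Sorted set, so all terms sharing a prefix sit next to each other.
    private final TreeSet<String> terms = new TreeSet<>();

    /** Stores the term lowercased; umlauts are preserved, not folded. */
    void add(String term) {
        terms.add(term.toLowerCase(Locale.ROOT));
    }

    /** Returns up to max stored terms that start with prefix, in sorted order. */
    List<String> suggest(String prefix, int max) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        // tailSet(p) begins at the first term >= p; stop once the prefix no longer matches.
        for (String t : terms.tailSet(p)) {
            if (!t.startsWith(p) || out.size() >= max) {
                break;
            }
            out.add(t);
        }
        return out;
    }
}
```

With the terms Müller, Mühle, Mutter and Haus added, suggest("mü", 10) returns the two terms that begin with "mü". A real index would likely hold far more terms and might rank suggestions by frequency rather than alphabetically.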

Clustering with Lucene?

2011-04-26 Thread vivek sar
Hi, I've been researching clustering with Lucene. Here is what I've found so far: 1) Lucene clustering with Carrot2 - http://download.carrot2.org/head/manual/#section.getting-started.lucene - but this seems suitable only for smaller indexes (a few hundred documents) -

Re: Clustering with Lucene?

2011-04-26 Thread Dawid Weiss
Can you shed some more light on what you're trying to achieve (what is the purpose of clustering -- are clusters to be utilized for front-end user interface, further data mining analysis, etc.)? With the sizes you report Carrot2 won't work for you, I'm afraid, but Mahout may. Still, there's

IndexReader.close() behavior

2011-04-26 Thread Alexey Lef
This is the code in IndexReader.close(): public final synchronized void close() throws IOException { if (!closed) { decRef(); closed = true; } } What strikes me as odd is that “closed” variable is set to true regardless of whether the index was actually closed using

lucene 3.0.3 | QueryParser | MultiFieldQueryParser

2011-04-26 Thread Ranjit Kumar
Hi, I have created my own custom analyzer that uses JFlex to make terms such as c#, .net, and c++ searchable. When I search for c#, .net, or c++, QueryParser parses .net to .net and C++ to C++, so those work fine. But QueryParser parses C# to C, which causes trouble for me. Also tried to use

RE: lucene 3.0.3 | QueryParser | MultiFieldQueryParser

2011-04-26 Thread Steven A Rowe
Hi Ranjit, I suspect the problem is not QueryParser, since the TERM definition includes the '#' character (from http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/src/java/org/apache/lucene/queryParser/QueryParser.jj?view=markup#l1136): | #_TERM_START_CHAR: ( ~[ , \t, \n, \r,

Re: lucene 3.0.3 | QueryParser | MultiFieldQueryParser

2011-04-26 Thread haichengyl
Please give some more detailed information. 2011-04-26 haichengyl From: Ranjit Kumar Sent: 2011-04-26 21:55:04 To: java-user-h...@lucene.apache.org; java-user@lucene.apache.org Cc: Subject: lucene 3.0.3 | QueryParser | MultiFieldQueryParser Hi, I have created my own custom analyzer and uses jFlex

Re: lucene 3.0.3 | QueryParser | MultiFieldQueryParser

2011-04-26 Thread haichengyl
I hope you can send some more details about it. 2011-04-26 haichengyl From: Ranjit Kumar Sent: 2011-04-26 21:55:04 To: java-user-h...@lucene.apache.org; java-user@lucene.apache.org Cc: Subject: lucene 3.0.3 | QueryParser | MultiFieldQueryParser Hi, I have created my own custom analyzer and uses

Re: IndexReader.close() behavior

2011-04-26 Thread Michael McCandless
The code is tricky, but it's intentional. We always set closed to true to guard against double close, ie, it's fine to double-close an IndexReader, ie doing so will not steal references from other places that have incRef'd the reader. Can you pass closeSubReaders=false when you create your
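The double-close guard described above can be illustrated with a small reference-counted class. This is a minimal sketch, not Lucene's IndexReader: RefCountedResource, its fields, and the isReleased() accessor are hypothetical names chosen for the example.

```java
/** Hypothetical reference-counted resource, loosely modelled on the behavior described above. */
final class RefCountedResource {
    private int refCount = 1;   // the creator holds the first reference
    private boolean closed;     // true once close() has been called on this handle
    private boolean released;   // true once the last reference is gone

    synchronized void incRef() {
        refCount++;
    }

    synchronized void decRef() {
        if (--refCount == 0) {
            released = true;    // a real resource would free its files here
        }
    }

    /** Drops the creator's reference exactly once, however often it is called. */
    synchronized void close() {
        if (!closed) {
            decRef();
            closed = true;      // set unconditionally, so a second close() is a no-op
        }
    }

    synchronized int refCount() {
        return refCount;
    }

    synchronized boolean isReleased() {
        return released;
    }
}
```

If another component has called incRef(), calling close() twice still leaves that component's reference intact; the resource is released only when that component calls decRef().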

Re: Clustering with Lucene?

2011-04-26 Thread vivek sar
Thanks Dawid for the reply. Here is what we are trying to do: 1) We index around 20 fields, and we want a grouping option for five of them. For example, the user can search on the name of a city and we should have the option to group by products available in that city (and vice versa). 2) We also

Reg: Query behavior

2011-04-26 Thread Deepak Konidena
Hi, Currently when I type in Arcos Bioscience in my lucene search, it returns all those documents with either Arcos or Bioscience at the top of the search results and the actual document containing Arcos Bioscience somewhere in the middle/bottom. The desired behavior is to rank those

Re: Reg: Query behavior

2011-04-26 Thread Sujit Pal
Hi Deepak, Would something like this work in your case? "Arcos Bioscience"^2.0 Arcos Bioscience i.e., a BooleanQuery with the full phrase boosted, OR'd with a query on each word? -sujit On Tue, 2011-04-26 at 14:46 -0400, Deepak Konidena wrote: Hi, Currently when I type in Arcos Bioscience in

Re: Clustering with Lucene?

2011-04-26 Thread Dawid Weiss
1) We index around 20 fields, of that we want to have grouping option for five of them. For ex., user can search on name of the city and we should have option to group by products available in that city (and vice-versa). Are these fields strictly defined or free text? Because if they are

Re: Reg: Query behavior

2011-04-26 Thread Erick Erickson
You can also specify a large slop in your phrase (e.g. "arcos biosciences"~500), which will take distance into account when scoring, although it may not be enough to rank the document where you want. Sujit's comment is probably a better place to start. Best Erick On Tue, Apr 26, 2011 at 2:59 PM,

Lucene query processing

2011-04-26 Thread Alex vB
Hello everybody, As far as I know Lucene processes documents DAAT. Depending on the query either the intersection or union is calculated. For the intersection only documents occurring in all posting lists are scored. In the union case every document is scored which makes it a more expensive
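The intersection and union cases described above can be sketched as a document-at-a-time merge over sorted posting lists. This is a plain-Java illustration, not Lucene's scorer code: PostingLists and its methods are hypothetical, and each posting list is assumed to be an ascending array of document IDs with no duplicates.

```java
import java.util.Arrays;

/** Hypothetical helpers that walk two sorted posting lists one document at a time. */
final class PostingLists {
    /** Conjunction: keeps only the doc IDs present in both lists. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {          // doc occurs in both lists: candidate for scoring
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {    // advance whichever list is behind
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** Disjunction: keeps every doc ID present in at least one list. */
    static int[] union(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {          // same doc in both lists: emit it once
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else {
                out[n++] = b[j++];
            }
        }
        while (i < a.length) out[n++] = a[i++];   // drain whichever list remains
        while (j < b.length) out[n++] = b[j++];
        return Arrays.copyOf(out, n);
    }
}
```

The union visits every document in either list, while the intersection skips any document missing from one list, which matches the cost difference the post describes.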

Re: Clustering with Lucene?

2011-04-26 Thread vivek sar
Thanks Dawid. I was trying to give some example, but this is not exactly our text. Our fields include things like user name, IP Address, Application Name, Port 3, Byte Count - all network related stuff. So, if user searches on certain IP address then we would need to group the result by user,