Re: surrogate pairs

2010-03-12 Thread David Leangen
Hi, Yuta-san, >> Now I use own Analyzer which based on "MeCab" (It's open source >> Japanese morphological analyzer). >> I try to modify it to support surrogate pairs. >> >> And I'm expecting the next release! Cool! I look forward to that. Is there a link somewhere to your project? I am very
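For readers new to the topic, here is a minimal sketch of what surrogate-pair support means in Java: text is walked by Unicode code point instead of by `char`, so a supplementary character stays in one piece. The class and method names are hypothetical and are not taken from the MeCab-based analyzer discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointSplitter {
    // Splits text into one string per Unicode code point, so a surrogate
    // pair (two Java chars) is kept together as a single unit.
    public static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int width = Character.charCount(cp); // 2 for supplementary characters, else 1
            out.add(text.substring(i, i + width));
            i += width;
        }
        return out;
    }
}
```

A tokenizer that advances one `char` at a time would cut such a character in half; advancing by `Character.charCount` avoids that.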

RE: Combining TopFieldCollector with custom Collector

2010-03-12 Thread Uwe Schindler
http://en.wikipedia.org/wiki/Delegation_pattern - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Peter Keegan [mailto:peterlkee...@gmail.com] > Sent: Thursday, March 11, 2010 9:41 PM > To: java-user@lucen
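The delegation pattern linked above can be sketched as follows. This is only an illustration under stated assumptions: `HitCollector`, `ListCollector` and `CountingCollector` are hypothetical stand-ins, not Lucene's Collector API. The wrapper adds its own behaviour and forwards every call to the collector it wraps, which is how a custom collector could be combined with an existing one without subclassing it.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegationSketch {
    // Hypothetical minimal collector interface (not the Lucene API).
    public interface HitCollector {
        void collect(int doc);
    }

    // Plain collector that remembers every doc id it is given.
    public static class ListCollector implements HitCollector {
        public final List<Integer> docs = new ArrayList<>();

        @Override
        public void collect(int doc) {
            docs.add(doc);
        }
    }

    // Wrapper that counts hits, then delegates to the wrapped collector.
    public static class CountingCollector implements HitCollector {
        private final HitCollector delegate;
        public int count = 0;

        public CountingCollector(HitCollector delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc) {
            count++;               // custom behaviour added by the wrapper
            delegate.collect(doc); // everything else is left to the delegate
        }
    }
}
```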

Dealing with special cases in analyser

2010-03-12 Thread Paul Taylor
Hi, I'm using a custom analyser based on StandardAnalyzer with good results to search artists (e.g. Rolling Stones/Beatles), but it fails to match some weird artist names such as '!!!'. This is not surprising, because the analyser ignores punctuation, which is what I normally want it to do. I just won

Sorting case insensitive wildcard query (with highlight)

2010-03-12 Thread Kev Kilroy
Hi, I'm using Lucene 2.4.1 with Hibernate Search 3.1.1. I have objects in the index, for each field I index as follows: @Fields( value = { @Field(index = Index.TOKENIZED, store = Store.YES), @Field(name = "name_forSort", index = Index.UN_TOKENIZED, store = Store.NO), }) T

Re: Sorting case insensitive wildcard query (with highlight)

2010-03-12 Thread Ian Lea
Can you just lowercase a dedicated sort field and leave the others alone? -- Ian. On Fri, Mar 12, 2010 at 10:47 AM, Kev Kilroy wrote: > > Hi, > > I'm using Lucene 2.4.1 with Hibernate Search 3.1.1. I have objects in the > index, for each field I index as follows: > > @Fields( value = { >
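The suggestion above amounts to sorting on a lowercased copy of the value while leaving the displayed (and highlighted) value untouched. A minimal plain-Java sketch of that idea, with no Lucene or Hibernate Search calls; `sortKey` and `sortedByKey` are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class SortKeySketch {
    // Value that would go into the dedicated sort field; the original
    // string is still what gets stored, displayed and highlighted.
    public static String sortKey(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Returns a copy of the names ordered by their lowercased key.
    public static List<String> sortedByKey(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparing(SortKeySketch::sortKey));
        return copy;
    }
}
```

With natural String ordering "Banana" sorts before "apple" because uppercase letters come first; ordering by the lowercased key puts "apple" first while keeping the original casing in the result.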

Re: File descriptor leak in ParallelReader.reopen()

2010-03-12 Thread Michael McCandless
Really, your app should not drop things on the floor and hope for the best; you should explicitly close your IRs when you're done with them. I think the relevant change here was the removal of finalizers, under this issue: http://issues.apache.org/jira/browse/LUCENE-1715 [Simple]FSDir's I

Question on number of fields in a document

2010-03-12 Thread Vinicius Carvalho
Hello there! We are indexing metadata for our media. One idea is that each user adds their own metadata, so each document may have a different number/name/type of fields. Is this OK in Lucene? I mean, is Lucene OK with this relaxed approach? Also, considering that each user may define its own meta

RE: File descriptor leak in ParallelReader.reopen()

2010-03-12 Thread Alexey Lef
You are right. My test was faulty. I do get a descriptor leak even with SimpleFSDir. I guess I have some work to do. Thanks! Alexey -----Original Message----- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Friday, March 12, 2010 5:13 PM To: java-user@lucene.apache.org Subject:

Re: Question on number of fields in a document

2010-03-12 Thread Erick Erickson
There's no requirement that all documents have the same fields; Lucene is fine with different docs having different fields. There's no limit on the number of different fields allowed that I know of, but I'm sure someone will chime in if there is. HTH, Erick On Fri, Mar 12, 2010 at 7:51 AM, Vin

RE: Question on number of fields in a document

2010-03-12 Thread Uwe Schindler
You get memory problems if you turn on norms for all those fields (as norms are large byte[] arrays per field). This is not a hard limitation, but you should take care. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Me

Re: Question on number of fields in a document

2010-03-12 Thread Renaud Delbru
There is a bottleneck when you have a large number of fields and of words. Each field has its own list of terms, which means that the dictionary, in the worst case, could be of size n*m (with n the number of fields, and m the number of terms). This can lead to some overhead when looking up a t
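To make the n*m worst case concrete, here is a small sketch that models the term dictionary as a set of (field, term) entries. This is an assumed simplification for illustration only, not Lucene's actual on-disk structure, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class FieldTermDictionarySketch {
    // Counts distinct (field, term) entries in the worst case, where
    // every one of the m terms occurs in every one of the n fields.
    public static int worstCaseEntries(int fields, int terms) {
        Set<String> dictionary = new HashSet<>();
        for (int f = 0; f < fields; f++) {
            for (int t = 0; t < terms; t++) {
                // The same term under a different field is a separate entry.
                dictionary.add("field" + f + ":term" + t);
            }
        }
        return dictionary.size();
    }
}
```

Under this model, 3 fields and 4 terms give 12 entries, and 1,000 user-defined fields sharing a 10,000-term vocabulary would give 10,000,000.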

Re: Combining TopFieldCollector with custom Collector

2010-03-12 Thread Peter Keegan
Ok, thanks. I got stuck on trying to extend TopFieldDocCollector and didn't notice it's also a TopDocsCollector. A couple of questions about Solr: 1. In Solr's DocSetDelegateCollector, a lot of code is duplicated. Why not this: public void collect(int doc) throws IOException { collector.collect

TREC-3 Runs

2010-03-12 Thread Ivan Provalov
Just to follow up on our previous discussion, here are a few runs in which we have tested some of Lucene's different scoring mechanisms and other options. We used Lucene's patches for LnbLtcSimilarity and BM25 and the contrib module for the SweetSpotSimilarity. Lucene Default: 0.149 Lucene BM25:

Re: DisjunctionMaxQuery with tie breaker=1 same as MultiFieldQueryParser?

2010-03-12 Thread Marc Sturlese
Thanks Hoss for the useful info. According to the coord(q,d) definition, it's calculated at the document level. It says that it is "a score factor based on how many of the query terms are found in the specified document". If I am just searching for a term, "ipod" in this case, how would coord be computed? Would i
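As a point of reference for the question above, here is a minimal sketch that assumes the usual default definition of coord as the fraction of query terms matched (overlap divided by maxOverlap). The class name is hypothetical, and the sketch does not show how or when a particular query type applies the factor.

```java
public class CoordSketch {
    // overlap:    number of query terms found in the document
    // maxOverlap: total number of terms in the query
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
```

Under that assumption, a one-term query such as "ipod" that matches a document has overlap 1 and maxOverlap 1, so the factor is 1.0 and does not change the score; a two-term query with only one term matched would give 0.5.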

RE: Old Lucene src archive corrupt?

2010-03-12 Thread An Hong
I just want to report that the download of the zip sources from the old-archive directory now works for me. I'm not sure what the problem was, but it's gone now. Thanks to those who replied. An Hong -----Original Message----- From: Scott Ribe [mailto:scott_r...@killerbytes.com] Sent: Wednesd