Re: What is the proper use of stop words in Lucene?

2014-04-28 Thread Chris Tomlinson
Hello Uwe, Thank you for the reply. I see that there is a version check for the use of setEnablePositionIncrements(false); and, I think I may be able to use an earlier api with the eXist-db embedding of Lucene 4.4 to avoid the version check. However, my question was intended to improve my unde

[ANNOUNCE] Apache Lucene 4.8.0 released

2014-04-28 Thread Uwe Schindler
28 April 2014, Apache Lucene™ 4.8.0 available. The Lucene PMC is pleased to announce the release of Apache Lucene 4.8.0. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-

Re: What is the proper use of stop words in Lucene?

2014-04-28 Thread Chris Tomlinson
Hi, On Apr 28, 2014, at 11:45 AM, Uwe Schindler wrote: >> Hello Uwe, >> >> Thank you for the reply. I see that there is a version check for the use of >> setEnablePositionIncrements(false); and, I think I may be able to use an >> earlier api with the eXist-db embedding of Lucene 4.4 to avoid th

RE: What is the proper use of stop words in Lucene?

2014-04-28 Thread Uwe Schindler
> Hello Uwe, > > Thank you for the reply. I see that there is a version check for the > use of setEnablePositionIncrements(false); and, I think I may be able > to use an earlier api with the eXist-db embedding of Lucene 4.4 to > avoid the version check. Hi, you don't need an older version of

RE: What is the proper use of stop words in Lucene?

2014-04-28 Thread Uwe Schindler
Hi, > > What you intend to do is not a "stopword" use case. You want to "ignore" > some words - Lucene has no support for this, because in native language > processing this makes no sense. > > Thank you for the information. I was unaware that ignoring some words > "makes no sense". I thought I ga

RE: What is the proper use of stop words in Lucene?

2014-04-28 Thread Uwe Schindler
Here is the ElisionFilter of Lucene: https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html This one only works with apostrophe elisions (' and U+2019), so maybe does not apply for Tibetan. But it should inspire you. Uwe - Uwe Schindler H.-
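
For readers unfamiliar with elision filtering, the following is a minimal sketch of the general idea in plain Java. It does not use or reproduce Lucene's ElisionFilter; the class name, the method name, and the article list are hypothetical and chosen only for illustration. It strips a leading article when it is followed by one of the two apostrophe characters mentioned above (' and U+2019).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class ElisionSketch {
    // Hypothetical article list (French examples); an assumption, not Lucene's default set.
    private static final Set<String> ARTICLES =
            new HashSet<>(Arrays.asList("l", "d", "qu", "j", "m", "t", "s", "n", "c"));

    // Returns the token without its elided prefix, or the token unchanged
    // when the text before the first apostrophe is not a known article.
    static String stripElision(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '\'' || c == '\u2019') {
                String prefix = token.substring(0, i).toLowerCase(Locale.ROOT);
                return ARTICLES.contains(prefix) ? token.substring(i + 1) : token;
            }
        }
        return token; // no apostrophe at all
    }

    public static void main(String[] args) {
        System.out.println(stripElision("l'avion"));
        System.out.println(stripElision("qu\u2019il"));
        System.out.println(stripElision("o'clock"));
    }
}
```

A filter for a language that does not mark elisions with an apostrophe would need a different trigger than the character test used here.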

Re: What is the proper use of stop words in Lucene?

2014-04-28 Thread Chris Tomlinson
On Apr 28, 2014, at 3:36 PM, Uwe Schindler wrote: > Hi, > >>> What you intend to do is not a "stopword" use case. You want to "ignore" >> some words - Lucene has no support for this, because in native language >> processing this makes no sense. >> >> Thank you for the information. I was unawar

DocValues for multiple indexes

2014-04-28 Thread Alice Wong
Hello, I have two separate indexes with similar fields. To search them at once I created an IndexSearcher on top of a MultiReader initialized by two readers of these two indexes. And it works well. Since both indexes have a NumericDocValuesField called "id", I used this field both for sorting and r

Fields, Index segments and docIds

2014-04-28 Thread Olivier Binda
Hello, My lucene Index is built and stored in a zip file (uncompressed) which is used as a read-only Directory. 1) At lucene indexing time, is it possible to rewrite the index so that some fields are only found in some segments Say : EnglishWords, EnglishVerbs go to Segment 1 GermanWords, G

RE: Fields, Index segments and docIds

2014-04-28 Thread Uwe Schindler
Hi Oliver, To me it looks like you want to do it much too complicated. It also seems that you misunderstood join queries, which seems to be your problem. Comments inside: > My lucene Index is built and stored in a zip file (uncompressed) which is used > as a read-only Directory. > > 1) At lucen