Re: MultiCollector collect behavior is changed

2018-07-04 Thread Yonghui Zhao
> On Wed, Jul 4, 2018 at 05:34, Yonghui Zhao wrote:
>
> > In Lucene 4.10,
> > if one collector throws CollectionTerminatedException, all collectors are
> > terminated.
> >
> > In Lucene 7.2.1, CollectionTerminatedException will only terminate

MultiCollector collect behavior is changed

2018-07-03 Thread Yonghui Zhao
In Lucene 4.10, if one collector throws CollectionTerminatedException, all collectors are terminated. In Lucene 7.2.1, CollectionTerminatedException terminates only the current collector; the others keep running. How can I keep the old behavior?
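To make the two behaviors concrete, here is a toy model in plain Java. It uses no Lucene classes: TerminatedException, DocCollector, IndependentMulti, AllOrNothingMulti and LimitCollector are hypothetical stand-ins, and the semantics are taken only from the description above. One possible way to keep the old behavior, assumed here and not confirmed in the thread, is a custom fan-out collector that simply does not catch the termination exception.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for CollectionTerminatedException (hypothetical, not a Lucene class). */
class TerminatedException extends RuntimeException {}

/** Toy stand-in for a per-document collector (hypothetical). */
interface DocCollector {
    void collect(int doc); // may throw TerminatedException to stop early
}

/** Models the 7.2.1 behavior described above: a terminated collector is dropped, the rest go on. */
class IndependentMulti implements DocCollector {
    private final List<DocCollector> active;

    IndependentMulti(List<? extends DocCollector> collectors) {
        this.active = new ArrayList<>(collectors);
    }

    @Override
    public void collect(int doc) {
        for (Iterator<DocCollector> it = active.iterator(); it.hasNext();) {
            try {
                it.next().collect(doc);
            } catch (TerminatedException e) {
                it.remove(); // only this collector stops
            }
        }
        if (active.isEmpty()) {
            throw new TerminatedException(); // nothing left to feed (an assumption of this model)
        }
    }
}

/** Models the 4.10 behavior described above: the first termination stops every collector. */
class AllOrNothingMulti implements DocCollector {
    private final List<DocCollector> collectors;

    AllOrNothingMulti(List<? extends DocCollector> collectors) {
        this.collectors = new ArrayList<>(collectors);
    }

    @Override
    public void collect(int doc) {
        for (DocCollector c : collectors) {
            c.collect(doc); // not caught: the exception propagates to the caller
        }
    }
}

/** Accepts at most `limit` docs, then asks to terminate. */
class LimitCollector implements DocCollector {
    final int limit;
    int seen;

    LimitCollector(int limit) {
        this.limit = limit;
    }

    @Override
    public void collect(int doc) {
        if (seen >= limit) {
            throw new TerminatedException();
        }
        seen++;
    }
}
```

In this model the caller is expected to catch TerminatedException and stop feeding documents, which mirrors how the exception is described in the question.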

lucene hadoop index

2018-06-11 Thread Yonghui Zhao
I found that older Hadoop versions had "org.apache.hadoop.contrib.index.lucene.FileSystemDirectory" for Lucene. http://www.massapi.com/class/org/apache/hadoop/contrib/index/lucene/FileSystemDirectory.html But I can't find it in the recent Hadoop code base. Is there any plugin support new

Re: EarlyTerminatingSortingCollector is expired in lucene 7.2.1

2018-06-06 Thread Yonghui Zhao
Thanks Adrien! Yes, I am aware "that EarlyTerminatingSortingCollector does not exactly do that since it works on a per-segment basis". I use EarlyTerminatingSortingCollector for performance when too many docs are hit.
2018-06-04 19:09 GMT+08:00 Adrien Grand:
> You are right that

EarlyTerminatingSortingCollector is expired in lucene 7.2.1

2018-06-01 Thread Yonghui Zhao
Hi, I find that EarlyTerminatingSortingCollector is deprecated in Lucene 7.2.1. The Javadoc says: "Pass trackTotalHits=false to {@link TopFieldCollector} instead of using this class." But I find TopFieldCollector cannot fully replace EarlyTerminatingSortingCollector. In EarlyTerminatingSortingCollector

Re: TermsEnum.posting doesn't support acceptDocs

2018-05-30 Thread Yonghui Zhao
> lude deleted docs
> either (actually they shouldn't do it) as live docs are now checked on top
> of scorers.
>
> On Wed, May 30, 2018 at 12:57, Yonghui Zhao wrote:
>
> > I find TermsEnum.postings (the former docsAndPositions API) in new Lucene has
> > no ac

TermsEnum.posting doesn't support acceptDocs

2018-05-30 Thread Yonghui Zhao
I find that TermsEnum.postings (the former docsAndPositions API) in the new Lucene has no acceptDocs parameter. Is there a replacement, or should I implement the filter myself?

disableCoord is removed in lucene boolean query?

2018-05-21 Thread Yonghui Zhao
I am upgrading my project and find that BooleanQuery no longer has the disableCoord feature. Is the default behavior now disableCoord = true, and not configurable?

How to construct a ConstantScoreQuery with FixedBitSet

2018-04-23 Thread Yonghui Zhao
In my project I implement a NullFieldFilter, which filters for the docs that index some field, regardless of the value. The implementation traverses the indexed field using TermsEnum and PostingsEnum, or uses the DocValues advance function to traverse the docs that have this field. In this way I get

Re: what's replacement of FieldCache in Lucene 7

2018-04-13 Thread Yonghui Zhao
Got it, makes sense. Thanks Adrien.
2018-04-13 19:16 GMT+08:00 Adrien Grand <jpou...@gmail.com>:
> Queries should be fine: they are required to produce sorted iterators since
> 5.0 when we removed the acceptsDocsOutOfOrder option on collectors.
>
> On Fri, Apr 13, 2018 at 1

Re: what's replacement of FieldCache in Lucene 7

2018-04-13 Thread Yonghui Zhao
> in the end, so the bottleneck should be query
> processing, not retrieving stored fields.
>
> On Fri, Apr 13, 2018 at 05:27, Yonghui Zhao <zhaoyong...@gmail.com> wrote:
>
> > My case is when I get some docs from lucene, I need also get some field
> > value of the retriev

Re: what's replacement of FieldCache in Lucene 7

2018-04-12 Thread Yonghui Zhao
> the advanceExact API, which
> exists on all doc-value iterators. Just make sure to never call it on
> decreasing doc IDs. If that doesn't work for you, can you describe your
> use-case, maybe there are better ways to implement what you need.
>
> On Thu, Apr 12, 2018 at 13:54, Yong

what's replacement of FieldCache in Lucene 7

2018-04-12 Thread Yonghui Zhao
Hi, I am upgrading my project from Lucene 4 to 7. FieldCache is removed in Lucene 7; is DocValues the replacement? It seems DocValues doesn't support random access, but I need random access to fetch specific field values quickly. How can I solve this?
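A reply in this thread points to the advanceExact API on doc-value iterators, with the caveat that it must never be called on decreasing doc IDs. The sketch below illustrates that access pattern in plain Java: sort the wanted doc IDs first, then walk forward once. ForwardOnlyValues and ArrayValues are hypothetical stand-ins written for this example, not Lucene classes; with Lucene on the classpath the same loop would presumably be driven by a real doc-values iterator instead.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a forward-only doc-values iterator (not a Lucene class). */
interface ForwardOnlyValues {
    /** Moves to docId, which must not be smaller than the previous target; true if a value exists. */
    boolean advanceExact(int docId);

    long longValue();
}

/** In-memory implementation that rejects backward moves, to mimic the constraint. */
class ArrayValues implements ForwardOnlyValues {
    private final long[] data;
    private int current = -1;

    ArrayValues(long[] data) {
        this.data = data;
    }

    @Override
    public boolean advanceExact(int docId) {
        if (docId < current) {
            throw new IllegalStateException("doc IDs must not decrease");
        }
        current = docId;
        return docId < data.length;
    }

    @Override
    public long longValue() {
        return data[current];
    }
}

class SortedLookup {
    /** Returns values in the caller's order while visiting doc IDs in ascending order. */
    static long[] lookup(int[] docIds, ForwardOnlyValues values, long missing) {
        Integer[] order = new Integer[docIds.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort positions by doc ID so the iterator only ever moves forward
        Arrays.sort(order, (x, y) -> Integer.compare(docIds[x], docIds[y]));

        long[] out = new long[docIds.length];
        for (int pos : order) {
            out[pos] = values.advanceExact(docIds[pos]) ? values.longValue() : missing;
        }
        return out;
    }
}
```

The design choice is to pay one sort per batch of hits instead of needing true random access; results are written back into the original hit order.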

Re: SortingMergePolicy is removed in 7.2.1?

2018-04-10 Thread Yonghui Zhao
> ePolicy.java
> and the associated factory in
> ,,,/solr/core/src/java/org/apache/solr/index/
> SortingMergePolicyFactory.java
> so I'm not sure what you're having trouble with
>
> Best,
> Erick
>
> On Tue, Apr 10, 2018 at 4:56 AM, Yonghui Zhao <zhaoyong...@gmail.com

SortingMergePolicy is removed in 7.2.1?

2018-04-10 Thread Yonghui Zhao
I can't find this class anymore. What is the replacement? Thanks!

any api to get segment number of index

2018-01-10 Thread Yonghui Zhao
Hi, is there any public API to get the number of segments in the current index? I didn't find one in IndexWriter or IndexSearcher in Lucene 4.10.

Re: index sorting merge

2017-12-28 Thread Yonghui Zhao
> /blog.mikemccandless.com
>
> On Thu, Dec 28, 2017 at 11:13 AM, Yonghui Zhao <zhaoyong...@gmail.com> wrote:
>
>> Hi,
>>
>> I specified a SortingMergePolicy in my case. I find only the first N-1
>> segments are sorted as expected, the last segment is stil

index sorting merge

2017-12-28 Thread Yonghui Zhao
Hi, I specified a SortingMergePolicy. When I call forceMerge(N) with N > 1, only the first N-1 segments are sorted as expected; the last segment is still unsorted. I think this is by design, but is there any way to make all segments sorted? Thanks!

how to compile lucene 4.10.4 using jdk7

2017-11-24 Thread Yonghui Zhao
Hi, I cloned the Lucene 4.10.4 tag from GitHub and use Ant to build. My Ant and local JDK info on Mac:
Apache Ant(TM) version 1.9.9 compiled on February 2 2017
Trying the default build file: build.xml
Buildfile: /Users/yozhao/src/lucene-solr/lucene/core/build.xml
Detected Java version: 1.7

strange performance diff

2017-11-10 Thread Yonghui Zhao
Hi, I use the code below to test the same query on the same index in a single run.
long t0 = System.currentTimeMillis();
indexSearcher.search(query, from + size);
long t1 = System.currentTimeMillis();
LOGGER.info("indexSearcher.search(query, from + size) took:" + (t1 - t0) + "ms");
TopScoreDocCollector

Re: TieredMergePolicy disrupts doc id order after merge

2017-09-28 Thread Yonghui Zhao
> s not guaranteed to work,
> this is only an implementation detail. The internal IDs are also not
> stable!!!
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----

TieredMergePolicy disrupts doc id order after merge

2017-09-28 Thread Yonghui Zhao
Hi, it is easier to explain my question with an example. My Lucene version is 4.10.4. I use
SortField sortField = new SortField(null, SortField.Type.DOC, true);
sort = new Sort(sortField);
return new SortingMergePolicy(new TieredMergePolicy(), sort);
to make sure my index merger will make

Re: A flush exception in lucene 4.10.0

2017-03-13 Thread Yonghui Zhao
"D E F", 3rd is "G H I". If I concatenate these values with spaces, "A B C D E F G H I", so that each doc indexes this field only once, it has the same effect but no exception occurs.
2017-03-10 13:43 GMT+08:00 Yonghui Zhao <zhaoyong...@gmail.com>:
> My version

Re: A flush exception in lucene 4.10.0

2017-03-08 Thread Yonghui Zhao
It seems to be related to an empty segment: all docs in this segment are deleted before commit. Can anyone confirm it? Maybe I need to upgrade my Lucene version.
2017-03-03 10:19 GMT+08:00 Yonghui Zhao <zhaoyong...@gmail.com>:
> Hi all,
>
> Anyone see this exception before? Is

Re: any analyzer will keep punctuation?

2017-03-07 Thread Yonghui Zhao
> factory would be solution.
> Please see types attribute of the word delimiter filter for customising
> characters.
>
> ahmet
>
> On Monday, March 6, 2017 12:22 PM, Yonghui Zhao <zhaoyong...@gmail.com> wrote:
> Yes whitespace analyzer will keep punctuatio

Re: any analyzer will keep punctuation?

2017-03-06 Thread Yonghui Zhao
> Hi,
>
> Whitespace analyser/tokenizer for example.
>
> Ahmet
>
> On Monday, March 6, 2017 10:21 AM, Yonghui Zhao <zhaoyong...@gmail.com> wrote:
> Lucene standard analyzer will remove almost all punctuation.
> In some cases, we want to keep some punctuation,

any analyzer will keep punctuation?

2017-03-06 Thread Yonghui Zhao
Lucene's standard analyzer removes almost all punctuation. In some cases we want to keep some punctuation; for example, in music search, some singer and album names contain or consist of punctuation. Is there any analyzer that lets us customize which punctuation is removed?
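As an illustration of the behavior being asked for, here is a small plain-Java sketch that splits on whitespace and deletes only a caller-supplied set of punctuation characters. It is not a Lucene Analyzer: PunctuationTokenizer and its strip parameter are names made up for this example, and a real solution would wrap equivalent logic in whatever analysis chain the index uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Lucene. */
class PunctuationTokenizer {
    /** Splits on whitespace, lowercases, and removes only the characters listed in strip. */
    static List<String> tokenize(String text, String strip) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            StringBuilder kept = new StringBuilder();
            for (char ch : raw.toCharArray()) {
                if (strip.indexOf(ch) < 0) {
                    kept.append(ch); // every character outside the strip set survives
                }
            }
            if (kept.length() > 0) {
                tokens.add(kept.toString().toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

For example, with strip set to ",()" the input "P!nk - Try, (Live)" keeps the "!" and the standalone "-" while the comma and parentheses are dropped.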

A flush exception in lucene 4.10.0

2017-03-02 Thread Yonghui Zhao
Hi all, has anyone seen this exception before? Is it a Lucene bug or something wrong in my code?
Exception in thread "Thread-14" java.lang.IllegalArgumentException: maxValue must be non-negative (got: -1)
    at org.apache.lucene.util.packed.PackedInts.bitsRequired(PackedInts.java:1141)

query parser of SpanNearQuery

2016-12-04 Thread Yonghui Zhao
It seems the Lucene query parser doesn't support SpanNearQuery. Is there any query parser that supports SpanNearQuery?

how to specify disableCoord in query string

2016-04-26 Thread Yonghui Zhao
Does the Lucene query parser support disableCoord in the query string? Thanks

Re: Lucene warm up

2016-04-15 Thread Yonghui Zhao
> arms
> newly merged segments before making them visible to the next
> near-real-time reader.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Apr 15, 2016 at 3:04 AM, Yonghui Zhao <zhaoyong...@gmail.com> wrote:
>
> > As we know when a new I

Re: Any lucene query sorts docs by Hamming distance?

2015-12-24 Thread Yonghui Zhao
> > On Dec 22, 2015, at 4:02 AM, Yonghui Zhao <zhaoyong...@gmail.com> wrote:
> >
> > Hi,
> >
> > Is there any query can sort docs by hamming distance if field values are

Any lucene query sorts docs by Hamming distance?

2015-12-22 Thread Yonghui Zhao
Hi, is there any query that can sort docs by Hamming distance if the field values are the same length? It seems fuzzy query only works on edit distance.
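For reference, Hamming distance counts the positions at which two equal-length values differ. The snippet shown here does not name a built-in query for it, so the sketch below only illustrates one possible workaround, assumed for this example: fetch candidate values and re-rank them on the client side. HammingRank is a hypothetical helper, not a Lucene class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical client-side re-ranking helper; not part of Lucene. */
class HammingRank {
    /** Number of positions at which two equal-length strings differ. */
    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("values must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    /** Returns a copy of candidates ordered by ascending distance to target; ties keep input order. */
    static List<String> rank(String target, List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((String c) -> hamming(target, c)));
        return sorted;
    }
}
```

Re-ranking like this only orders the candidates that were already retrieved, so it suits small candidate sets and is not a replacement for index-side scoring.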

does field cache support multivalue?

2015-11-19 Thread Yonghui Zhao
If I index one field more than once, it seems I can't get all values from the Lucene field cache. Is that right?

confused facet example

2014-09-30 Thread Yonghui Zhao
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java
In SimpleFacetsExample,
/** Runs the search example. */
public List<FacetResult> runFacetOnly() throws IOException {
  index();
  return

sortedset vs taxonomy

2014-09-22 Thread Yonghui Zhao
If we want to implement a simple facet counting feature, it seems we can do it via sortedset or the taxonomy writer/reader. Sortedset seems simpler but doesn't support hierarchical facet counts such as A/B/C. What are the advantages and disadvantages of sortedset versus taxonomy? Is there any

Can phrasequery allow mismatch?

2014-07-17 Thread Yonghui Zhao
Hi, I want to implement a query like a phrase query with slop 0, but allowing one term mismatch. For example, the text is A B C D E, and I want to match it with the query A B C X E, where X mismatches D. I.e., query A B C D E will match “W1 W2 W3 W4 W5”, the 5 words are consecutive and

Sorted NumericDocValues

2014-03-05 Thread Yonghui Zhao
Hi, is there any data type in Lucene that supports functions like SortedDocValues for any numeric (int, long, float, double) type? SortedDocValues only supports bytes; I want a data type that can return the numeric value and the ord (-1 for a doc that doesn't have the field) for each doc. NumericDocValues only

Re: Sorted NumericDocValues

2014-03-05 Thread Yonghui Zhao
AtomicReader.getDocsWithField to know whether the doc had that field?
Mike McCandless
http://blog.mikemccandless.com
On Wed, Mar 5, 2014 at 7:00 AM, Yonghui Zhao zhaoyong...@gmail.com wrote:
Hi, Is there any data type in lucene can support functions like SortedDocValues for any numeric(int, long

simple question about index reader

2014-02-13 Thread Yonghui Zhao
Hi, I am new to Lucene and have a simple question about index readers. Suppose I open a DirectoryReader, say reader1, on a disk directory, and then the Lucene index directory changes. To get the new results I need to get a new DirectoryReader. Suppose reader1 will get the result before the change

IndexFileNameFilter

2013-09-18 Thread Yonghui Zhao
In Lucene 4.3.0 there is no IndexFileNameFilter, and I find that in org.apache.lucene.index.IndexFileNames the index file extensions have only 3 types.
public static final String INDEX_EXTENSIONS[] = new String[] {
  COMPOUND_FILE_EXTENSION,
  COMPOUND_FILE_ENTRIES_EXTENSION,
  GEN_EXTENSION,

Re: IndexFileNameFilter

2013-09-18 Thread Yonghui Zhao
extension.
On Wed, Sep 18, 2013 at 1:03 PM, Yonghui Zhao zhaoyong...@gmail.com wrote:
In lucene 4.3.0 there is no IndexFileNameFilter. And I find in org.apache.lucene.index.IndexFileNames the index file extensions have only 3 types.
public static final String INDEX_EXTENSIONS[] = new

NumericField traverse order

2013-08-21 Thread Yonghui Zhao
If we traverse a string field using the code below, the values come back in string order.
Terms terms = reader.terms("strField");
if (terms != null) {
  TermsEnum termsEnum = terms.iterator(null);
  BytesRef text;
  while ((text = termsEnum.next()) != null) {
    // process each term here
  }
}
How about a numeric field?

RE: NumericField traverse order

2013-08-21 Thread Yonghui Zhao
-----Original Message-----
From: Yonghui Zhao [mailto:zhaoyong...@gmail.com]
Sent: Wednesday, August 21, 2013 1:38 PM
To: java-user@lucene.apache.org
Subject: NumericField traverse order
If we traverse a string field use code below, the value order is string order.
Terms terms

IllegalStateException in SpanTermQuery

2013-08-13 Thread Yonghui Zhao
One of my unit tests passes in Lucene 3.5 but fails in Lucene 4.3. The exception is: IllegalStateException("field \"" + term.field() + "\" was indexed without position data; cannot run SpanTermQuery (term=" + term.text() + ")"); After I change the index option of the field from DOCS_ONLY to

Re: IllegalStateException in SpanTermQuery

2013-08-13 Thread Yonghui Zhao
On Tue, Aug 13, 2013 at 7:41 AM, Yonghui Zhao zhaoyong...@gmail.com wrote:
One of my UT is passed In lucene 3.5, but it is failed in lucene4.3. The exception is: IllegalStateException("field \"" + term.field() + "\" was indexed without position data; cannot run SpanTermQuery (term

SortField is not serializable

2013-08-07 Thread Yonghui Zhao
In Lucene 4.3, SortField is no longer serializable. When I try to serialize a request that contains a SortField, java.io.NotSerializableException: org.apache.lucene.search.SortField is thrown. Is there any workaround?
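One workaround, offered here as an assumption and not as an answer from the thread, is to keep SortField out of the serialized request and ship a small serializable description of the sort instead, rebuilding the SortField on the receiving side. SortSpec below is a hypothetical class written for this example; the block uses only the JDK, and the Lucene call appears only in a comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical serializable description of a sort; not a Lucene class. */
class SortSpec implements Serializable {
    private static final long serialVersionUID = 1L;

    final String field;
    final String typeName; // name of a SortField.Type constant, e.g. "STRING" or "LONG"
    final boolean reverse;

    SortSpec(String field, String typeName, boolean reverse) {
        this.field = field;
        this.typeName = typeName;
        this.reverse = reverse;
    }

    // On the receiving side, with Lucene on the classpath, the sort could be rebuilt as:
    //   new SortField(spec.field, SortField.Type.valueOf(spec.typeName), spec.reverse)

    /** Serializes the spec with standard Java serialization. */
    static byte[] toBytes(SortSpec spec) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(spec);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads a spec back from bytes produced by toBytes. */
    static SortSpec fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SortSpec) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Storing the type as a string keeps the request class free of Lucene types, at the cost of validating the name when the SortField is rebuilt.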

getNumericDocValues

2013-07-29 Thread Yonghui Zhao
In Lucene 4.3, AtomicReader has this method:
public abstract NumericDocValues getNumericDocValues(String field) throws IOException
Suppose I get a NumericDocValues for one field from a reader. NumericDocValues has a get method:
/**
 * Returns the numeric value for the specified document ID.

Re: getNumericDocValues

2013-07-29 Thread Yonghui Zhao
Got it, thank you very much.
On 2013-7-29 at 11:34 PM, Adrien Grand jpou...@gmail.com wrote:
Hi,
On Mon, Jul 29, 2013 at 4:56 PM, Yonghui Zhao zhaoyong...@gmail.com wrote:
I want to know what will be returned if the input docID is not a valid id, for examples: 1. the docID beyonds the reader

Re: 2 exceptions in IndexWriter

2013-07-25 Thread Yonghui Zhao
best to open IndexWriter with OpenMode.CREATE to purge (rather than remove the files yourself). Lock obtain timed out means another IndexWriter is currently using that directory.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jul 25, 2013 at 12:26 AM, Yonghui Zhao zhaoyong

2 exceptions in IndexWriter

2013-07-24 Thread Yonghui Zhao
Recently I found that my unit test fails sometimes, but not always. I use Lucene 4.3.0. After investigation, I found that when I try to open an IndexWriter on a disk directory, it sometimes throws this exception: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:

Re: TermDocs

2013-07-09 Thread Yonghui Zhao
question?
-- Ian.
On Mon, Jul 8, 2013 at 12:32 PM, Yonghui Zhao zhaoyong...@gmail.com wrote:
Hi, What's proper replacement of "TermDocs termDocs = reader.termDocs(null);" in lucene 4.x It seems reader.termDocsEnum(term) can't take null as a input parameter

getLocale of SortField

2013-07-09 Thread Yonghui Zhao
I am updating a project from Lucene 3.x to Lucene 4.x. I found that getLocale of SortField is no longer there. How can I fix it?

RE: getLocale of SortField

2013-07-09 Thread Yonghui Zhao
://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Yonghui Zhao [mailto:zhaoyong...@gmail.com]
Sent: Tuesday, July 09, 2013 1:45 PM
To: java-user@lucene.apache.org
Subject: getLocale of SortField
I am updating one project from lucene 3.x to lucene 4.x I found getLocale

TermDocs

2013-07-08 Thread Yonghui Zhao
Hi, what's the proper replacement for "TermDocs termDocs = reader.termDocs(null);" in Lucene 4.x? It seems reader.termDocsEnum(term) can't take null as an input parameter.

Re: simple question about decRef

2013-06-01 Thread Yonghui Zhao
then files will be held open and you'll eventually exhaust the limit of open file descriptors.
Mike McCandless
http://blog.mikemccandless.com
On Fri, May 31, 2013 at 8:12 PM, Yonghui Zhao zhaoyong...@gmail.com wrote:
After we use IndexReader do we always need call decRef explicitly? What

simple question about decRef

2013-05-31 Thread Yonghui Zhao
After we use an IndexReader, do we always need to call decRef explicitly? What will happen if I don't call decRef? Thanks