Re: Document serializable representation

2017-03-30 Thread Denis Bazhenov
Hi. Thanks for the reply. Of course each document goes into exactly one shard. > On Mar 31, 2017, at 15:01, Erick Erickson wrote: > > I don't believe addIndexes does much except rewrite the > segments file (i.e. the file that tells Lucene what > the current segments are). > > That said, if you'r
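For readers following the thread, the "exactly one shard" property can be illustrated with a small routing sketch. This is only a plain-Java model, not anything from Lucene or from Denis's system: the class name `ShardRouter`, the method `shardFor`, and the hash-modulo scheme are all assumptions made for illustration. The shard count of 7 is the figure mentioned elsewhere in the thread.

```java
// Illustrative sketch only: ShardRouter and shardFor are hypothetical names,
// and hash-modulo routing is an assumed scheme, not the one used in the thread.
final class ShardRouter {
    private ShardRouter() {}

    /** Maps a document id to exactly one shard index in [0, numShards). */
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 7; // shard count mentioned in the thread
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> shard " + shardFor(id, numShards));
        }
    }
}
```

Because the mapping depends only on the id and the shard count, the same document always lands in the same shard, which is what makes a later merge of shards free of cross-shard duplicates under this assumption.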

Re: Document serializable representation

2017-03-30 Thread Erick Erickson
I don't believe addIndexes does much except rewrite the segments file (i.e. the file that tells Lucene what the current segments are). That said, if you're desperate you can optimize/force-merge. Do note, though, that no deduplication is done. So if the indexes you're merging have docs with the s
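The point that no deduplication is done during a merge can be modelled in a few lines. The sketch below is plain Java and does not call Lucene at all: `Doc`, `concat`, and `dedupById` are hypothetical names, and the `id` field is an assumed unique key. It only shows that a merge which simply combines its inputs keeps every copy, so any deduplication has to be done by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: none of these names come from Lucene.
final class MergeDedupSketch {
    /** Hypothetical stand-in for an indexed document with an assumed unique id. */
    static final class Doc {
        final String id;
        final String body;

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    /** Models a merge that performs no deduplication: every input doc is kept. */
    static List<Doc> concat(List<List<Doc>> indexes) {
        List<Doc> out = new ArrayList<>();
        for (List<Doc> index : indexes) {
            out.addAll(index);
        }
        return out;
    }

    /** Caller-side dedup: one doc per id, the last occurrence of an id wins. */
    static List<Doc> dedupById(List<Doc> docs) {
        Map<String, Doc> byId = new LinkedHashMap<>();
        for (Doc d : docs) {
            byId.put(d.id, d);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Doc> a = Arrays.asList(new Doc("1", "first"), new Doc("2", "second"));
        List<Doc> b = Arrays.asList(new Doc("2", "second again"), new Doc("3", "third"));
        List<Doc> merged = concat(Arrays.asList(a, b));
        System.out.println("merged size: " + merged.size());
        System.out.println("deduplicated size: " + dedupById(merged).size());
    }
}
```

If each document is routed to exactly one source index, as Denis says is the case, the dedup step is unnecessary; it matters only when the same key can appear in more than one of the indexes being combined.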

Re: Document serializable representation

2017-03-30 Thread Denis Bazhenov
Yeah, I will definitely look into PreAnalyzedField as you and Mikhail suggest. Thank you. > On Mar 30, 2017, at 19:15, Uwe Schindler wrote: > > But that's hard to implement. I'd go for Solr instead of doing that on your > own! --- Denis Bazhenov

Re: Document serializable representation

2017-03-30 Thread Denis Bazhenov
Interesting. In the case of addIndexes(), does Lucene perform any optimization on segments before searching over individual segments, or are those indexes searched "as is"? > On Mar 30, 2017, at 19:09, Mikhail Khludnev wrote: > > I believe you can have more shards for indexing and then merge (and no

RE: Document serializable representation

2017-03-30 Thread Uwe Schindler
Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Denis Bazhenov [mailto:dot...@gmail.com] > Sent: Thursday, March 30, 2017 11:02 AM > To: java-user@lucene.apache.org > Subject: Re: Document serializable

Re: Document serializable representation

2017-03-30 Thread Mikhail Khludnev
I believe you can have more shards for indexing and then merge them (not literally, but just by addIndexes() or so) into a smaller number for search. Transferring indices (scp -C) is more efficient than sending separate tokens and their attributes over the wire. On Thu, Mar 30, 2017 at 12:02 PM, Denis Ba

Re: Document serializable representation

2017-03-30 Thread Denis Bazhenov
We have already done this. Many years ago :) At the moment we have 7 shards. The problem with getting more shards is that search becomes less cost-effective (in terms of cluster CPU time per request) as you split the index into more shards. Considering response time is good enough and the fact search

RE: Document serializable representation

2017-03-30 Thread Uwe Schindler
Hi, the document does not contain the analyzed tokens. The Lucene Analyzers are called inside the IndexWriter *during* indexing, so there is no way to do that somewhere else. The IndexableDocument instances by Lucene are just iterables of IndexableField that contain the unparsed fulltext as pas