date:20160706

Hierarchical Facets need duplicated counts

2016-07-06 Thread Nicola Buso

Hello everyone, we are using hierarchical facets (from org.apache.lucene.facet.taxonomy). In our case one entry can have several values referencing multiple leaves in the hierarchical facet. At search time we notice that if we search for exactly one entry, we get count = 1 in the hierarchical facet…
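
The question is cut off, but the reported behaviour (one matching entry gives count = 1 even though it has several leaves) and the subject ("need duplicated counts") describe two different counting semantics. The toy model below contrasts them in plain Java. It does not use or reproduce the org.apache.lucene.facet.taxonomy API; the class and method names (`FacetCountModel`, `perDocument`, `perValue`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model, not Lucene code: two ways of counting hierarchical facet paths
// such as "topic/a/x" over a set of matching documents.
final class FacetCountModel {

    // All ancestor prefixes of a path, including the path itself:
    // "topic/a/x" -> [topic, topic/a, topic/a/x]
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (String part : path.split("/")) {
            if (sb.length() > 0) sb.append('/');
            sb.append(part);
            out.add(sb.toString());
        }
        return out;
    }

    // Per-document counting: a document adds at most 1 to any node,
    // no matter how many of its leaf values sit below that node.
    static Map<String, Integer> perDocument(List<List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> docPaths : docs) {
            Set<String> touched = new HashSet<>();
            for (String path : docPaths) touched.addAll(prefixes(path));
            for (String node : touched) counts.merge(node, 1, Integer::sum);
        }
        return counts;
    }

    // Per-value ("duplicated") counting: every leaf value adds 1 to each of
    // its ancestors, so two leaves under one parent add 2 to that parent.
    static Map<String, Integer> perValue(List<List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> docPaths : docs) {
            for (String path : docPaths) {
                for (String node : prefixes(path)) counts.merge(node, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For one document with the values "topic/a/x" and "topic/a/y", `perDocument` gives 1 for "topic" and "topic/a" (the behaviour reported above), while `perValue` gives 2 for both.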

dv field is too large

2016-07-06 Thread Sheng

Hi, I am getting an IAE (IllegalArgumentException) indicating that one of my SortedDocValuesField values is too large (> 32k). I googled a bit, and it seems that LUCENE-4583 addressed this issue in 4.5 and 6.0, while I am currently using Lucene 6.1. Am I missing or misunderstanding anything? Thanks,

Re: dv field is too large

2016-07-06 Thread Michael McCandless

I believe only binary DVs can be larger than 32K bytes.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 10:31 AM, Sheng wrote:
> …

Lucene indexes getting deleted after application restart

2016-07-06 Thread Desteny Child

Hi! In my Spring/Lucene application I'm using Lucene IndexWriter, TrackingIndexWriter, SearcherManager and ControlledRealTimeReopenThread. I use the open mode IndexWriterConfig.OpenMode.CREATE_OR_APPEND. Right now I'm trying to index thousands of documents. For this purpose I have added Apache…

indexing 15 million documents to lucene

2016-07-06 Thread Nomar Morado

Hi, I am trying to write 15 million documents (and maybe more) to Lucene for indexing. I would try to call writer.commit after some number of documents or bytes. The generated Lucene files total about 1 GB, and indexing takes about 15-20 minutes. I don't know if there are other configurations I ca…

Re: dv field is too large

2016-07-06 Thread Sheng

Mike, thanks for the prompt response. Is there a way to bypass this constraint for SortedDocValuesField? Or do we have to live with it, meaning there will be no fix even in a future release?
On Wednesday, July 6, 2016, Michael McCandless wrote:
> …

Re: dv field is too large

2016-07-06 Thread Erick Erickson

Is this an "XY" problem? Meaning, why do you need DV fields larger than 32K? You can't search it as text since it's not tokenized. Faceting and sorting by a 32K field doesn't seem very useful. You may have a perfectly valid reason, but it's not obvious what use-case you're serving from this thread, so…

Re: dv field is too large

2016-07-06 Thread Sheng

Hi Erick, I am refactoring a legacy system. One of the most annoying things is that I have to keep the old features even though they make little sense. In this case, we have to index a particular data structure which has a bunch of fields, and each of them is promised to be searchable and search-sortable to…

Re: dv field is too large

2016-07-06 Thread Michael McCandless

Maybe you could simply truncate the user-supplied values at 32 KB?
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 5:55 PM, Sheng wrote:
> …
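
Mike's suggestion can be sketched as a small helper that cuts a value down to a UTF-8 byte budget without splitting a character. The class name is hypothetical, and the 32766-byte constant (2^15 - 2) is an assumption about the Lucene 6.x per-value limit for SortedDocValuesField; confirm it against the exception message your version throws.

```java
// Hypothetical helper: truncate a value so that its UTF-8 form fits a byte budget.
final class Utf8Truncate {

    // Assumed per-value limit for SortedDocValuesField (2^15 - 2 bytes);
    // confirm it against the IllegalArgumentException message of your version.
    static final int MAX_SORTED_DV_BYTES = 32766;

    // Longest prefix of s whose UTF-8 encoding needs at most maxBytes bytes.
    // Iterates by code point, so a surrogate pair is kept or dropped as a whole.
    // An unpaired surrogate is counted as 3 bytes (the size of U+FFFD).
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }
}
```

The truncated string can then be indexed as usual, e.g. `new SortedDocValuesField(name, new BytesRef(truncated))`, next to the full value in its tokenized field.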

Re: dv field is too large

2016-07-06 Thread Sheng

To be clear, the "field" is indeed tokenized, and it is accompanied by a SortedDocValuesField so that it is sortable too. Am I making the wrong assumption here?
On Wednesday, July 6, 2016, Sheng wrote:
> …

Re: indexing 15 million documents to lucene

2016-07-06 Thread Michael McCandless

Use threads, only commit at the end (and use a near-real-time reader if you want to search at points in time), and increase IW's indexing buffer.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 4:37 PM, Nomar Morado wrote:
> …
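
The first two pieces of advice can be sketched with plain JDK concurrency. The `IntConsumer` is a hypothetical stand-in for "build document `id` and call `writer.addDocument(doc)` on one shared IndexWriter", which is safe to call from several threads; the `Runnable` stands in for the single `writer.commit()`. The larger indexing buffer would be set on the writer's config, e.g. with `IndexWriterConfig.setRAMBufferSizeMB(...)`; a suitable value depends on the available heap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

final class ParallelIndexing {

    // Feeds document ids [0, totalDocs) to addDoc from `threads` worker threads,
    // then runs commit exactly once, after every worker has finished.
    static void indexAll(int totalDocs, int threads, IntConsumer addDoc, Runnable commit)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            int chunk = (totalDocs + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                int start = t * chunk;
                int end = Math.min(totalDocs, start + chunk);
                futures.add(pool.submit(() -> {
                    for (int id = start; id < end; id++) {
                        // e.g. writer.addDocument(buildDoc(id)), buildDoc being hypothetical
                        addDoc.accept(id);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows (wrapped) any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        commit.run(); // e.g. writer.commit(), once, at the end
    }
}
```

Committing only once avoids paying the cost of flushing and syncing files repeatedly during the bulk load; a near-real-time reader gives search visibility in between without a commit.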

Re: Lucene indexes getting deleted after application restart

2016-07-06 Thread Michael McCandless

Call IW.commit on a periodic basis, e.g. every N (!= 1) docs, or every M bytes or something?
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 1:57 PM, Desteny Child wrote:
> …
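
A periodic-commit rule like the one suggested can be kept separate from the indexing loop. Documents added since the last commit are not durable, so a restart before the next commit loses them. This is a minimal sketch with a hypothetical `CommitPolicy` class; the caller supplies an approximate per-document size, and a `true` return means "call `writer.commit()` now".

```java
// Hypothetical helper: decides when to commit, every maxDocs documents or
// roughly every maxBytes bytes, whichever comes first.
final class CommitPolicy {
    private final int maxDocs;
    private final long maxBytes;
    private int docsSinceCommit;
    private long bytesSinceCommit;

    CommitPolicy(int maxDocs, long maxBytes) {
        if (maxDocs < 1 || maxBytes < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
    }

    // Records one added document of approxBytes. Returns true when either
    // threshold is reached and resets both counters for the next interval.
    synchronized boolean onDocumentAdded(long approxBytes) {
        docsSinceCommit++;
        bytesSinceCommit += approxBytes;
        if (docsSinceCommit >= maxDocs || bytesSinceCommit >= maxBytes) {
            docsSinceCommit = 0;
            bytesSinceCommit = 0;
            return true;
        }
        return false;
    }
}
```

In the indexing loop this becomes `writer.addDocument(doc); if (policy.onDocumentAdded(size)) writer.commit();`, followed by one last `writer.commit()` after the loop so the tail of the batch is durable too.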

Re: dv field is too large

2016-07-06 Thread Sheng

Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit on the number of characters a payload string can carry?
On Wednesday, July 6, 2016, Michael McCandless wrote:
> …

Re: dv field is too large

2016-07-06 Thread Erick Erickson

bq: In this case, we have to index a particular data structure which has bunch of fields and each of them is promised to be searchable and search-sortable to the user
If I'm reading this right, you have some structure. You say "each of them is promised to be searchable and search-sortable". It _so…

Re: dv field is too large

2016-07-06 Thread Michael McCandless

Yes, or you could get the UTF-8 bytes yourself on the client side and check their length.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 6:16 PM, Sheng wrote:
> …
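
The question and the answer amount to two checks, sketched here in plain Java. `MAX_UTF8_BYTES_PER_CHAR` is assumed to mirror Lucene's `UnicodeUtil.MAX_UTF8_BYTES_PER_CHAR` (3), the 32766-byte limit is an assumption about Lucene 6.x, and the class and method names are hypothetical. Dividing the byte limit by 3 gives a safe but conservative character count (32766 / 3 = 10922): any string that short always fits, while many longer strings (pure ASCII, for example) fit as well. Measuring the encoded bytes is the exact check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: two ways to check a value against the doc-values limit.
final class DvLengthCheck {

    // Assumed Lucene 6.x limit for a single SortedDocValuesField value.
    static final int MAX_BYTES = 32766;

    // Assumed to mirror Lucene's UnicodeUtil.MAX_UTF8_BYTES_PER_CHAR: one Java
    // char (UTF-16 code unit) never needs more than 3 UTF-8 bytes.
    static final int MAX_UTF8_BYTES_PER_CHAR = 3;

    // Conservative: any string with at most this many chars always fits.
    static int safeCharLimit() {
        return MAX_BYTES / MAX_UTF8_BYTES_PER_CHAR;
    }

    // Exact for well-formed strings: encode client-side and measure.
    static boolean fits(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_BYTES;
    }
}
```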

Re: dv field is too large

2016-07-06 Thread Sheng

You misunderstand. I have many fields, and unfortunately a few of them are quite big, i.e. exceeding the 32k limit. In order to make these "big" fields sortable, they have to be stored as SortedDocValuesField. Or is that wrong, and one can actually sort the search results by a "big" field without indexin…

Re: dv field is too large

2016-07-06 Thread Erick Erickson

Well, if you must sort on a 32K single value (although I think this is extremely silly: _nobody_ will notice that two docs are out of order because they were identical up until the 30,000th character but the 30,001st character isn't sorted correctly), do as Mike suggests and chop it off before send…
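
The argument about truncated sort values can be checked with a small plain-Java sketch. It assumes values are ordered by comparing their UTF-8 bytes as unsigned values, as Lucene does for sorted doc values; the class and method names are hypothetical. Cutting both keys to the same byte length can turn a strict ordering into a tie, but it never reverses one, so only values identical up to the cutoff lose their relative order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: compare two strings the way a byte-ordered sort would,
// after truncating each to maxBytes UTF-8 bytes.
final class TruncatedSortKeys {

    static byte[] key(String s, int maxBytes) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // A plain byte cut is fine for ordering: if two values differ before
        // the cut, their prefixes differ at the same position.
        return utf8.length <= maxBytes ? utf8 : Arrays.copyOf(utf8, maxBytes);
    }

    // Negative, zero or positive, comparing bytes as unsigned values.
    static int compare(String a, String b, int maxBytes) {
        return Arrays.compareUnsigned(key(a, maxBytes), key(b, maxBytes));
    }
}
```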

Re: dv field is too large

2016-07-06 Thread Sheng

I agree. That said, wouldn't it also make sense to point this out clearly by adding comments to the corresponding classes? This is not the first time I have run into this "magic number" pitfall when using Lucene (e.g., the 1024 limit on token length in early versions of Lucene). Generally speak…
