Hello everyone,
we are using hierarchical facets (from
org.apache.lucene.facet.taxonomy). In our case, one entry can have several
values referencing multiple leaves in the hierarchical facet.
At search time we are noticing that if we search for exactly one entry we
have count = 1 in the hierarchical facet
Hi,
I am getting an IllegalArgumentException indicating that one of my
SortedDocValuesField values is too large (> 32k).
I googled a bit, and it seems that LUCENE-4583 addressed this issue in
4.5 and 6.0, while I am currently using Lucene 6.1. Am I missing or
misunderstanding something?
Thanks,
I believe only binary DVs can be larger than 32K bytes.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 10:31 AM, Sheng wrote:
> Hi,
>
> I am getting an IAE indicating one of the SortedDocValueField is too large,
> > 32k
>
> I googled a bit, and it seems like #Lucene-4583
Hi!
In my Spring/Lucene application I'm using Lucene IndexWriter,
TrackingIndexWriter, SearcherManager and ControlledRealTimeReopenThread.
I use open mode - IndexWriterConfig.OpenMode.CREATE_OR_APPEND.
Right now I'm trying to index thousands of documents. For this purpose
I have added Apache
Hi
I am trying to write 15 million documents (and maybe more) to Lucene for
indexing.
I try to call writer.commit after some number or byte size of documents.
The generated Lucene files are about 1 GB in total.
Indexing takes about 15-20 minutes.
I don't know if there are other configurations I ca
Mike - Thanks for the prompt response. Is there a way to bypass this
constraint for SortedDocValuesField? Or do we have to live with it, meaning
no fix even in a future release?
On Wednesday, July 6, 2016, Michael McCandless
wrote:
> I believe only binary DVs can be larger than 32K bytes.
>
> Mike Mc
Is this an "XY" problem? Meaning, why do you need DV fields larger than 32K?
You can't search it as text as it's not tokenized. Faceting and sorting by a 32K
field doesn't seem very useful. You may have a perfectly valid reason, but it's
not obvious what use-case you're serving from this thread so
Hi Eric,
I am refactoring a legacy system. One of the most annoying things is that I
have to keep the old features even when they make little sense. In this case,
we have to index a particular data structure which has a bunch of fields, and
each of them is promised to be searchable and search-sortable to
Maybe you could simply truncate the user-supplied values at 32 KB?
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 5:55 PM, Sheng wrote:
> Hi Eric,
>
> I am refactoring a legacy system. One of the most annoying things is I have
> to keep the old feature even though it mak
To be clear, the "field" is indeed tokenized, and it is accompanied by a
SortedDocValuesField so that it is sortable too. Am I making the wrong
assumption here?
On Wednesday, July 6, 2016, Sheng wrote:
> Hi Eric,
>
> I am refactoring a legacy system. One of the most annoying things is I
> have
Use threads, only commit at the end (and use a near-real-time reader if you
want to search at points-in-time), increase IW's indexing buffer.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 4:37 PM, Nomar Morado wrote:
> Hi
>
> I am trying to write 15 million documents (a
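
[Editor's sketch, not part of the thread] The advice above (index from several threads, commit once at the end) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: `DocSink` is a hypothetical stand-in for a thread-safe writer such as Lucene's IndexWriter, and `indexAll` is an illustrative name, not a Lucene API. The indexing buffer mentioned above is configured separately, via IndexWriterConfig.setRAMBufferSizeMB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelIndexSketch {

    /** Hypothetical stand-in for a thread-safe writer (e.g. an IndexWriter wrapper). */
    interface DocSink {
        void add(String doc) throws Exception;
        void commit() throws Exception;
    }

    /** Adds all docs from a fixed pool of workers, then commits exactly once. */
    static void indexAll(List<String> docs, DocSink sink, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int w = 0; w < threads; w++) {
                final int worker = w;
                tasks.add(() -> {
                    // Each worker takes every threads-th document, starting at its own offset.
                    for (int i = worker; i < docs.size(); i += threads) {
                        sink.add(docs.get(i));
                    }
                    return null;
                });
            }
            // invokeAll waits for all workers; get() rethrows any worker failure.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
            // Single commit at the end instead of committing per batch.
            sink.commit();
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real application the `DocSink` implementation would presumably delegate `add` to writer.addDocument(...) and `commit` to writer.commit().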
Call IW.commit on a periodic basis, e.g. every N (!= 1) docs, or every M
bytes or something?
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 1:57 PM, Desteny Child wrote:
> Hi!
>
> In my Spring/Lucene application I'm using Lucene IndexWriter,
> TrackingIndexWriter, Search
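
[Editor's sketch, not part of the thread] The suggestion above, committing every N documents or every M bytes, could be tracked with a small counter like the one below. `CommitPolicy` and `recordAndCheck` are hypothetical names chosen for illustration; nothing here is a Lucene class.

```java
final class CommitPolicy {
    private final long maxDocs;   // commit after this many documents (N)
    private final long maxBytes;  // or after this many bytes (M), whichever comes first
    private long docs;
    private long bytes;

    CommitPolicy(long maxDocs, long maxBytes) {
        if (maxDocs < 1 || maxBytes < 1) {
            throw new IllegalArgumentException("thresholds must be positive");
        }
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
    }

    /**
     * Records one added document of the given size. Returns true when the
     * caller should commit now; the counters are reset in that case.
     */
    boolean recordAndCheck(long docBytes) {
        docs++;
        bytes += docBytes;
        if (docs >= maxDocs || bytes >= maxBytes) {
            docs = 0;
            bytes = 0;
            return true;
        }
        return false;
    }
}
```

A caller would add each document to the writer, pass its approximate size to `recordAndCheck`, and call the writer's commit when it returns true, plus one final commit after the loop for the remainder.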
Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit for the number of
characters a payload string can carry?
On Wednesday, July 6, 2016, Michael McCandless
wrote:
> Maybe you could simply truncate the user-supplied values at 32 KB?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed
bq: In this case, we
have to index a particular data structure which has bunch of fields and
each of them is promised to be searchable and search-sortable to the user
If I'm reading this right, you have some structure. You say
"each of them is promised to be searchable and search-sortable"
It _so
Yes, or you could get the utf8 bytes yourself client side and check that
length.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 6:16 PM, Sheng wrote:
> Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit for the number of
> characters a payload string can carry?
>
> On We
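
[Editor's sketch, not part of the thread] The two options discussed above, a conservative character limit versus measuring the UTF-8 bytes yourself, can be compared with a helper like this. It is a sketch under assumptions: the limit of 32766 bytes is taken to match IndexWriter.MAX_TERM_LENGTH and should be checked against the Lucene version in use; `Utf8Truncate` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Truncate {
    // Assumed limit in bytes; verify against IndexWriter.MAX_TERM_LENGTH for your version.
    static final int MAX_BYTES = 32766;

    /** Exact UTF-8 length of a string, measured client side. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /**
     * Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes,
     * cutting only at code point boundaries so no character is split.
     */
    static String truncateToUtf8Bytes(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // Encoded size of this code point in UTF-8 (unpaired surrogates
            // are counted as 3 bytes, which can only over-estimate).
            int n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + n > maxBytes) {
                break;
            }
            bytes += n;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String value = args.length > 0 ? args[0] : "example value";
        String safe = truncateToUtf8Bytes(value, MAX_BYTES);
        System.out.println(utf8Length(safe) + " bytes after truncation");
    }
}
```

Dividing the byte limit by a fixed bytes-per-char constant is safe but can truncate mostly-ASCII text far earlier than necessary; counting the actual bytes keeps as much of the value as fits.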
You misunderstand. I have many fields, and unfortunately a few of them are
quite big, i.e. exceeding the 32k limit. In order to make these "big"
fields sortable, they have to be stored as SortedDocValuesField. Or is that
wrong, and one can actually sort the search results by a "big" field without
indexin
Well, if you must sort on a 32K single value (although I think this is
extremely silly, _nobody_ will notice that two docs are out of order
because they were identical up until the 30,000th character but the
30,001st character isn't sorted correctly), do as Mike suggests and
chop it off before send
I agree. That said, wouldn't it also make sense to point this out clearly by
adding comments to the corresponding classes? This is not the first
time I have run into this "magic number" pitfall when using Lucene
(e.g., the 1024 limit for the token length in early versions of Lucene).
Generally speak