Yes, as you suggested, simply wrapping the postings in LZ4 compression may not
be the best fit for all cases. Byte-Pair Encoding looks very promising.
I stumbled upon this JIRA by accident and noticed it was abandoned midway.
Thanks for sharing the details.
--
Ravi
On Fri, Jul 3, 2015 at 5:46 PM, Adrien Grand wrote:
We try to make the default postings format a good default for most
use cases, and it's unclear to me whether trading the speed of multi-term
queries for compression of the terms dictionary would be a better
trade-off for most users. I think this idea needs more iterations, for
instance on this issue I e
An unrelated question…
I came across a JIRA issue where you tried compressing the terms dictionary
just before writing it and achieved a reduction in storage space…
https://issues.apache.org/jira/browse/LUCENE-4702
Was it abandoned because terms-dictionary-intensive queries like Fuzzy etc.
didn't behave well
Thanks, Adrien…
Works like a charm!!!
On Wed, Jul 1, 2015 at 10:22 PM, Adrien Grand wrote:
Hi Ravikumar,
You need to run a BooleanQuery with two clauses:
- a must clause that matches all parent documents
- a must_not clause that matches all parents that have children
Building this second clause can be done easily with a
ToParentBlockJoinQuery around a child query that matches all you
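A minimal sketch of the two-clause idea, modelled with `java.util.BitSet` instead of Lucene so that it runs without dependencies. It assumes Lucene's block-join layout, in which each block is indexed with its child documents first and the parent document last, and it assumes the child query matches every non-parent document. The class and method names (`ChildlessParents`, `joinToParents`, `childlessParents`) are hypothetical. In real Lucene code the two clauses correspond to `BooleanClause.Occur.MUST` and `BooleanClause.Occur.MUST_NOT` on a `BooleanQuery`, and the join step to `ToParentBlockJoinQuery`, whose constructor arguments differ between Lucene versions.

```java
import java.util.BitSet;

/**
 * Hypothetical plain-Java model (no Lucene dependency) of the query
 * "all parents MUST match, parents-with-children MUST_NOT match".
 * Assumes the block-join layout: children first, parent last in each block.
 */
public class ChildlessParents {

    /** Models a to-parent join: maps each matching child to the parent that closes its block. */
    static BitSet joinToParents(BitSet children, BitSet parents) {
        BitSet result = new BitSet();
        for (int child = children.nextSetBit(0); child >= 0; child = children.nextSetBit(child + 1)) {
            int parent = parents.nextSetBit(child); // first parent at or after this child
            if (parent >= 0) {
                result.set(parent);
            }
        }
        return result;
    }

    /** Models a BooleanQuery: MUST (all parents) plus MUST_NOT (parents with children). */
    static BitSet childlessParents(BitSet parents, int maxDoc) {
        BitSet children = new BitSet(maxDoc);
        children.set(0, maxDoc);  // every doc ...
        children.andNot(parents); // ... except parents, i.e. every child
        BitSet result = (BitSet) parents.clone();        // MUST: all parents
        result.andNot(joinToParents(children, parents)); // MUST_NOT: parents that have children
        return result;
    }

    public static void main(String[] args) {
        // Layout: [c0 c1 P2] [P3] [c4 P5] [P6]; P3 and P6 have no children.
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(3);
        parents.set(5);
        parents.set(6);
        System.out.println(childlessParents(parents, 7)); // prints {3, 6}
    }
}
```

In this model, the set computed is what the BooleanQuery would match, so deleting it only removes one-document blocks and leaves every block that still has children intact.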
We have organised our segments in parent-child blocks and wish to
periodically delete parent documents that don't have any children, to
reclaim space via IndexWriter.deleteDocuments(Query)…
Is it possible to construct a Query that identifies such parents? Any help is
much appreciated…
--
Ravi