Hi,
This article is still very correct! Use the defaults of TieredMergePolicy,
nothing more to say.
The problems only start once you optimize/forceMerge for the first time and
still update the index afterwards, because then your index is no longer
structured in an optimal way and the huge segment w
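A minimal sketch of that advice (assuming Lucene 6.x and a hypothetical index path): an untouched IndexWriterConfig already uses TieredMergePolicy, so the only thing to avoid is forceMerge on an index that keeps changing.

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class DefaultMergePolicyExample {
      public static void main(String[] args) throws Exception {
        // TieredMergePolicy is the default merge policy; no explicit setMergePolicy needed.
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(
            FSDirectory.open(Paths.get("/path/to/index")), cfg)) {
          // ... add/update documents ...
          // Avoid writer.forceMerge(1) here if the index will still receive updates:
          // the single huge segment no longer fits the tiered structure.
        }
      }
    }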
Nice!
On Tue, Jun 13, 2017 at 11:12 PM Tom Hirschfeld wrote:
> Hey All,
>
> I was able to solve my problem a few weeks ago and wanted to update you
> all. The root issue was with the caching mechanism in the
> "makeDistanceValueSource" method in the Lucene spatial module; it appears
> that documents
2017-06-14 19:20 GMT+02:00 Uwe Schindler :
> You also lose the ability to parallelize searches with an Executor on
> IndexSearcher!
How can you say that? Isn't it true that multiple readers can access the
same segment concurrently?
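For reference, the constructor being discussed: passing an ExecutorService to IndexSearcher lets a single query be fanned out across segments, independently of running several searches in parallel. A minimal sketch, assuming Lucene 6.x and a hypothetical index path and field name:

    import java.nio.file.Paths;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class ParallelSegmentSearchExample {
      public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (IndexReader reader = DirectoryReader.open(
            FSDirectory.open(Paths.get("/path/to/index")))) {
          // With an executor, the segments of this ONE query are searched concurrently.
          IndexSearcher searcher = new IndexSearcher(reader, pool);
          TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);
          System.out.println("total hits: " + hits.totalHits);
        } finally {
          pool.shutdown();
        }
      }
    }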
Hello!
I'm not particularly familiar with Lucene's search API (as I've been using
the library mostly as a dumb index rather than a search engine), but I am
almost certain that, using its payload capabilities, it would be trivial to
implement a regular chunker to look for patterns in sequences of p
Hello,
We use POS-tagging too, and encode the tags as payload bitsets for scoring, which
is, as far as I know, the only possibility with payloads.
So, instead of encoding them as payloads, why not index your treebank's POS tags
as tokens at the same position, like synonyms? If you do that, you can
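A sketch of one way that could look (Lucene 6.x assumed; the tagger lookup is a hypothetical placeholder): a TokenFilter that emits the POS tag as an extra token with a position increment of 0, so it stacks on the word exactly like a synonym would.

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

    public final class PosTagInjectFilter extends TokenFilter {
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
      private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
      private final PositionIncrementAttribute posIncAtt =
          addAttribute(PositionIncrementAttribute.class);
      private String pendingTag;     // tag still to be emitted for the previous word
      private int tagStart, tagEnd;  // offsets of that word, reused for the tag token

      public PosTagInjectFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (pendingTag != null) {
          // Emit the tag as its own token, stacked on the previous word's position.
          clearAttributes();
          termAtt.setEmpty().append("pos:").append(pendingTag);
          offsetAtt.setOffset(tagStart, tagEnd);
          posIncAtt.setPositionIncrement(0);
          pendingTag = null;
          return true;
        }
        if (!input.incrementToken()) {
          return false;
        }
        pendingTag = lookupPosTag(termAtt.toString());
        tagStart = offsetAtt.startOffset();
        tagEnd = offsetAtt.endOffset();
        return true;
      }

      // Hypothetical tagger lookup; a real implementation would consult the POS tagger.
      private String lookupPosTag(String word) {
        return "NN";
      }

      @Override
      public void reset() throws IOException {
        super.reset();
        pendingTag = null;
      }
    }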
Markus - how are you encoding payloads as bitsets and using them for scoring?
Curious to see how folks are leveraging them.
Erik
> On Jun 14, 2017, at 4:45 PM, Markus Jelsma wrote:
>
> Hello,
>
> We use POS-tagging too, and encode them as payload bitsets for scoring, which
> is, as f
Hello Erik,
Using Solr, though most of the parts are actually Lucene, we have a CharFilter adding
treebank tags to whitespace-delimited words using a delimiter; further on we get
these tokens with the delimiter and the POS tag. It won't work with some
Tokenizers, and if you put it before WDF, it'll split as you kn
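For anyone wanting to try the delimiter route without a custom filter: Lucene ships DelimitedPayloadTokenFilter, which strips a "word|TAG" suffix off each token and stores the tag as the payload. A minimal sketch (Lucene 6.x assumed; the pipe delimiter and sample tags are only for illustration, not necessarily what Markus uses):

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
    import org.apache.lucene.analysis.payloads.IdentityEncoder;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;

    public class DelimitedPayloadExample {
      public static void main(String[] args) throws Exception {
        WhitespaceTokenizer tok = new WhitespaceTokenizer();
        tok.setReader(new StringReader("the|DT quick|JJ fox|NN"));
        try (TokenStream ts = new DelimitedPayloadTokenFilter(tok, '|', new IdentityEncoder())) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          PayloadAttribute payload = ts.addAttribute(PayloadAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            // Prints e.g. "fox -> NN": the tag ends up in the payload, not in the term.
            System.out.println(term + " -> " + payload.getPayload().utf8ToString());
          }
          ts.end();
        }
      }
    }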
Markus:
I don't believe that payloads are limited in size at all. LUCENE-7705
was done in part because there _was_ a hard-coded 256 limit for some
of the tokenizers. A payload (at least in recent versions) is just
some bytes attached to the token, and (with LUCENE-7705) it can be arbitrarily long.
Of course i
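In code terms, the payload really is just a BytesRef, so nothing stops it from being longer than one byte. A hypothetical filter attaching a two-byte bitset to every token (Lucene 6.x assumed; the bit values are placeholders):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.util.BytesRef;

    public final class BitsetPayloadFilter extends TokenFilter {
      private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

      public BitsetPayloadFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false;
        }
        // Two bytes = sixteen flag bits instead of eight; the payload length is
        // only limited by what you are willing to store per position.
        byte[] bits = new byte[] { (byte) 0b10100000, (byte) 0b00000011 };
        payloadAtt.setPayload(new BytesRef(bits));
        return true;
      }
    }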
Hello Erick, no worries, I recognize you two.
I will take a look at your references tomorrow. Although I am still fine with
eight bits, I have only one left to spare. If Lucene allows us to pass longer
bitsets via the BytesRef, it would be awesome and easy to encode.
Thanks!
Markus
I think it'd be interesting to also investigate using TypeAttribute [1]
together with TypeTokenFilter [2].
Regards,
Tommaso
[1] :
https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/analysis/tokenattributes/TypeAttribute.html
[2] :
https://lucene.apache.org/core/6_5_0/analyzers-common/org
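A small sketch of how those two could fit together (Lucene 6.x assumed): some upstream filter is assumed to have set each token's TypeAttribute to its POS tag, and TypeTokenFilter with useWhiteList=true then keeps only the listed types.

    import java.io.StringReader;
    import java.util.Collections;
    import java.util.Set;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.TypeTokenFilter;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;

    public class TypeFilterExample {
      public static void main(String[] args) throws Exception {
        WhitespaceTokenizer tok = new WhitespaceTokenizer();
        tok.setReader(new StringReader("the quick brown fox"));
        // Assumes an (omitted) upstream filter that sets TypeAttribute to the POS tag;
        // with useWhiteList=true only tokens typed "NN" survive.
        Set<String> keepTypes = Collections.singleton("NN");
        TokenStream nounsOnly = new TypeTokenFilter(tok, keepTypes, true);
        // ... consume nounsOnly as usual (reset / incrementToken / end / close) ...
        nounsOnly.close();
      }
    }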
Hello Tommaso,
These don't propagate to search, right? But they can be used in the analyzer chain!
This would be a better solution than using delimiters on words. The only
problem is that TypeFilter only works on tokens, after the tokenizer. The bonus
of a CharFilter is that it sees the whole text, s
In the past I have tried IndexSearcher with an ExecutorService to
parallelize searches on multiple segments on an SSD disk. That was with
Lucene 4.9. Unfortunately the searches became slower with various numbers
of threads in the pool, and much slower with 1 thread. There was some
overhead with tha
Hi,
I meant that a *single* search can be parallelized. Of course you can do
multiple searches in parallel, but that is completely unrelated to the question.
Uwe
-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de