I’ve never used searchAfter before, so I’m looking for some tips and hints.
I understand that I need to maintain a server-side cache with the relevant
ScoreDocs, right?
The index is refreshed every couple of minutes. How will that affect the cached
ScoreDocs?
I don’t mind too much having some inconsistencies.
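From the javadocs, the paging pattern I have in mind looks roughly like this
(a sketch only; nextPage and the variable names are mine):

  import java.io.IOException;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;

  public class SearchAfterSketch {
    // after is the last ScoreDoc of the previous page (null for page one);
    // the server caches only that ScoreDoc per session, not whole result sets.
    static TopDocs nextPage(IndexSearcher searcher, Query query,
                            ScoreDoc after, int pageSize) throws IOException {
      return after == null
          ? searcher.search(query, pageSize)
          : searcher.searchAfter(after, query, pageSize);
    }
  }

If I read the docs right, a cached ScoreDoc is only meaningful against the
searcher that produced it, so after a refresh the cache entries would need to
stay tied to their searcher (SearcherLifetimeManager looks like it exists for
exactly this).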
Good point!
For now I'll leave it normalized. Every search term coming from the frontend
is stored and its counter updated, which after some time will help me see
trends and decide whether or not to change the logic.
P.S. Here is the funny part: in Croatian "pišanje" means peeing, while
"pisanje" means writing.
Hi
I would not store the original value; that's "just" an index. But do store
your db identifiers, because I think you'll want them at some point. (I
made the same kind of feature on top of DataNucleus.)
I usually have a tech id in my db, even more so since I started to use
JDO/JPA some 20 years ago.
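Something like this is what I mean (a minimal sketch; the field names "dbId"
and "content" are just illustrations):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.StringField;
  import org.apache.lucene.document.TextField;

  public class DocWithDbId {
    static Document build(long dbId, String text) {
      Document doc = new Document();
      // indexed as a single token and stored, so a hit maps back to the db row
      doc.add(new StringField("dbId", Long.toString(dbId), Field.Store.YES));
      // analyzed for search but not stored: the db stays the source of truth
      doc.add(new TextField("content", text, Field.Store.NO));
      return doc;
    }
  }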
I think it depends on how precise you want to make the search. If you
want to enable diacritic-sensitive search, to avoid confusion when users
are actually able to enter the diacritics, you can index both ways
(ascii-folded and not folded) and not normalize the query terms. Or you
can just fold both the indexed tokens and the query terms.
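For the "index both ways" option, ASCIIFoldingFilter can do it in a single
field when constructed with preserveOriginal=true. A sketch under that
assumption (the analyzer name is made up):

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.LowerCaseFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.Tokenizer;
  import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
  import org.apache.lucene.analysis.standard.StandardTokenizer;

  public class FoldedPlusOriginalAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
      Tokenizer source = new StandardTokenizer();
      TokenStream result = new LowerCaseFilter(source);
      // preserveOriginal=true emits both "pisanje" and "pišanje" at the
      // same position, so an unfolded query term still finds the exact
      // spelling while a folded one matches either form
      result = new ASCIIFoldingFilter(result, true);
      return new TokenStreamComponents(source, result);
    }
  }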
ooh
On Fri, Sep 23, 2022 at 11:02 AM Adrien Grand wrote:
>
> We have a TruncateTokenFilter in lucene/analysis/common. :)
>
> On Fri, Sep 23, 2022 at 4:39 PM Michael Sokolov wrote:
>
> > I wonder if it would make sense to provide a TruncationFilter in
> > addition to the LengthFilter. That way long tokens in source text
> > could be better supported, albeit with some confusion if they share
> > the same very long prefix...
We have a TruncateTokenFilter in lucene/analysis/common. :)
On Fri, Sep 23, 2022 at 4:39 PM Michael Sokolov wrote:
> I wonder if it would make sense to provide a TruncationFilter in
> addition to the LengthFilter. That way long tokens in source text
> could be better supported, albeit with some confusion if they share
> the same very long prefix...
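A minimal chain using it might look like this (a sketch only; the 255-char
cap and the class name are arbitrary):

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.Tokenizer;
  import org.apache.lucene.analysis.miscellaneous.TruncateTokenFilter;
  import org.apache.lucene.analysis.standard.StandardTokenizer;

  public class TruncatingAnalyzer extends Analyzer {
    private static final int MAX_LEN = 255; // arbitrary cap for illustration

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
      Tokenizer source = new StandardTokenizer();
      // unlike LengthFilter, which drops an over-long token entirely,
      // TruncateTokenFilter keeps its first MAX_LEN characters
      TokenStream result = new TruncateTokenFilter(source, MAX_LEN);
      return new TokenStreamComponents(source, result);
    }
  }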
I wonder if it would make sense to provide a TruncationFilter in
addition to the LengthFilter. That way long tokens in source text
could be better supported, albeit with some confusion if they share
the same very long prefix...
On Fri, Sep 23, 2022 at 9:56 AM Scott Guthery wrote:
>
> Thanks much, Adrien. I hadn't realized that the size limit was on one
> token in the text as opposed to being a limit on the length of the
> entire text field…
Thanks much, Adrien. I hadn't realized that the size limit was on one
token in the text as opposed to being a limit on the length of the entire
text field. I'm loading patents, so I suspect that the very long word is a
DNA sequence.
Thanks also for your guidance with regard to setting maximums.
On the 2nd question, we do not plan on leveraging this information to
figure out the codec: the codec that should be used to read a segment is
stored separately (also in segment infos).
It is mostly useful for diagnostic purposes, e.g. if we see an interesting
corruption case where checksums match…
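If you ever want to look at this information yourself, here is a rough
sketch, assuming a Directory dir that points at the index:

  import java.io.IOException;
  import java.util.Map;
  import org.apache.lucene.index.SegmentCommitInfo;
  import org.apache.lucene.index.SegmentInfos;
  import org.apache.lucene.store.Directory;

  public class SegmentDiagnostics {
    static void print(Directory dir) throws IOException {
      SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
      for (SegmentCommitInfo sci : infos) {
        // diagnostics records things like the Lucene version, OS and JVM
        // that wrote the segment; the codec is stored separately
        Map<String, String> diag = sci.info.getDiagnostics();
        System.out.println(sci.info.name
            + " codec=" + sci.info.getCodec().getName()
            + " diagnostics=" + diag);
      }
    }
  }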
Hi Scott,
There is no way to lift this limit. The assumption is that a user would
never type a 32kB keyword in a search bar, so indexing such long keywords
is wasteful. Some tokenizers like StandardTokenizer can be configured to
limit the length of the tokens that they produce; there is also a
LengthFilter that can remove tokens over a configured length.
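For example, with StandardAnalyzer (a sketch; 255 is already the default max
token length, it is only set explicitly here to show where the knob lives):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriterConfig;

  public class CappedTokens {
    static IndexWriterConfig config() {
      StandardAnalyzer analyzer = new StandardAnalyzer();
      // caps the length of tokens the underlying StandardTokenizer produces
      analyzer.setMaxTokenLength(255);
      return new IndexWriterConfig(analyzer);
    }
  }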
Hi Stephane!
Actually, I have exactly that kind of conversion, but I didn't mention it as
my mail was long enough without it :)
My main concern is whether I should let Lucene index the original keywords
or not. Considering what you wrote, I guess your answer would be to store
only converted values without exotic characters…