To avoid wildcard queries, you can write a TokenFilter that will
emit both tokens "ADJ" and "ADJ:brown" at the same position,
so you can use your index for both lookups without wildcards.
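A minimal sketch of that filter's logic, in plain Java rather than against
the Lucene TokenFilter API (the class and method names here are made up for
illustration; a real implementation would extend TokenFilter and set
PositionIncrementAttribute to 0 on the stacked token):

```java
import java.util.ArrayList;
import java.util.List;

public class DualTokenExpander {

    /** An output token plus its position increment (0 = stacked on the previous token). */
    public static final class Token {
        public final String term;
        public final int posInc;
        public Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /**
     * For each input token of the form "POS:word", emit the bare POS tag and
     * then the full "POS:word" token with a position increment of 0, so both
     * land at the same position and either form matches without a wildcard.
     */
    public static List<Token> expand(List<String> input) {
        List<Token> out = new ArrayList<>();
        for (String tok : input) {
            int colon = tok.indexOf(':');
            if (colon > 0) {
                out.add(new Token(tok.substring(0, colon), 1)); // e.g. "ADJ"
                out.add(new Token(tok, 0));                     // e.g. "ADJ:brown", stacked
            } else {
                out.add(new Token(tok, 1)); // pass untagged tokens through
            }
        }
        return out;
    }
}
```

Because both terms occupy the same position, phrase queries work against
either the bare tag or the tag:word form.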
On Tue, Aug 7, 2012 at 12:31 PM, Carsten Schnober wrote:
> Hi Danil,
>
>>> Just transform your input like
I mean "ADJ:brown" as a token and only the <payload> as payload, since
you probably only use it for some scoring/post-processing, not the
actual matching.
You can even write a filter that will emit both tokens "ADJ" and
"ADJ:brown" at the same position (so you'll be able to do phrase queries),
and still maintain
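A plain-Java sketch of that term/payload separation, assuming the
"term|payload" convention from earlier in the thread (the names are
illustrative; Lucene's DelimitedPayloadTokenFilter does roughly this,
putting the part after the delimiter into a PayloadAttribute):

```java
public class PayloadSplitter {

    /** A term plus an opaque payload (empty if the token carries none). */
    public static final class TermPayload {
        public final String term;
        public final String payload;
        public TermPayload(String term, String payload) { this.term = term; this.payload = payload; }
    }

    /**
     * Split a "term|payload" token: everything before the last '|' is indexed
     * as the term; everything after it rides along as a payload, available for
     * scoring or post-processing but never matched against.
     */
    public static TermPayload split(String token) {
        int bar = token.lastIndexOf('|');
        if (bar < 0) {
            return new TermPayload(token, "");
        }
        return new TermPayload(token.substring(0, bar), token.substring(bar + 1));
    }
}
```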
Hi Danil,
>> Just transform your input like "brown fox" into "ADJ:brown|<payload> NOUN:fox|<payload>"
>
> I understand that this denotes "ADJ" and "NOUN" to be interpreted as the
> actual token and "brown" and "fox" as payloads (followed by <payload>), right?
Sorry for replying to myself, but I've realised
On 07.08.2012 10:20, Danil ŢORIN wrote:
Hi Danil,
> If you do intersection (not join), maybe it makes sense to put
> everything into 1 index?
Just a note on that: my application performs intersections and joins
(unions) on the results, depending on the query. So the index structure
has to be r
If you do intersection (not join), maybe it makes sense to put
everything into 1 index?
Just transform your input like "brown fox" into "ADJ:brown|<payload> NOUN:fox|<payload>"
Write a custom tokenizer, some filters and that's it.
Of course I'm not aware of all the details, so my solution might not
be applicable.
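A sketch of that transformation, assuming the words, POS tags, and payloads
arrive as parallel arrays (all names here are hypothetical; in Lucene this
logic would live in the custom tokenizer/filter chain):

```java
public class TaggedTokenBuilder {

    /**
     * Combine parallel arrays of words, POS tags, and payloads into
     * "TAG:word|payload" tokens, the single-index layout suggested above.
     * The payload slot carries whatever per-token data scoring needs.
     */
    public static String[] build(String[] words, String[] tags, String[] payloads) {
        if (words.length != tags.length || words.length != payloads.length) {
            throw new IllegalArgumentException("words, tags and payloads must align");
        }
        String[] out = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            out[i] = tags[i] + ":" + words[i] + "|" + payloads[i];
        }
        return out;
    }
}
```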
On 06.08.2012 20:29, Mike Sokolov wrote:
Hi Mike,
> There was some interesting work done on optimizing queries including
> very common words (stop words) that I think overlaps with your problem.
> See this blog post
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
There was some interesting work done on optimizing queries including
very common words (stop words) that I think overlaps with your problem.
See this blog post
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
from the Hathi Trust.
The upshot in a nutshell
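One technique described in that series, if I remember the posts correctly,
is the CommonGrams approach (available in Solr as CommonGramsFilter): fuse a
common word with its neighbour into a single bigram token, so a phrase query
like "the quick" can hit the much rarer bigram term instead of scanning the
huge postings list for "the". A rough plain-Java sketch of the bigram
formation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommonGramsSketch {

    /**
     * Emit a fused "w1_w2" bigram token wherever either word of an adjacent
     * pair is in the common-word set, alongside the original unigrams.
     * Phrase queries can then match the bigram instead of walking the
     * common word's postings.
     */
    public static List<String> withCommonGrams(List<String> words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(words.get(i)); // keep the unigram
            if (i + 1 < words.size()
                    && (common.contains(words.get(i)) || common.contains(words.get(i + 1)))) {
                out.add(words.get(i) + "_" + words.get(i + 1)); // fused bigram
            }
        }
        return out;
    }
}
```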
On 31.07.2012 12:10, Ian Lea wrote:
Hi Ian,
> Lucene 4.0 allows you to use custom codecs and there may be one that
> would be better for this sort of data, or you could write one.
>
> In your tests is it the searching that is slow or are you reading lots
> of data for lots of docs? The latter is always likely to be slow.
Lucene 4.0 allows you to use custom codecs and there may be one that
would be better for this sort of data, or you could write one.
In your tests is it the searching that is slow or are you reading lots
of data for lots of docs? The latter is always likely to be slow.
General performance advice a