Fulltext matching

Courtney Robinson Mon, 03 Sep 2018 12:35:12 -0700

Hi,

We've got Ignite in production and decided to start using some fulltext
matching as well.
I've investigated and can't figure out why my queries are not matching.


I construct a query entity, e.g. new QueryEntity(keyClass, valueClass), and
in debug I can see it generates a list of fields, e.g. a, b, c.a, c.b.
I then expected to be able to match on those fields that are marked as
indexed. Everything is annotation driven. The appropriate fields have been
annotated and appear to be detected as such when I inspect what gets put
into the QueryEntityDescriptor, i.e. all expected indices and indexed
fields are present.

In GridLuceneIndex I see that the Lucene document generated has fields a, b
(c.a and c.b are not included). Now a couple of questions arise:

1. Is there a way to get Ignite to index the nested fields as well so that
c.a and c.b end up in the doc?

2. If you use a composite object as a key, its fields are extracted into
the top level, so if you have Key.a and Value.a you cannot index both, since
Key.a becomes a, which collides with Value.a. Can this be changed? Are
there any known reasons why it couldn't be? (I'm happy to send a PR
doing so, but I suspect the answer to this is linked to the answer to the
first question.)

3. The docs simply say you can use Lucene syntax. I presume this means the
syntax described at
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html is all valid.
Checking the code, that appears to be the case, as it uses
a MultiFieldQueryParser in GridLuceneIndex. However, when I try to run a
query such as a:<my-text>, none of the indexed documents match. In debug
mode I've enabled parser.setAllowLeadingWildcard(true); and if I do a
simple searcher.search for * I get back the list of expected documents.

What's even more odd is that I tried querying each of the 6 indexed fields
found in idxdFields in GridLuceneIndex and only 1 of them matches. For the
others, values typed exactly do not match, and neither do wildcards or
other free-text forms.

4. I couldn't see a way to provide a custom GridLuceneIndex. I found the
two places where it's constructed in the code base and it doesn't look like
I can inject instances. Is it OK to construct and use a custom
GridLuceneDirectory/IndexWriter/Searcher and so on, in the same way
GridLuceneIndex does, so I can write a custom IndexingSpi to change how
indexing happens?
There are a number of things I'd like to customise, and from looking at the
current impl. these things aren't injectable. I guess it's not considered a
prime use case.

The analyzer and a number of other things would be handy to change. Ideally
I also want to customise how a field is indexed, e.g. to be able to do term
matches with Lucene queries.

Looking at this impl as well, it passes Integer.MAX_VALUE and pulls back all
matches. That'll surely kill our nodes for some of the use cases we're
considering.
I'd also like to implement paging. The searcher API has a nice option to
pass through a last doc it can continue from, which could be used to
implement something like deep paging.

5. If I were to do a custom IndexingSpi to make all of this happen, how do
I get additional parameters through so that I could have paging params
passed?

Ideally I could customise the indexing, searching and paging through
standard Ignite means, but I can't find any means of doing that in the
current code. Short of doing a custom IndexingSpi, I think I've gone as
far as I can debugging and could do with a few pointers on how to go about
this.
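
To make questions 1 and 2 concrete, a minimal plain-Java sketch of the
desired flattening follows. It is not Ignite code and makes no claim about
how GridLuceneIndex builds its documents: the class FieldFlattenSketch, its
methods, and the "key."/"val." prefixes are all hypothetical, and objects
are modelled as nested maps purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Ignite.
class FieldFlattenSketch {

    /** Copies fields into out, joining nested names with dots (c becomes c.a, c.b). */
    static void flatten(String prefix, Map<String, ?> fields, Map<String, Object> out) {
        for (Map.Entry<String, ?> e : fields.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object v = e.getValue();
            if (v instanceof Map) {
                // A nested object: recurse so its fields are kept under a dotted name.
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) v;
                flatten(name, nested, out);
            } else {
                out.put(name, v);
            }
        }
    }

    /** Builds one flat document; the prefixes keep Key.a and Value.a apart. */
    static Map<String, Object> toDocument(Map<String, ?> key, Map<String, ?> value) {
        Map<String, Object> doc = new LinkedHashMap<>();
        flatten("key", key, doc);
        flatten("val", value, doc);
        return doc;
    }
}
```

With a key holding a and a value holding a plus a nested c (with a and b),
toDocument yields the four names key.a, val.a, val.c.a and val.c.b, so
nothing collides and the nested fields survive.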

FYI, SQL isn't a great option for this part of the product. We're
generating and compiling Java classes at runtime, and generating SQL to do
the queries is an order of magnitude more work than indexing the relatively
few fields we need and then searching. Off the bat, though, the paging would
be an issue, as there can be several million matches to a query. Can't have
Ignite pulling all of those into memory.
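
The paging idea mentioned under question 4 can be sketched in plain Java as
follows. The "last doc" option is presumably Lucene's
IndexSearcher.searchAfter; the sketch below does not call Lucene or Ignite
at all. SearchAfterSketch, Hit and page are hypothetical names that only
model the contract: the caller passes back the last hit of the previous
page, and at most pageSize hits are held in memory at any time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; this is not Lucene or Ignite code.
class SearchAfterSketch {

    /** A match: document id plus relevance score (stand-in for a Lucene ScoreDoc). */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Result order: higher score first, ties broken by lower document id. */
    static final Comparator<Hit> ORDER = (x, y) -> {
        int c = Float.compare(y.score, x.score);
        return c != 0 ? c : Integer.compare(x.doc, y.doc);
    };

    /** Returns the next pageSize hits that sort strictly after 'after' (null = first page). */
    static List<Hit> page(Iterable<Hit> hits, Hit after, int pageSize) {
        if (pageSize <= 0) {
            return new ArrayList<>();
        }
        // The heap keeps the worst retained hit on top so it can be evicted cheaply.
        PriorityQueue<Hit> heap = new PriorityQueue<>(pageSize, ORDER.reversed());
        for (Hit h : hits) {
            if (after != null && ORDER.compare(h, after) <= 0) {
                continue; // already returned on an earlier page
            }
            heap.add(h);
            if (heap.size() > pageSize) {
                heap.poll(); // drop the worst hit; memory stays bounded by pageSize
            }
        }
        List<Hit> page = new ArrayList<>(heap);
        page.sort(ORDER);
        return page;
    }
}
```

Each call still scans every match, but it never materialises more than one
page of them, which is the property that matters when a query has several
million matches. How the cursor would travel through an IndexingSpi is
exactly the open point in question 5.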

Thanks in advance

Courtney
