Inlined.
vaibhav thapliyal wrote:
Dear all,
I have the following questions on intersecting iterator and partition
ids used in document sharded indexing:
1. Can we run a boolean AND query using the current intersecting
iterator on a given range of ids? These ids are a subset of the total
ids stored in the column qualifier field as per the document sharded
indexing format.
The IntersectingIterator is meant to find documents which contain a list
of terms. If you already have a set of candidate documents, that means
you've already done the work that the IntersectingIterator would do.
If it's not possible with the current iterator, can I tweak the existing one?
No, I don't think so. The schema that the IntersectingIterator expects
is "row: shardID, colfam: term, colqual: docID". If you have a document
which _might_ match your terms, you can just fetch each key-value
pair for the document and see if it matches.
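
To make that concrete, here is a minimal Python sketch that models the index in memory as sorted (row, colfam, colqual) tuples. It is not Accumulo client code; the sample data and the names `index`, `docs_with_all_terms` and `candidate_matches` are made up for illustration.

```python
# In-memory stand-in for a document-sharded index table (illustrative only).
# Each entry mimics a key: (row=shardID, colfam=term, colqual=docID).
index = sorted([
    ("shard01", "apple", "doc1"),
    ("shard01", "apple", "doc2"),
    ("shard01", "banana", "doc2"),
    ("shard01", "banana", "doc3"),
    ("shard02", "apple", "doc7"),
    ("shard02", "banana", "doc7"),
])

def docs_with_all_terms(entries, shard, terms):
    """Roughly what intersecting the term columns yields for one shard."""
    per_term = [
        {doc for row, term, doc in entries if row == shard and term == t}
        for t in terms
    ]
    return set.intersection(*per_term) if per_term else set()

def candidate_matches(entries, shard, terms, candidates):
    """Check only the candidate docIDs by looking up each (term, docID) key."""
    keys = set(entries)
    return {
        doc for doc in candidates
        if all((shard, t, doc) in keys for t in terms)
    }

print(docs_with_all_terms(index, "shard01", ["apple", "banana"]))
print(candidate_matches(index, "shard01", ["apple", "banana"], {"doc1", "doc2"}))
```

Both calls end up with the same answer for the shard; the second one just never looks at documents outside the candidate set.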
Ideally, if you had another index structure which reversed the column
family and qualifier, you could easily verify whether a document
contains all of the terms you're looking for via a column qualifier filter.
Remember, space is cheap.
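
A rough sketch of that reversed layout, again as plain Python over in-memory tuples rather than real Accumulo calls. The layout "row: shardID, colfam: docID, colqual: term" and the names `reversed_index` and `doc_contains_all` are assumptions for the example.

```python
# Hypothetical reversed index: (row=shardID, colfam=docID, colqual=term).
reversed_index = sorted([
    ("shard01", "doc1", "apple"),
    ("shard01", "doc2", "apple"),
    ("shard01", "doc2", "banana"),
    ("shard01", "doc3", "banana"),
])

def doc_contains_all(entries, shard, doc_id, terms):
    """Read one document's column family and keep only the wanted qualifiers."""
    wanted = set(terms)
    found = {
        term for row, doc, term in entries
        if row == shard and doc == doc_id and term in wanted
    }
    # The document matches only if every wanted term showed up as a qualifier.
    return found == wanted

print(doc_contains_all(reversed_index, "shard01", "doc2", ["apple", "banana"]))
print(doc_contains_all(reversed_index, "shard01", "doc1", ["apple", "banana"]))
```

The cost is a second copy of the index, which is the "space is cheap" trade-off.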
2. Is the partitioning suggested in document sharded indexing logical or
physical? For example, if I have 30 partition ids, do I have to physically
presplit the table on those partition ids, so that the table has 30
tablets, for the AND query to run in the most efficient way?
This is likely a good starting place, but read the comment below.
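
As a sketch of what one tablet per partition could look like, the Python below derives zero-padded partition ids and the matching split points. It assumes that rows sort lexicographically and that a split point is the inclusive end row of a tablet; the CRC32 hash, the padding scheme and the function names are illustrative choices, not something the schema requires.

```python
import zlib

NUM_PARTITIONS = 30  # the figure from the question, not a recommendation

def partition_id(doc_id, num_partitions=NUM_PARTITIONS):
    """Map a docID to a zero-padded partition id using a stable hash."""
    width = len(str(num_partitions - 1))
    bucket = zlib.crc32(doc_id.encode("utf-8")) % num_partitions
    # Zero-padding keeps the ids in numeric order when sorted as strings.
    return str(bucket).zfill(width)

def split_points(num_partitions=NUM_PARTITIONS):
    """N partitions need N - 1 splits; each split ends one partition's tablet."""
    width = len(str(num_partitions - 1))
    return [str(i).zfill(width) for i in range(num_partitions - 1)]

splits = split_points()
print(len(splits), splits[:3], splits[-1])
print(partition_id("doc42"))
```

The resulting list is what you would hand to whatever you use to add splits to the table.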
3. Lastly, can anybody suggest a number of partitions for
document sharded indexing? What should I look for when deciding on it?
See
http://mail-archives.apache.org/mod_mbox/accumulo-user/201507.mbox/%3C559994BB.3070607%40gmail.com%3E
Thanks
Vaibhav