Re: Questions on intersecting iterator and partition ids

Josh Elser Mon, 13 Jul 2015 11:03:19 -0700

Inlined.

vaibhav thapliyal wrote:

Dear all,


I have the following questions on intersecting iterator and partition
ids used in document sharded indexing:

1. Can we run a boolean AND query using the current intersecting
iterator on a given range of ids? These ids are a subset of the total
ids stored in the column qualifier field as per the document sharded
indexing format.

The IntersectingIterator is meant to find documents which contain a list of terms. If you already have a set of candidate documents, that means you've already done the work that the IntersectingIterator would do.

If it's not possible with the current iterator, can I tweak the existing one?

No, I don't think so. The schema that the IntersectingIterator expects is "row: shardID, colfam: term, colqual: docID". If you have a document which _might_ match your terms, you can just fetch each key-value pair for the document and see if it matches.
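To make the "fetch each key-value pair and see if it matches" idea concrete, here is a minimal sketch in plain Java. It does not use the Accumulo API: one shard (one row) is modeled as an in-memory map from term (colfam) to docIDs (colqual), and the class name, method names, and sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CandidateCheckSketch {

    // One shard (one row) of the index, modeled as term (colfam) -> docIDs (colqual).
    // The sample data is made up for illustration.
    static Map<String, Set<String>> sampleShard() {
        Map<String, Set<String>> shard = new HashMap<>();
        shard.put("apple", new HashSet<>(Arrays.asList("doc1", "doc2", "doc3")));
        shard.put("banana", new HashSet<>(Arrays.asList("doc2", "doc3")));
        shard.put("cherry", new HashSet<>(Arrays.asList("doc3", "doc4")));
        return shard;
    }

    // True if an entry (term, docId) exists in the shard for every query term.
    static boolean matchesAll(Map<String, Set<String>> shard, List<String> terms, String docId) {
        for (String term : terms) {
            Set<String> docs = shard.getOrDefault(term, Collections.<String>emptySet());
            if (!docs.contains(docId)) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the candidates that contain every term, preserving input order.
    static List<String> filterCandidates(Map<String, Set<String>> shard,
                                         List<String> terms,
                                         List<String> candidates) {
        List<String> matching = new ArrayList<>();
        for (String docId : candidates) {
            if (matchesAll(shard, terms, docId)) {
                matching.add(docId);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "banana");
        List<String> candidates = Arrays.asList("doc1", "doc3");
        System.out.println(filterCandidates(sampleShard(), terms, candidates));
    }
}
```

Against a real table, each (term, docId) check would be a lookup of one key in the shard's row rather than a map access.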

Ideally, if you had another index structure which reversed the column family and qualifier, you could easily verify whether a document contains all of the terms you're looking for via a column qualifier filter.
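As a rough illustration of that reversed layout ("row: shardID, colfam: docID, colqual: term"), the sketch below models one shard as a map from docID to its terms and checks a document with a single lookup. This is plain Java rather than the Accumulo API, and the class name, method names, and sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReverseIndexSketch {

    // One shard of the reversed index, modeled as docID (colfam) -> terms (colqual).
    // The sample data is made up for illustration.
    static Map<String, Set<String>> sampleReverseShard() {
        Map<String, Set<String>> shard = new HashMap<>();
        shard.put("doc1", new HashSet<>(Arrays.asList("apple")));
        shard.put("doc3", new HashSet<>(Arrays.asList("apple", "banana", "cherry")));
        return shard;
    }

    // True if the document's term set contains every query term.
    // This plays the role of filtering on the column qualifier within one column family.
    static boolean containsAllTerms(Map<String, Set<String>> reverseShard,
                                    String docId,
                                    Collection<String> terms) {
        Set<String> present = reverseShard.getOrDefault(docId, Collections.<String>emptySet());
        return present.containsAll(terms);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> shard = sampleReverseShard();
        System.out.println(containsAllTerms(shard, "doc3", Arrays.asList("apple", "cherry")));
        System.out.println(containsAllTerms(shard, "doc1", Arrays.asList("apple", "cherry")));
    }
}
```

The trade-off is the one noted in this thread: every (term, docID) pair is stored twice, in exchange for a cheap per-document check.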


Remember, space is cheap.

2. Is the partitioning suggested in document sharded indexing logical or
physical? E.g., if I have 30 partition ids, do I have to physically
presplit the table based on the partition ids for the AND query to run
in the most efficient way, so that I have 30 tablets in the table?


This is likely a good starting place, but read the below comment.
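If you do presplit, a minimal sketch of generating the split points might look like the following. It assumes the partition ids are zero-padded decimal strings ("00" through "29" for 30 partitions), which is a hypothetical naming scheme, and that a split point is the inclusive last row of a tablet, so 29 split points yield 30 tablets. The class and method names are made up; with the real client you would wrap each string in a Text and pass the set to TableOperations.addSplits.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class SplitPointSketch {

    // Returns split points so that each zero-padded partition id lands in its own tablet.
    // Ids 0..n-2 are used as split points; the last id falls into the final tablet.
    static SortedSet<String> splitPoints(int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be at least 1");
        }
        // Pad to the width of the largest id so string order matches numeric order.
        int width = String.valueOf(numPartitions - 1).length();
        SortedSet<String> splits = new TreeSet<>();
        for (int id = 0; id < numPartitions - 1; id++) {
            splits.add(String.format("%0" + width + "d", id));
        }
        return splits;
    }

    public static void main(String[] args) {
        SortedSet<String> splits = splitPoints(30);
        System.out.println(splits.size() + " split points, from "
                + splits.first() + " to " + splits.last());
    }
}
```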

3. Lastly, can anybody suggest the number of partitions to use for
document sharded indexing? What should I look for when deciding it?

See http://mail-archives.apache.org/mod_mbox/accumulo-user/201507.mbox/%3C559994BB.3070607%40gmail.com%3E

Thanks
Vaibhav
