Often it'll be a hash of the document mod the number of bins you're using. The value you hash should be "good" in the sense that it uniquely identifies the document. It can be as simple as some unique field in the document, or just a hash (like MurmurHash) of the whole document.
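A minimal sketch of the "hash mod number of bins" idea, in Python for brevity. Several names here are assumptions for illustration only. The function `bin_id`, the doc IDs, and the bin count of 16 are hypothetical. MD5 from `hashlib` stands in for murmur, because murmur isn't in the Python standard library. The properties that matter are that the hash is deterministic across processes and spreads documents evenly over the bins. Python's built-in `hash()` on strings does not meet the first requirement, because it is salted per process.

```python
import hashlib


def bin_id(doc_key: str, num_bins: int) -> int:
    """Map a document to a bin (the BinId used as the row) by hashing a stable key.

    doc_key:  hypothetical input; a field that uniquely identifies the document
              (e.g. a doc ID) or a serialization of the whole document.
    num_bins: number of bins the index is spread across.
    """
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    # MD5 stands in for murmur here: it is in the standard library and,
    # unlike the built-in hash() on str, returns the same value in every process.
    digest = hashlib.md5(doc_key.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer, then reduce it mod num_bins.
    return int.from_bytes(digest[:8], "big") % num_bins


if __name__ == "__main__":
    NUM_BINS = 16  # hypothetical bin count
    for doc in ["doc-0001", "doc-0002", "doc-0003"]:
        print(doc, "-> bin", bin_id(doc, NUM_BINS))
```

Because the bin depends only on the document's key, re-ingesting the same document always writes it to the same bin. The number of bins is the knob you choose.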
On Saturday, February 6, 2016, Jamie Johnson <jej2...@gmail.com> wrote:

> Just found this excellent write up that explains a bit.
>
> https://www.slideshare.net/mobile/acordova00/text-indexing-in-accumulo
>
> On Feb 6, 2016 8:52 AM, "Jamie Johnson" <jej2...@gmail.com> wrote:
>
>> Reading the examples for table design I've come across a question
>> associated with the document partitioned index, specifically what is
>> typically chosen as the BinId or maybe more appropriately what factors
>> should influence what is chosen as the BinId and what impact do they have?