Hi,

We have a social media application that currently uses MongoDB to serve documents, and we have decided to move it to Accumulo. I am designing the schema and indexing approach, but I am having some difficulty managing indexes and have a few concerns about generating UUIDs in Accumulo.

UUID: Data is being indexed into MongoDB 24 hours a day. MongoDB generates a 12-byte UUID that sorts on the current time and works well in a multi-user, multi-process environment (<time> <MAC address> <process id> <client counter>), which is perfect. But if I concatenate the time, MAC address, process id and client counter, I get around 28 to 30 characters, which means around 60 bytes. And if I store it in reverse order so that the latest document shows up on top, the size would double (more than 120 bytes), as described by David Medinets. Is there any way to store this UUID in fewer bytes, or any other efficient way to generate a UUID that is reverse-sorted on the current time?
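To make the question concrete, here is a minimal Python sketch of the kind of compact id I am after. It assumes the four parts are packed as raw bytes rather than concatenated as characters: a 4-byte inverted timestamp, 3 bytes of machine id, 2 bytes of process id and a 3-byte counter. The name `reverse_time_id` and the field widths are my own assumptions for illustration, not something Accumulo or MongoDB provides.

```python
import struct

MAX_U32 = 0xFFFFFFFF

def reverse_time_id(epoch_seconds, machine_id, process_id, counter):
    """Pack a 12-byte id whose byte-wise order is newest first.

    Hypothetical layout (assumed for illustration):
    4 bytes inverted time | 3 bytes machine | 2 bytes pid | 3 bytes counter
    """
    # Subtracting from the maximum flips the order: later times give smaller values.
    inverted = MAX_U32 - (epoch_seconds & MAX_U32)
    return (
        struct.pack(">I", inverted)                   # big-endian, so byte order matches numeric order
        + (machine_id & 0xFFFFFF).to_bytes(3, "big")
        + struct.pack(">H", process_id & 0xFFFF)
        + (counter & 0xFFFFFF).to_bytes(3, "big")
    )

older = reverse_time_id(1_400_000_000, 0xABCDEF, 4242, 1)
newer = reverse_time_id(1_400_000_060, 0xABCDEF, 4242, 2)
```

With this packing the id stays at 12 bytes, and `newer` compares lower than `older` byte-wise, so it would come first in a lexicographically sorted table. Ids created within the same second still sort by ascending counter in this sketch.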

Indexing: I need to retrieve documents from an index based on queries over fields. I found two approaches to indexing documents in Accumulo:
(1) Term-based reverse indexing, and
(2) Document-partitioned indexing

If I use document-partitioned indexing, as Adam described in this video https://www.youtube.com/watch?v=Ck70G6OuGT4, the layout would be:

Row                <partition id>
                    /          \
CF              <doc>        <index>
                  |             |
CQ             <UUID>        <Term>
                  |             |
               <field>       <UUID>
                  |             |
                  |          <Field>
Value          <value>
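
To make the layout concrete, this is roughly how I picture the entries for one document, written as a plain Python sketch with no Accumulo client involved. The helper name `doc_partition_entries`, the `\0` separator inside the column qualifier and the whitespace tokenization are all assumptions of mine for illustration.

```python
def doc_partition_entries(partition_id, uuid, fields):
    """Return sorted (row, cf, cq, value) tuples for one document."""
    entries = []
    for field, value in fields.items():
        # Document entry: the field value is stored under the document's UUID.
        entries.append((partition_id, "doc", uuid + "\0" + field, value))
        # Index entries: one per distinct term, pointing back to UUID and field.
        for term in sorted(set(value.lower().split())):
            entries.append((partition_id, "index", term + "\0" + uuid + "\0" + field, ""))
    return sorted(entries)

entries = doc_partition_entries("p0", "id1", {"text": "Hello Accumulo"})
```

Every entry for the document shares the same row (the partition id), so the document and its index entries stay together in one partition.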

If I only want to serve documents based on single-term queries, would it be better to store <term> in the column family, so that I can restrict a scan to a single term via the CF? That would reduce the data by a good factor. What other pros and cons does this approach have? And how should I decide on the partition id if I am storing tweets on a 3-node cluster?
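
For the partition id, one option I am considering is hashing the UUID into a fixed number of buckets, sketched below in Python. The function name `partition_id`, the use of CRC32, the zero-padded 4-digit format and the bucket count of 12 are assumptions on my side; I do not know what the right number of partitions for a 3-node cluster would be.

```python
import zlib

def partition_id(uuid_bytes, num_partitions):
    """Map a document UUID to a zero-padded partition id string."""
    # CRC32 is used here only as a cheap, stable hash (an assumption, not a recommendation).
    bucket = zlib.crc32(uuid_bytes) % num_partitions
    # Zero padding keeps the row ids the same width so they sort consistently.
    return "%04d" % bucket

# Hypothetical choice: 12 partitions, i.e. a few per node on a 3-node cluster.
pid = partition_id(b"example-uuid", 12)
```

The same UUID always maps to the same partition, so a document and its index entries would land in the same row.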

Regards
Mohit Kaushik
