Re: Reverse Index Timestamp

2012-12-03 Thread Jim Klucar
Sorry for the late reply, but yes, the LSBs of the timestamp are probably
fairly random, but perhaps not uniform enough depending on what is setting
the timestamp. You can just send the row info through a hash function if
you prefer.
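
A minimal sketch of the hash-function alternative mentioned above, assuming a fixed bucket count and CRC32 as a stand-in hash. The class name, `NUM_BUCKETS`, the `_` separator, and the zero-padded layout are all hypothetical choices for illustration, not anything prescribed by Accumulo.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class HashedRowKey {
    // Hypothetical bucket count; in practice it would match the number of table splits.
    static final int NUM_BUCKETS = 16;

    // Hash the row info into a bucket in the range [0, NUM_BUCKETS).
    static int bucketFor(String rowInfo) {
        CRC32 crc = new CRC32();
        crc.update(rowInfo.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % NUM_BUCKETS);
    }

    // Row layout (an assumption): bucket_prefix_reverseTimestamp, zero-padded
    // so that lexicographic order matches numeric order.
    static String rowFor(String prefix, long reverseTs) {
        String rowInfo = prefix + "_" + String.format("%019d", reverseTs);
        return String.format("%02d_%s", bucketFor(rowInfo), rowInfo);
    }

    public static void main(String[] args) {
        System.out.println(rowFor("foo1", Long.MAX_VALUE - 1354057980000L));
    }
}
```

Because the bucket is derived from the whole row info, a reader who knows the exact row can recompute its bucket, but a range scan over time still has to visit every bucket.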


On Tue, Nov 27, 2012 at 5:53 PM, Roshan Punnoose rosh...@gmail.com wrote:

 Thanks Jim, do you mean the least significant bits of the timestamp?


 On Tue, Nov 27, 2012 at 4:45 PM, Jim Klucar klu...@gmail.com wrote:

 Roshan,

 Depending on what your cluster setup is and what the resolution of the
 time stamp is you could do something like this to spread the data around:

 timestamp-LSBs-string-reverse timestamp

 Using the LSBs of the timestamp as a uniform hash, then splitting on all
 possible hashes would spread things around a bit. If you do this, then all
 scans must check all hashes for data.
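
 The layout described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the choice of 4 low bits (16 buckets), the `_` separator, and the zero-padding are hypothetical, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.ArrayList;
import java.util.List;

final class BucketedRowKey {
    // Hypothetical choice: use the 4 least significant bits, giving 16 buckets.
    static final int BUCKET_BITS = 4;
    static final int NUM_BUCKETS = 1 << BUCKET_BITS;

    // Reverse timestamp: newer events get smaller values and so sort first.
    static long reverseTimestamp(long ts) {
        return Long.MAX_VALUE - ts;
    }

    // Row layout (an assumption): bucket_prefix_reverseTimestamp.
    static String rowFor(String prefix, long ts) {
        long bucket = ts & (NUM_BUCKETS - 1);
        // Zero-pad so that lexicographic order matches numeric order.
        return String.format("%02d_%s_%019d", bucket, prefix, reverseTimestamp(ts));
    }

    // Every scan for a prefix has to cover all buckets, one range per bucket.
    static List<String> scanPrefixes(String prefix) {
        List<String> out = new ArrayList<>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
            out.add(String.format("%02d_%s_", b, prefix));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rowFor("foo1", 1354057980000L));
        System.out.println(scanPrefixes("foo1"));
    }
}
```

 Splitting the table on the 16 two-digit bucket prefixes would then spread ingest across tablets, at the cost of 16 ranges per query.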




 On Tue, Nov 27, 2012 at 1:25 PM, Keith Turner ke...@deenlo.com wrote:



 On Tue, Nov 27, 2012 at 1:22 PM, Roshan Punnoose rosh...@gmail.com wrote:

 Thanks!

 The fact that you are using a binary tree behind the scenes makes
 perfect sense. Btw, what do you use in the standalone (non native)
 implementation? Does it use a TreeMap?


 When not using native code, ConcurrentSkipListMap is used.




 On Tue, Nov 27, 2012 at 12:57 PM, Keith Turner ke...@deenlo.com wrote:



 On Tue, Nov 27, 2012 at 12:21 PM, Roshan Punnoose rosh...@gmail.com wrote:

 The string would most likely be a fixed set of strings that do not
 change over time.

 My question is whether it is bad to use a reverse index timestamp in the
 row id. Will it cause problems with tablet splitting, compaction, and
 performance if the data is always being sent to the top of the tablet? If I
 define a split as everything prefixed with string, then the ingest will
 go to one tablet, but then I add a reverse timestamp in the row, and that
 would mean I am always copying data to the top of the tablet. Will this
 cause performance issues? Or is it better to append to a tablet?


 I do not think it should matter. Inserts go into a C++ STL map on the
 tablet server if using the nativemap. I think the implementation of that
 is a balanced binary tree, so I do not think inserting at the beginning vs
 the end would make a difference. That being said, I do not think I have
 tried this, so I do not know if there would be any surprises. I would be
 interested in hearing about your experiences.
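
 A small sketch of the point above, using plain Java collections as stand-ins rather than Accumulo's actual in-memory map: `TreeMap` (a red-black tree) plays the role of the balanced tree, and `ConcurrentSkipListMap` is the structure named elsewhere in this thread for the non-native case. The class name, method name, and starting timestamp are hypothetical. Each insert uses a smaller key than the last, so it always lands at the front, and both maps still end up in the same sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class InsertOrderDemo {
    // Hypothetical starting time in milliseconds.
    static final long START = 1_354_000_000_000L;

    // Insert n entries keyed by reverse timestamp; each new key sorts before
    // all existing keys, i.e. every insert goes to the "top" of the map.
    static List<Long> keysAfterFrontInserts(NavigableMap<Long, String> map, int n) {
        for (int i = 0; i < n; i++) {
            long reverseTs = Long.MAX_VALUE - (START + i);
            map.put(reverseTs, "event-" + i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        List<Long> tree = keysAfterFrontInserts(new TreeMap<>(), 5);
        List<Long> skip = keysAfterFrontInserts(new ConcurrentSkipListMap<>(), 5);
        System.out.println(tree.equals(skip));
    }
}
```

 This only shows that ordering is preserved regardless of where inserts land; it says nothing about tablet-level behavior such as splits or compactions.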




 On Tue, Nov 27, 2012 at 11:51 AM, Keith Turner ke...@deenlo.com wrote:



 Keith

 On Tue, Nov 27, 2012 at 10:41 AM, Roshan Punnoose rosh...@gmail.com wrote:

 I want to have a table where the row will consist of
 string-reverse index timestamp. But this means that the data is
 always being prefixed to the beginning of the row (or tablet if the row is
 large). Will this be a problem for compaction or performance?


 Can you tell me more about what string is? For example, is it a
 hash, or does it come from the set foo1, foo2, foo3? How does it
 change over time? I think the answer to your question depends on what
 string is.



 I don't know if I heard this correctly, but someone once mentioned
 that making the row id the direct timestamp could cause performance issues
 because data is always going to one tablet, but also because there is
 trouble splitting since it always appends to the tablet. Is this true, and
 is it similar to what could happen if I am always prefixing to a tablet?


 Yes, using a timestamp for a row could cause data from many clients
 to always go to the same tablet, which would be bad for performance on a
 cluster.



 Thanks!
 Roshan











Re: Accumulo VM

2012-12-03 Thread John Vines
Additional note-
Same rules as the last one I did apply. Username/pass is ubuntu/secret. The
Accumulo shell can be run as the accumulo user or as root; only the root user
is set up, and the root password is secret as well.

John


On Fri, Nov 30, 2012 at 3:25 PM, John Vines vi...@apache.org wrote:

 Uuugghh, failing at internet today. I'm the worst.

 http://dl.dropbox.com/u/538523/Accumulo-UbuntuServer1204.ova





 On Fri, Nov 30, 2012 at 3:17 PM, John Vines vi...@apache.org wrote:

 I recut an Accumulo-1.4.2 VM for people to have a quick start with
 Accumulo. It is configured to use 4GB of memory. I will eventually be updating
 it with a new VM which sets the appropriate footprint at startup so you can
 work on a smaller sized VM, but for now this is what's available.

 Everything is init.d, you just need to wait a few minutes for everything
 to get up and going. You can view the monitor on port 50095 after starting
 up the VM to check to see if it's all the way up. It is based on Ubuntu
 Server 12.04.1.

 Any questions/comments are appreciated.
 John





Record index within a Table

2012-12-03 Thread michael_taylor
Does Accumulo have the idea of record index within a table?  As an 
example, take a table with the following values:


Row  ColFam  ColQual
1    1       10
1    1       20
1    1       30


As I understand it Accumulo will naturally sort the table in the above 
order (first by row, then by colFam, lastly by colQual).  If I insert 
(1, 1, 25), is there any way for me to get the index of the newly 
inserted value (3 in this case)?


Furthermore, is there any way to look up an index by knowing the full
row:colFam:colQual key (say (1, 1, 30))?


Lastly, is there an easy way to get the total number of rows within a 
table (and the same question for ColFam's within a row and ColQual's 
with a row:colFam pair)?


I've made it through the documentation and scanned through the mailing
list, but I haven't seen any information on the above (which leads me to
believe I'm asking for behaviour that Accumulo doesn't natively provide).
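
To make the question concrete, here is a minimal sketch of the ordering and the "index" described above, written with plain Java collections. It does not use or describe any Accumulo API. The class and method names are hypothetical, keys are modelled as three integers purely for illustration, and the index is 1-based to match the example.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class KeyRankSketch {
    // Illustrative key: three integers standing in for row, colFam, colQual.
    static final class Key {
        final int row, colFam, colQual;
        Key(int row, int colFam, int colQual) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
        }
    }

    // Sort first by row, then by colFam, lastly by colQual.
    static final Comparator<Key> ORDER =
        Comparator.comparingInt((Key k) -> k.row)
                  .thenComparingInt(k -> k.colFam)
                  .thenComparingInt(k -> k.colQual);

    // 1-based position of key: the number of entries sorting strictly before it, plus one.
    static int indexOf(TreeSet<Key> table, Key key) {
        return table.headSet(key, false).size() + 1;
    }

    public static void main(String[] args) {
        TreeSet<Key> table = new TreeSet<>(ORDER);
        table.add(new Key(1, 1, 10));
        table.add(new Key(1, 1, 20));
        table.add(new Key(1, 1, 30));
        table.add(new Key(1, 1, 25));
        System.out.println(indexOf(table, new Key(1, 1, 25)));
    }
}
```

Note that counting the entries before a key means walking them, so this kind of index costs time proportional to the key's position.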


Many thanks for any information,

- MT