David,

Not quite sure what you're seeing. Using the "plain" python bindings, I think I emulated what you described. I created a table with the following data:
1 => ['col1: [] 1397241795 => val1', 'col2: [] 1397241797 => val2', 'col3: [] 1397241800 => val3']
2 => ['col1: [] 1397241803 => val1', 'col2: [] 1397241806 => val2', 'col3: [] 1397241808 => val3']
I then modified the start and end Key (really just row) for the Range with the following code:
https://github.com/joshelser/accumulo-python-thrift/blob/master/ReadWholeRow.py

I got the results I would expect (just row1, just row2, and both row1 and row2). Perhaps you hit some sort of bug in pyaccumulo? Not sure -- HTH if you have more info.
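The three scans described above (just row 1, just row 2, both rows) can be modeled without a running cluster. The following is only a toy sketch in plain Python, not the Thrift proxy API: keys are (row, column family, column qualifier) tuples compared lexicographically, which mirrors how Accumulo orders those three fields, while visibility and timestamp are left out. The names `scan`, `whole_row_range`, and `rows` are hypothetical helpers introduced for illustration.

```python
# Toy model of Accumulo key ordering; this is NOT the Thrift proxy API.
# Keys are (row, column family, column qualifier) tuples. Tuple comparison
# is lexicographic, which mirrors how Accumulo sorts those three fields.
# Visibility and timestamp are ignored here for brevity (an assumption).

TABLE = sorted(
    (row, fam, "") for row in ("1", "2") for fam in ("col1", "col2", "col3")
)


def scan(table, start, start_inclusive, end, end_inclusive):
    """Return the keys of `table` that fall inside the given range."""
    def in_range(key):
        after_start = key >= start if start_inclusive else key > start
        before_end = key <= end if end_inclusive else key < end
        return after_start and before_end
    return [key for key in table if in_range(key)]


def whole_row_range(first_row, last_row):
    """Hypothetical helper: a range covering every column of the rows
    first_row..last_row, i.e. [first_row, last_row + "\\0")."""
    return (first_row, "", ""), True, (last_row + "\0", "", ""), False


def rows(keys):
    """Distinct row ids present in a list of keys."""
    return sorted({key[0] for key in keys})


just_row1 = scan(TABLE, *whole_row_range("1", "1"))
just_row2 = scan(TABLE, *whole_row_range("2", "2"))
both_rows = scan(TABLE, *whole_row_range("1", "2"))

# A range from ("1", "", "") to ("1", "", ""), both ends inclusive, can only
# match a key whose column fields are empty, so it misses every real column.
too_narrow = scan(TABLE, ("1", "", ""), True, ("1", "", ""), True)
```

In this model the end key is exclusive and carries the trailing "\0", which is the same shape as the ["row0", "row0\0") range suggested later in the thread.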
On 4/13/14, 5:02 PM, David O'Gwynn wrote:
Hi Russ,

I ported it:

    def decode_row(cell):
        value = StringIO.StringIO(cell.value)
        numCells = struct.unpack('!i',value.read(4))[0]
        key = cell.row
        for i in range(numCells):
            if value.pos == value.len:
                raise Exception(
                    'Reached the end of the parsable string without'
                    ' having finished unpacking. Likely an error'
                    ' of passing a cell that is not from a'
                    ' WholeRowIterator.'
                )
            cf = value.read(struct.unpack('!i',value.read(4))[0])
            cq = value.read(struct.unpack('!i',value.read(4))[0])
            cv = value.read(struct.unpack('!i',value.read(4))[0])
            cts = struct.unpack('!q',value.read(8))[0]/1000.
            val = value.read(struct.unpack('!i',value.read(4))[0])

You'll want the check at the beginning of the for loop; I found out how fast Python can fill my available memory before I put that in.

On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks wrote:

Just curious, David, did you port the logic of WholeRowIterator.decodeRow over to Python, or is that functionality available somewhere in the pyaccumulo API and I just missed it?

-Russ

On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn wrote:

1.5.0

Btw, the pyaccumulo library:

https://github.com/accumulo/pyaccumulo

is the basis of my codebase. You should be able to use that to replicate the issue.

Thanks for looking into this!

On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser wrote:

Ah, gotcha. That definitely does not seem right. I'll see if I can poke around at this today.

Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)

On 4/12/14, 4:13 PM, David O'Gwynn wrote:

Hi Josh,

I guess I misspoke, the Range I'm passing is this:

    Range('row0', true, 'row0\0',true)

Keeping in mind that the Thrift interface only exposes one Range constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is this:

    Range( Key('row0',null,...), true, Key('row0\0',null,...), true )

If I scan for all entries (without WholeRowIterator), I get the full contents of "row0".
However, when I add the WholeRowIterator, it returns nothing. Furthermore, if I were to pass the following:

    Range( Key('row0',null,...), true, Key('row1\0',null,...), true )

not only do I get both "row0" and "row1" without the WRI, I get "row1" as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow interpreting my Range as having startKeyInclusive set to false, which is clearly not the case.

Thanks,
David

On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser wrote:

Hi David,

Looks like you're just mis-using the Range here.

If you create a range that is ["row0", "row0"] as you denote below, that will only include Keys that have a rowId of "row0" with an empty colfam, colqual, etc. Since you want to use the WholeRowIterator, I can assume you want all columns in "row0". As such, ["row0", "row0\0") would be the best range to fetch all of the columns in that single row.

On 4/12/2014 1:59 PM, David O'Gwynn wrote:

Hi all,

I'm working with the Python Thrift API for the Accumulo proxy service, and I have a bit of odd behavior happening. I'm using Accumulo 1.5 (the standard one from the Accumulo website).

Whenever I use the WholeRowIterator with a Scanner, I cannot configure the Range for that Scanner to correctly return the start row for the Range. E.g. for the Range('row0',true,'row0',true) [to pull a single row], it returns zero entries. For Range('row0',true,'row1\0',true), it returns only "row1".

From the WholeRowIterator documentation, this behavior implies that the startInclusive bit was set to False, which it clearly wasn't.

I've been able to hack around this issue by setting the start key to

    Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)

but I'd really rather understand the correct way of using a Range object in conjunction with a WholeRowIterator.

Thanks much,
David
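For readers on Python 3, the decode_row port quoted earlier in this thread could be sketched roughly as below. It assumes the byte layout implied by that port: a 4-byte big-endian cell count, then for each cell three length-prefixed fields (column family, qualifier, visibility), an 8-byte timestamp, and a length-prefixed value. It takes the raw value bytes rather than a cell object, returns the raw timestamp without dividing by 1000, and guards against truncated input on every read instead of once per loop. The helpers `_read_exact`, `_read_int`, `_read_field`, and `encode_row` are hypothetical names added for illustration; `encode_row` exists only so the decoder can be exercised without a server.

```python
import io
import struct


def _read_exact(buf, n):
    """Read exactly n bytes or fail loudly on truncated input."""
    data = buf.read(n)
    if len(data) != n:
        raise ValueError(
            "Reached the end of the input before unpacking finished; "
            "the value may not come from a WholeRowIterator."
        )
    return data


def _read_int(buf):
    """Read one 4-byte big-endian signed integer."""
    return struct.unpack("!i", _read_exact(buf, 4))[0]


def _read_field(buf):
    """Read one length-prefixed byte string."""
    length = _read_int(buf)
    if length < 0:
        raise ValueError("Negative field length in encoded row.")
    return _read_exact(buf, length)


def decode_row(value):
    """Decode encoded row bytes into (cf, cq, cv, timestamp, value) tuples."""
    buf = io.BytesIO(value)
    cells = []
    for _ in range(_read_int(buf)):
        cf = _read_field(buf)
        cq = _read_field(buf)
        cv = _read_field(buf)
        ts = struct.unpack("!q", _read_exact(buf, 8))[0]
        val = _read_field(buf)
        cells.append((cf, cq, cv, ts, val))
    return cells


def encode_row(cells):
    """Hypothetical inverse of decode_row, for local testing only."""
    out = [struct.pack("!i", len(cells))]
    for cf, cq, cv, ts, val in cells:
        for field in (cf, cq, cv):
            out.append(struct.pack("!i", len(field)) + field)
        out.append(struct.pack("!q", ts))
        out.append(struct.pack("!i", len(val)) + val)
    return b"".join(out)
```

A round trip such as `decode_row(encode_row([(b"col1", b"", b"", 1397241795, b"val1")]))` gives back the same list of tuples, and passing a value that was cut short raises `ValueError` rather than looping on empty reads.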
