Re: Thrift proxy: Python WholeRowIterator behavior

Josh Elser Sun, 13 Apr 2014 17:26:23 -0700

David,

Not quite sure what you're seeing. Using the "plain" Python bindings, I think I emulated what you described. I created a table with the following data:

1 => ['col1: [] 1397241795 => val1', 'col2: [] 1397241797 => val2', 'col3: [] 1397241800 => val3']
2 => ['col1: [] 1397241803 => val1', 'col2: [] 1397241806 => val2', 'col3: [] 1397241808 => val3']

I then modified the start and end Key (really just row) for the Range with the following code:


https://github.com/joshelser/accumulo-python-thrift/blob/master/ReadWholeRow.py

I got the results I would expect (just row1, just row2, and both row1 and row2). Perhaps you hit some sort of bug in pyaccumulo? Not sure -- HTH if you have more info.


On 4/13/14, 5:02 PM, David O'Gwynn wrote:

Hi Russ,

I ported it:

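import struct
import StringIO  # Python 2 module; .pos and .len are used below

def decode_row(cell):
    # cell.value is the packed row produced by the WholeRowIterator.
    value = StringIO.StringIO(cell.value)
    numCells = struct.unpack('!i',value.read(4))[0]
    key = cell.row
    cells = []
    for i in range(numCells):
        # Stop early if the buffer is exhausted before numCells is reached.
        if value.pos == value.len:
            raise Exception(
                'Reached the end of the parsable string without'
                ' having finished unpacking. Likely an error'
                ' of passing a cell that is not from a'
                ' WholeRowIterator.'
                )
        # Each field is a 4-byte big-endian length followed by that many bytes.
        cf = value.read(struct.unpack('!i',value.read(4))[0])
        cq = value.read(struct.unpack('!i',value.read(4))[0])
        cv = value.read(struct.unpack('!i',value.read(4))[0])
        # 8-byte timestamp in milliseconds, converted to seconds.
        cts = struct.unpack('!q',value.read(8))[0]/1000.
        val = value.read(struct.unpack('!i',value.read(4))[0])
        # Assumed: collect each unpacked cell and return them with the row.
        cells.append((cf, cq, cv, cts, val))
    return key, cells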

You'll want the check at the beginning of the for loop; I found out
how fast Python can fill my available memory before I put that in.
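
For readers on Python 3, here is a minimal sketch of the same decoding over bytes. It assumes the byte layout used in the port above (a 4-byte cell count, then length-prefixed colfam, colqual and colvis, an 8-byte millisecond timestamp, and a length-prefixed value). The names encode_row, decode_row_bytes, _read_exact and _read_field are hypothetical helpers for illustration and are not part of pyaccumulo or the proxy API. Short reads raise immediately, which serves the same purpose as the end-of-buffer check described above.

```python
import io
import struct

def _read_exact(buf, n):
    """Read exactly n bytes or fail, so a bogus count cannot loop on empty reads."""
    data = buf.read(n)
    if len(data) != n:
        raise ValueError('truncated value; is this cell from a WholeRowIterator?')
    return data

def _read_field(buf):
    """Read a 4-byte big-endian length followed by that many bytes."""
    (length,) = struct.unpack('!i', _read_exact(buf, 4))
    if length < 0:
        raise ValueError('negative field length')
    return _read_exact(buf, length)

def decode_row_bytes(encoded):
    """Unpack the assumed layout into (cf, cq, cv, ts_millis, value) tuples."""
    buf = io.BytesIO(encoded)
    (num_cells,) = struct.unpack('!i', _read_exact(buf, 4))
    cells = []
    for _ in range(num_cells):
        cf = _read_field(buf)
        cq = _read_field(buf)
        cv = _read_field(buf)
        (ts_millis,) = struct.unpack('!q', _read_exact(buf, 8))
        val = _read_field(buf)
        cells.append((cf, cq, cv, ts_millis, val))
    return cells

def encode_row(cells):
    """Hypothetical inverse of decode_row_bytes, used only to build test input."""
    out = io.BytesIO()
    out.write(struct.pack('!i', len(cells)))
    for cf, cq, cv, ts_millis, val in cells:
        for field in (cf, cq, cv):
            out.write(struct.pack('!i', len(field)))
            out.write(field)
        out.write(struct.pack('!q', ts_millis))
        out.write(struct.pack('!i', len(val)))
        out.write(val)
    return out.getvalue()
```

A round trip such as decode_row_bytes(encode_row(cells)) should return the original list of tuples, while a buffer that is too short fails with ValueError instead of consuming memory.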

On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks wrote:

Just curious, David, did you port the logic of WholeRowIterator.decodeRow
over to Python, or is that functionality available somewhere in the
pyaccumulo API and I just missed it?

-Russ


On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn wrote:


1.5.0

Btw, the pyaccumulo library:

https://github.com/accumulo/pyaccumulo

is the basis of my codebase. You should be able to use that to
replicate the issue.

Thanks for looking into this!

On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser wrote:

Ah, gotcha.

That definitely does not seem right. I'll see if I can poke around at
this today.

Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)


On 4/12/14, 4:13 PM, David O'Gwynn wrote:


Hi Josh,

I guess I misspoke; the Range I'm passing is this:

Range('row0', true, 'row0\0',true)

Keeping in mind that the Thrift interface only exposes one Range
constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
this:

Range( Key('row0',null,...), true, Key('row0\0',null,...), true )

If I scan for all entries (without WholeRowIterator), I get the full
contents of "row0". However, when I add the WholeRowIterator, it
returns nothing.

Furthermore, if I were to pass the following:

Range( Key('row0',null,...), true, Key('row1\0',null,...), true )

not only do I get both "row0" and "row1" without the WRI, I get "row1"
as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
interpreting my Range as having startKeyInclusive set to false, which
is clearly not the case.

Thanks,
David


On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser wrote:


Hi David,

Looks like you're just mis-using the Range here.

If you create a range that is ["row0", "row0"] as you denote below,
that will only include Keys that have a rowId of "row0" with an empty
colfam, colqual, etc. Since you want to use the WholeRowIterator, I can
assume you want all columns in "row0". As such, ["row0", "row0\0")
would be the best range to fetch all of the columns in that single row.
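
The point about key-based ranges can be illustrated with a simplified model. This sketch treats a key as a (row, colfam, colqual) tuple compared lexicographically; that is an assumption made for illustration (real Accumulo Keys carry more fields), and in_range, scan and the sample keys are hypothetical names, not proxy API calls.

```python
def in_range(key, start, start_inclusive, end, end_inclusive):
    """Return True if key lies within the bounds; keys are compared as tuples."""
    if key < start or (key == start and not start_inclusive):
        return False
    if key > end or (key == end and not end_inclusive):
        return False
    return True

# Sample table: two columns in row0, one in row1 (illustrative data only).
keys = [
    ('row0', 'col1', ''),
    ('row0', 'col2', ''),
    ('row1', 'col1', ''),
]

def scan(start_row, start_inclusive, end_row, end_inclusive):
    """Build bounds with empty column fields, as when only the row is set."""
    start = (start_row, '', '')
    end = (end_row, '', '')
    return [k for k in keys
            if in_range(k, start, start_inclusive, end, end_inclusive)]

# ["row0", "row0"]: only a key with empty columns could match, so nothing does.
narrow = scan('row0', True, 'row0', True)

# ["row0", "row0\0"): every key whose row is exactly "row0" falls inside.
whole = scan('row0', True, 'row0\0', False)
```

In this model, narrow is empty because every real key in row0 has a non-empty column family and therefore sorts after the end bound, while whole contains both row0 keys and excludes row1.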


On 4/12/2014 1:59 PM, David O'Gwynn wrote:



Hi all,

I'm working with the Python Thrift API for the Accumulo proxy service,
and I have a bit of odd behavior happening. I'm using Accumulo 1.5
(the standard one from the Accumulo website).

Whenever I use the WholeRowIterator with a Scanner, I cannot configure
the Range for that Scanner to correctly return the start row for the
Range. E.g. for the Range('row0',true,'row0',true) [to pull a single
row], it returns zero entries. For Range('row0',true,'row1\0',true),
it returns only "row1".

From the WholeRowIterator documentation, this behavior implies that
the startInclusive bit was set to False, which it clearly wasn't.

I've been able to hack around this issue by setting the start key to

Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)

but I'd really rather understand the correct way of using a Range
object in conjunction with a WholeRowIterator.
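
To make the workaround expression concrete, here is a small sketch of the row string it computes. The function name workaround_start_row is hypothetical; only the string arithmetic comes from the expression above. It decrements the last character of the row and appends a NUL, yielding a string that sorts just before the target row and is then used as an exclusive start.

```python
def workaround_start_row(row):
    """Return the row string used as an exclusive start key in the workaround.

    Raises IndexError for an empty row and ValueError if the last character
    is already NUL, since neither case has a decremented predecessor here.
    """
    return row[:-1] + chr(ord(row[-1]) - 1) + '\0'

start = workaround_start_row('row0')
```

For 'row0' this gives 'row/' followed by a NUL character, which sorts before 'row0'. Note that the result is not the immediate predecessor: any row that sorts between the two strings would also fall inside the range.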

Thanks much,

David
