OK, so I went back to my IPython console and reran my scan to prove to myself that I wasn't crazy. Well, it worked just as you said, contrary to my original point. I started to think I was on the crazy train.
Then I remembered that I'd removed the versioning iterator from the table I'd been working on, for some other tests. So I started checking the priorities of my iterators, and it turns out the issue was the priority of my WRI. If the versioning iterator is attached and the WRI's priority is less than or equal to the versioning iterator's priority (20 by default), you see this behavior: the first row of a WRI scan gets dropped. Josh, if you change the WRI's priority in your code to 20 or lower, you'll see it. I'm still not sure why this would be the case; it seems like odd behavior.

Anyway, thanks for taking the time to help me suss this out. :-)

On Sun, Apr 13, 2014 at 8:24 PM, Josh Elser <[email protected]> wrote:
> David,
>
> Not quite sure what you're seeing. Using the "plain" python bindings, I
> think I emulated what you described. I created a table with the following
> data:
>
> 1 => ['col1: [] 1397241795 => val1', 'col2: [] 1397241797 => val2', 'col3: [] 1397241800 => val3']
> 2 => ['col1: [] 1397241803 => val1', 'col2: [] 1397241806 => val2', 'col3: [] 1397241808 => val3']
>
> I then modified the start and end Key (really just row) for the Range with
> the following code:
>
> https://github.com/joshelser/accumulo-python-thrift/blob/master/ReadWholeRow.py
>
> I got the results I would expect (just row1, just row2, and both row1 and
> row2). Perhaps you hit some sort of bug in pyaccumulo? Not sure -- HTH if
> you have more info.
>
> On 4/13/14, 5:02 PM, David O'Gwynn wrote:
>>
>> Hi Russ,
>>
>> I ported it:
>>
>> def decode_row(cell):
>>     value = StringIO.StringIO(cell.value)
>>     numCells = struct.unpack('!i',value.read(4))[0]
>>     key = cell.row
>>     for i in range(numCells):
>>         if value.pos == value.len:
>>             raise Exception(
>>                 'Reached the end of the parsable string without'
>>                 ' having finished unpacking. Likely an error'
>>                 ' of passing a cell that is not from a'
>>                 ' WholeRowIterator.'
>>             )
>>         cf = value.read(struct.unpack('!i',value.read(4))[0])
>>         cq = value.read(struct.unpack('!i',value.read(4))[0])
>>         cv = value.read(struct.unpack('!i',value.read(4))[0])
>>         cts = struct.unpack('!q',value.read(8))[0]/1000.
>>         val = value.read(struct.unpack('!i',value.read(4))[0])
>>
>> You'll want the check at the beginning of the for loop; I found out
>> how fast Python can fill my available memory before I put that in.
>>
>> On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks <[email protected]> wrote:
>>>
>>> Just curious, David, did you port the logic of WholeRowIterator.decodeRow
>>> over to Python, or is that functionality available somewhere in the
>>> pyaccumulo API and I just missed it?
>>>
>>> -Russ
>>>
>>> On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn <[email protected]> wrote:
>>>>
>>>> 1.5.0
>>>>
>>>> Btw, the pyaccumulo library:
>>>>
>>>> https://github.com/accumulo/pyaccumulo
>>>>
>>>> is the basis of my codebase. You should be able to use that to
>>>> replicate the issue.
>>>>
>>>> Thanks for looking into this!
>>>>
>>>> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <[email protected]> wrote:
>>>>>
>>>>> Ah, gotcha.
>>>>>
>>>>> That definitely does not seem right. I'll see if I can poke around at
>>>>> this today.
>>>>>
>>>>> Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
>>>>>
>>>>> On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>>>>>>
>>>>>> Hi Josh,
>>>>>>
>>>>>> I guess I misspoke; the Range I'm passing is this:
>>>>>>
>>>>>> Range('row0', true, 'row0\0',true)
>>>>>>
>>>>>> Keeping in mind that the Thrift interface only exposes one Range
>>>>>> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
>>>>>> this:
>>>>>>
>>>>>> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>>>>>>
>>>>>> If I scan for all entries (without WholeRowIterator), I get the full
>>>>>> contents of "row0". However, when I add the WholeRowIterator, it
>>>>>> returns nothing.
>>>>>>
>>>>>> Furthermore, if I were to pass the following:
>>>>>>
>>>>>> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>>>>>>
>>>>>> not only do I get both "row0" and "row1" without the WRI, I get "row1"
>>>>>> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
>>>>>> interpreting my Range as having startKeyInclusive set to false, which
>>>>>> is clearly not the case.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Looks like you're just misusing the Range here.
>>>>>>>
>>>>>>> If you create a range that is ["row0", "row0"] as you denote below, that
>>>>>>> will only include Keys that have a rowId of "row0" with an empty colfam,
>>>>>>> colqual, etc. Since you want to use the WholeRowIterator, I can assume
>>>>>>> you want all columns in "row0". As such, ["row0", "row0\0") would be the
>>>>>>> best range to fetch all of the columns in that single row.
>>>>>>>
>>>>>>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm working with the Python Thrift API for the Accumulo proxy service,
>>>>>>>> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
>>>>>>>> (the standard one from the Accumulo website).
>>>>>>>>
>>>>>>>> Whenever I use the WholeRowIterator with a Scanner, I cannot configure
>>>>>>>> the Range for that Scanner to correctly return the start row for the
>>>>>>>> Range. E.g. for the Range('row0',true,'row0',true) [to pull a single
>>>>>>>> row], it returns zero entries. For Range('row0',true,'row1\0',true),
>>>>>>>> it returns only "row1".
>>>>>>>>
>>>>>>>> From the WholeRowIterator documentation, this behavior implies that
>>>>>>>> the startInclusive bit was set to False, which it clearly wasn't.
>>>>>>>>
>>>>>>>> I've been able to hack around this issue by setting the start key to
>>>>>>>>
>>>>>>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>>>>>>>
>>>>>>>> but I'd really rather understand the correct way of using a Range
>>>>>>>> object in conjunction with a WholeRowIterator.
>>>>>>>>
>>>>>>>> Thanks much,
>>>>>>>>
>>>>>>>> David
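The workaround David found amounts to giving the WholeRowIterator a scan priority strictly greater than the versioning iterator's. Accumulo applies iterators in increasing priority order, so a higher number sits further from the data. Below is a minimal sketch for picking such a priority. It assumes the table's properties have already been fetched as a plain dict of strings (for example through the proxy's table-properties call) and that iterators follow Accumulo's `table.iterator.<scope>.<name>=<priority>,<class>` property format. The helper names and the default gap of 10 are hypothetical choices, not part of pyaccumulo or Accumulo.

```python
from typing import Dict

# Assumed property layout: table.iterator.<scope>.<name> = "<priority>,<class>".
# Option sub-keys such as table.iterator.scan.vers.opt.maxVersions are skipped.
ITERATOR_PREFIX = "table.iterator."


def scan_iterator_priorities(props: Dict[str, str], scope: str = "scan") -> Dict[str, int]:
    """Return {iterator name: priority} for the iterators configured in `scope`."""
    prefix = ITERATOR_PREFIX + scope + "."
    priorities = {}
    for key, value in props.items():
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        if "." in name:  # e.g. "vers.opt.maxVersions" is an option, not an iterator
            continue
        priority, _, _class_name = value.partition(",")
        priorities[name] = int(priority)
    return priorities


def safe_wri_priority(props: Dict[str, str], scope: str = "scan", gap: int = 10) -> int:
    """Hypothetical helper: a priority strictly above every table iterator in
    `scope`, so a scan-time WholeRowIterator runs after (on top of) them."""
    if gap < 1:
        raise ValueError("gap must be at least 1 to stay strictly above")
    highest = max(scan_iterator_priorities(props, scope).values(), default=0)
    return highest + gap
```

With a freshly created table, where `table.iterator.scan.vers` is `20,org.apache.accumulo.core.iterators.user.VersioningIterator`, `safe_wri_priority` returns 30, which stays clear of the `<= 20` range David reported as dropping the first row.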

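David's port of `WholeRowIterator.decodeRow` relies on Python 2's `StringIO.StringIO` and its `pos`/`len` attributes, which no longer exist in Python 3. Here is a hedged Python 3 sketch of the same decoding. It assumes the layout David's code reads: a big-endian 4-byte cell count, then for each cell a length-prefixed column family, qualifier, and visibility, an 8-byte timestamp, and a length-prefixed value. It also assumes the scanner hands back the row and value as `bytes`. `Cell`, `decode_row`, and `encode_row` are illustrative names, not pyaccumulo API. `encode_row` exists only to build test input. Rather than checking `pos == len` at the top of the loop, every read verifies that it got exactly the number of bytes it asked for and rejects negative lengths, so a value that was not produced by the WholeRowIterator fails fast with a `ValueError`.

```python
import io
import struct
from typing import List, NamedTuple, Sequence


class Cell(NamedTuple):
    """One key/value pair recovered from a WholeRowIterator-encoded value."""
    row: bytes
    cf: bytes
    cq: bytes
    cv: bytes
    ts: int  # raw Key timestamp (David's port divides this by 1000 for seconds)
    value: bytes


def _read_exact(buf: io.BytesIO, n: int) -> bytes:
    # Reject negative lengths (BytesIO.read(-1) would return everything left)
    # and short reads (truncated input or input not produced by the WRI).
    if n < 0:
        raise ValueError("negative length field; not a WholeRowIterator value?")
    data = buf.read(n)
    if len(data) != n:
        raise ValueError("ran out of bytes before the row was fully decoded")
    return data


def _read_int(buf: io.BytesIO) -> int:
    # Big-endian signed 32-bit int, the same format as struct's '!i'
    return struct.unpack("!i", _read_exact(buf, 4))[0]


def _read_field(buf: io.BytesIO) -> bytes:
    # Length-prefixed byte string: 4-byte length, then that many bytes
    return _read_exact(buf, _read_int(buf))


def decode_row(row: bytes, encoded: bytes) -> List[Cell]:
    """Sketch of a Python 3 counterpart to WholeRowIterator.decodeRow."""
    buf = io.BytesIO(encoded)
    num_cells = _read_int(buf)
    cells = []
    for _ in range(num_cells):
        cf = _read_field(buf)
        cq = _read_field(buf)
        cv = _read_field(buf)
        ts = struct.unpack("!q", _read_exact(buf, 8))[0]  # big-endian 64-bit
        value = _read_field(buf)
        cells.append(Cell(row, cf, cq, cv, ts, value))
    return cells


def encode_row(cells: Sequence[Cell]) -> bytes:
    """Inverse of decode_row, used here only to build round-trip test input."""
    out = io.BytesIO()
    out.write(struct.pack("!i", len(cells)))
    for c in cells:
        for field in (c.cf, c.cq, c.cv):
            out.write(struct.pack("!i", len(field)))
            out.write(field)
        out.write(struct.pack("!q", c.ts))
        out.write(struct.pack("!i", len(c.value)))
        out.write(c.value)
    return out.getvalue()
```

In a scan loop, each entry returned with the WholeRowIterator attached would be passed as `decode_row(entry_row, entry_value)`, yielding one `Cell` per original column in that row.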