On Mon, Dec 9, 2013 at 4:47 PM, Niels Basjes <[email protected]> wrote:
> Why has it been designed/implemented like this?
> What is the logic behind this model?

Hey Niels:

It is probably fair to call this an instance of implementation detail leaking into, and polluting, our data model. We should fix it.

Currently, deletes always sort before all other types when all other coordinates are the same (same row, same column family, same timestamp, etc.). IIRC, it was done this way a long time ago because it made reasoning about deletes 'easier'. This forced sort ordering is why you see the behavior you note in your shell experiments.

Our Sergey recently suggested we stop factoring in 'type' when sorting KeyValues/Cells; rather, we would distinguish on sequence id when all else matches. Awkwardly, we'd then have to let the user supply a sequence id when querying for a specific Cell. This would not be easy to do. Sequence id is an internal, amorphous notion at the moment -- it exists while KeyValues are in flight but is (mostly) dropped after KeyValues persist to hfiles -- but it looks like it is fast becoming more tangible given some issues that arise around WAL replay at recovery time and in corner cases when replicating.

What is your thinking on this, Niels? Does the current implementation get in the way of building your app on HBase?

Thanks,
St.Ack
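
For illustration only, the two orderings discussed above can be modelled with a toy sketch. This is not HBase code: the `Cell` tuple, the two key functions, and `get` are hypothetical names, and the delete handling is deliberately simplified (a delete marker hides only puts at exactly the same timestamp). The scenario, a put re-issued at the same timestamp as an earlier delete, is an assumption about the kind of case at issue.

```python
from collections import namedtuple

# Toy stand-in for a KeyValue/Cell; field names are illustrative only.
Cell = namedtuple("Cell", "row column ts kind seq value")


def type_first_key(c):
    # Models the current ordering: newest timestamp first, and at equal
    # coordinates a Delete sorts ahead of any other type.
    return (c.row, c.column, -c.ts, 0 if c.kind == "Delete" else 1, -c.seq)


def seq_first_key(c):
    # Models the suggested ordering: type is ignored, and the highest
    # sequence id wins when all other coordinates match.
    return (c.row, c.column, -c.ts, -c.seq)


def get(cells, row, column, sort_key):
    """Return the first visible value for (row, column), or None."""
    deleted_ts = set()
    for c in sorted(cells, key=sort_key):
        if (c.row, c.column) != (row, column):
            continue
        if c.kind == "Delete":
            # Simplification: the marker masks only this exact timestamp.
            deleted_ts.add(c.ts)
        elif c.ts not in deleted_ts:
            return c.value
    return None


cells = [
    Cell("r1", "cf:a", 10, "Put", 1, "v1"),
    Cell("r1", "cf:a", 10, "Delete", 2, None),
    Cell("r1", "cf:a", 10, "Put", 3, "v2"),  # written after the delete
]

print(get(cells, "r1", "cf:a", type_first_key))  # None
print(get(cells, "r1", "cf:a", seq_first_key))   # v2
```

With the type-first key the delete marker is seen before both puts, so the later put stays hidden; with the sequence-id key the later put is seen first and is returned.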
