In your case, since the TTL is set to the max and you have a time range in
your scan, it would go with the first case.
Each time, it would try to fetch only one version (the latest) of the
given record, but if the latest version does not fall within the time range,
it would skip those cells. But my doubt
Ok…
Silly question time… so just humor me for a second.
1) What do you mean by saying you have a partitioned HBase table? (Regions
and partitions are not the same)
2) There’s a question of the isolation level during the scan. What happens when
there is a compaction running or there’s RLL
1) What do you mean by saying you have a partitioned HBase table?
(Regions and partitions are not the same)
By partitions, I just mean logical partitions, using the row key to keep
data from separate data sources apart from each other.
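To make the idea of logical partitions concrete, here is a minimal sketch in plain Python. It does not use the HBase client; a sorted list stands in for HBase's lexicographically ordered row keys. The /partition_id/person_id layout is the one mentioned elsewhere in this thread, while the helper names (make_row_key, prefix_scan) and the sample IDs are hypothetical.

```python
from bisect import bisect_left

def make_row_key(partition_id, person_id):
    # Hypothetical helper: builds a key in the /partition_id/person_id layout.
    return f"/{partition_id}/{person_id}"

def prefix_scan(sorted_keys, prefix):
    # Jump to the first key >= prefix, then walk forward while the prefix matches.
    # This mimics a range scan over sorted row keys; it is not the HBase API.
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches

# Made-up sample data: three data sources, one of which shares a prefix with another.
keys = sorted(
    make_row_key(p, n)
    for p, n in [("src1", "a"), ("src1", "b"), ("src10", "a"), ("src2", "c")]
)

src1_keys = prefix_scan(keys, "/src1/")      # trailing slash keeps src10 out
loose_keys = prefix_scan(keys, "/src1")      # without it, src10 rows leak in
```

The trailing delimiter in the scan prefix matters: without it, a scan for one partition also picks up any partition whose ID merely starts with the same characters.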
I think the issue may be resolved now, but it isn't
What's the TTL setting for your table?
Which HBase release are you using?
Was there compaction in between the scans?
Thanks
The TTL is set to the max. The HBase version is 0.94.6-cdh4.4.0. I don’t
want to say compactions aren’t a factor, but the jobs are short lived (4-5
minutes), and
Are you writing any Deletes? Are you writing any duplicates?
No physical deletes are occurring in my data, and there is a very real
possibility of duplicates.
How is the partitioning done?
The key structure would be /partition_id/person_id. I'm dealing with
clinical data, with a data
Are you writing any Deletes? Are you writing any duplicates?
How is the partitioning done?
What does the entire key structure look like?
Are you doing the column filtering with a custom filter or one of the
prepackaged ones?
On Wed, Feb 25, 2015 at 12:57 PM, Stephen Durfey sjdur...@gmail.com wrote:
I have some code that accepts a time range and looks for data written to an
HBase table during that range. If anything has been written for that row during
that range, the row key is saved off, and sometime later in the pipeline those
row keys are used to extract the entire row. I’m testing
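The two-phase flow described above can be sketched roughly as follows. This is a plain-Python model, not HBase client code: the table is a dict of row key to a list of (column, timestamp, value) cells, and the function names and sample data are hypothetical. It assumes the time range is half-open, [min, max), which is how I understand HBase time ranges to behave. It does not model versions, TTL, or compaction.

```python
def rows_touched_in_range(table, min_ts, max_ts):
    # Phase 1: collect row keys that have at least one cell written
    # with a timestamp in the half-open range [min_ts, max_ts).
    return sorted(
        row_key
        for row_key, cells in table.items()
        if any(min_ts <= ts < max_ts for (_col, ts, _val) in cells)
    )

def fetch_full_rows(table, row_keys):
    # Phase 2: later in the pipeline, pull every cell of each saved row,
    # regardless of the cell timestamps.
    return {key: table[key] for key in row_keys if key in table}

# Made-up sample data: row key -> list of (column, timestamp, value).
table = {
    "/src1/a": [("d:name", 100, "x"), ("d:dob", 250, "y")],
    "/src1/b": [("d:name", 300, "z")],
    "/src2/c": [("d:name", 90, "w")],
}

changed = rows_touched_in_range(table, 200, 300)
full = fetch_full_rows(table, changed)
```

With unchanging data, running phase 1 repeatedly over the same range in this model always yields the same row keys, which is the behavior the scans in question would be expected to show.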
What's the TTL setting for your table?
Which HBase release are you using?
Was there compaction in between the scans?
Thanks
On Feb 24, 2015, at 2:32 PM, Stephen Durfey sjdur...@gmail.com wrote:
I have some code that accepts a time range and looks for data written to an
HBase table
These numbers have varied wildly, from being off by 2-3 between
subsequent scans to 40 row increases, followed by a drop of 70 rows.
When you say there is a variation in the number of rows retrieved - the 40
rows that got increased - are those rows in the expected time range? Or is
the system