Re: HBase scan time range, inconsistency

2015-02-27 Thread ramkrishna vasudevan
In your case, since the TTL is set to the max and you have a time range in your scan, it would go with the first case. Every time, it would try to fetch only one version (the latest) for the given record, but if the latest version does not fall within the time range then it would skip those cells. But my doubt

Re: HBase scan time range, inconsistency

2015-02-26 Thread Michael Segel
Ok… Silly question time… so just humor me for a second. 1) What do you mean by saying you have a partitioned HBase table? (Regions and partitions are not the same) 2) There’s a question of the isolation level during the scan. What happens when there is a compaction running or there’s RLL

Re: HBase scan time range, inconsistency

2015-02-26 Thread Stephen Durfey
1) What do you mean by saying you have a partitioned HBase table? (Regions and partitions are not the same) By partitions, I just mean logical partitions, using the row key to keep data from separate data sources apart from each other. I think the issue may be resolved now, but it isn't

Re: HBase scan time range, inconsistency

2015-02-25 Thread Stephen Durfey
What's the TTL setting for your table? Which HBase release are you using? Was there compaction in between the scans? Thanks The TTL is set to the max. The HBase version is 0.94.6-cdh4.4.0. I don’t want to say compactions aren’t a factor, but the jobs are short lived (4-5 minutes), and

Re: HBase scan time range, inconsistency

2015-02-25 Thread Stephen Durfey
Are you writing any Deletes? Are you writing any duplicates? No physical deletes are occurring in my data, and there is a very real possibility of duplicates. How is the partitioning done? The key structure would be /partition_id/person_id I'm dealing with clinical data, with a data

Re: HBase scan time range, inconsistency

2015-02-25 Thread Sean Busbey
Are you writing any Deletes? Are you writing any duplicates? How is the partitioning done? What does the entire key structure look like? Are you doing the column filtering with a custom filter or one of the prepackaged ones? On Wed, Feb 25, 2015 at 12:57 PM, Stephen Durfey sjdur...@gmail.com

HBase scan time range, inconsistency

2015-02-24 Thread Stephen Durfey
I have some code that accepts a time range and looks for data written to an HBase table during that range. If anything has been written for that row during that range, the row key is saved off, and sometime later in the pipeline those row keys are used to extract the entire row. I’m testing
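The approach described above (collect the row keys that have any write inside a time range, then fetch those rows later) can be sketched roughly as follows. This is only an in-memory model in Python, not the poster's code and not the HBase client API: the `Table` type, the `rows_written_in_range` helper, and the sample keys are all hypothetical. The one HBase detail it mirrors is that a scan time range is half-open, so the minimum timestamp is included and the maximum is excluded.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a table: row key -> list of (column, timestamp_ms).
Table = Dict[str, List[Tuple[str, int]]]


def rows_written_in_range(table: Table, min_ts: int, max_ts: int) -> List[str]:
    """Return the row keys that have at least one cell with a timestamp
    in the half-open interval [min_ts, max_ts)."""
    if min_ts > max_ts:
        raise ValueError("min_ts must not exceed max_ts")
    return sorted(
        key
        for key, cells in table.items()
        if any(min_ts <= ts < max_ts for _column, ts in cells)
    )


# Made-up sample data using a /partition_id/person_id style key.
sample_table: Table = {
    "/p1/person_1": [("d:name", 100), ("d:visit", 250)],
    "/p1/person_2": [("d:name", 300)],
    "/p2/person_3": [("d:name", 50)],
}

matching_keys = rows_written_in_range(sample_table, 100, 300)
```

With this sample data, `matching_keys` holds only `/p1/person_1`: the cell at timestamp 300 sits exactly on the excluded upper bound, and the cell at 50 is below the lower bound. Against a static table, repeated calls with the same range return the same keys, which is the consistency the original scans were expected to show.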

Re: HBase scan time range, inconsistency

2015-02-24 Thread Ted Yu
What's the TTL setting for your table? Which HBase release are you using? Was there compaction in between the scans? Thanks On Feb 24, 2015, at 2:32 PM, Stephen Durfey sjdur...@gmail.com wrote: I have some code that accepts a time range and looks for data written to an HBase table

Re: HBase scan time range, inconsistency

2015-02-24 Thread ramkrishna vasudevan
These numbers have varied wildly, from being off by 2-3 between subsequent scans to 40 row increases, followed by a drop of 70 rows. When you say there is a variation in the number of rows retrieved, are the 40 additional rows within the expected time range? Or is the system