[jira] Created: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)
Add ability to extract a specified list of versions of a column in a single 
roundtrip
-

 Key: HBASE-2793
 URL: https://issues.apache.org/jira/browse/HBASE-2793
 Project: HBase
  Issue Type: New Feature
Reporter: Kannan Muthukkaruppan


In one of the use cases we were looking at, each row contains a single column, 
but with several versions (e.g., each version representing an event in a log), 
and we want to be able to extract a specific set of versions from the row in a 
single round-trip.

Currently, on a Get, one can retrieve a specific version of a column using 
setTimeStamp(ts) or a range of versions using setTimeRange(min, max), but not 
an arbitrary set of specified versions. It would be useful to add this ability.
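
To make the gap concrete, here is a minimal sketch in plain Java, with no HBase dependency, that models one column's versions as a sorted map from timestamp to value. The class VersionedCell and its methods are hypothetical and exist only for illustration; atTimestamp and inRange mimic what setTimeStamp(ts) and setTimeRange(min, max) select today, and atTimestamps is the kind of selection this issue asks for. Results come back in ascending timestamp order purely for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one column's versions (timestamp -> value). Not HBase code. */
class VersionedCell {
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Analogue of setTimeStamp(ts): exactly one version, or null if absent. */
    String atTimestamp(long ts) {
        return versions.get(ts);
    }

    /** Analogue of setTimeRange(min, max): versions with min <= ts < max. */
    List<String> inRange(long min, long max) {
        return new ArrayList<>(versions.subMap(min, true, max, false).values());
    }

    /** The requested addition: an arbitrary set of timestamps in one call. */
    List<String> atTimestamps(List<Long> wanted) {
        List<String> out = new ArrayList<>();
        for (long ts : wanted) {
            String value = versions.get(ts);
            if (value != null) {
                out.add(value); // timestamps with no stored version are skipped
            }
        }
        return out;
    }
}
```

With versions stored at timestamps 10, 20, 30 and 40, asking for 10 and 40 needs either two point reads or one range read that also returns 20 and 30; the third method is what would let a client get exactly those two in one call.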

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan reassigned HBASE-2793:


Assignee: Kannan Muthukkaruppan




[jira] Created: (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)
ROWCOL bloom filter not used if multiple columns within same family are 
requested in a Get
--

 Key: HBASE-2794
 URL: https://issues.apache.org/jira/browse/HBASE-2794
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan


Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():

{code}
switch(bloomFilterType) {
  case ROW:
    key = row;
    break;
  case ROWCOL:
    if (columns.size() == 1) {
      byte[] col = columns.first();
      key = Bytes.add(row, col);
      break;
    }
    //$FALL-THROUGH$
  default:
    return true;
}
{code}

If columns.size() > 1, then we currently don't take advantage of the bloom 
filter.  We should optimize this to check the bloom for each of the columns and, 
if none of the columns is present in the bloom, avoid opening the file.
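
A rough sketch of the proposed check, written as standalone Java rather than as a patch to StoreFile.java. The Bloom interface, the class name and the use of String keys are assumptions made for illustration; the real code builds the key from byte arrays with Bytes.add(row, col). The idea is simply to probe the bloom once per requested column and skip the file only when every probe comes back negative.

```java
import java.util.List;

/** Sketch of a multi-column ROWCOL bloom check. All names are illustrative. */
class RowColBloomSketch {
    /** Stand-in for a bloom filter: may give false positives, never false negatives. */
    interface Bloom {
        boolean mightContain(String key);
    }

    /**
     * Returns false only when the bloom rules out every requested row+column
     * key, in which case the store file does not need to be opened.
     */
    static boolean shouldSeek(Bloom bloom, String row, List<String> columns) {
        if (columns.isEmpty()) {
            return true; // no explicit columns: a ROWCOL bloom cannot help
        }
        for (String col : columns) {
            if (bloom.mightContain(row + col)) {
                return true; // at least one requested column may be in the file
            }
        }
        return false; // every requested column is definitely absent
    }
}
```

The cost is one bloom probe per requested column, so for Gets that name very many columns it may be worth capping the number of probes and falling back to returning true.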





[jira] Commented: (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882820#action_12882820
 ] 

Kannan Muthukkaruppan commented on HBASE-2794:
--

Perhaps a simple starter task for someone interested.




[jira] Updated: (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

2010-06-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-2794:
---

Tags: noob




[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882854#action_12882854
 ] 

stack commented on HBASE-2793:
--

What are you thinking, Kannan?  Passing a filter?




[jira] Commented: (HBASE-50) Snapshot of table

2010-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882867#action_12882867
 ] 

stack commented on HBASE-50:


Thinking on it, Li, maybe it's best if you work up in github and just log here 
when you do big pushes to your github repo? That way you are in charge of it 
and not dependent on laggard hbase committers getting your work into the branch.

 Snapshot of table
 -

 Key: HBASE-50
 URL: https://issues.apache.org/jira/browse/HBASE-50
 Project: HBase
  Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
 Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot 
 Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class 
 Diagram.png


 Having an option to take a snapshot of a table would be very useful in 
 production.
 What I would like this option to do is merge all the data into 
 one or more files stored in the same folder on the dfs. This way we could 
 save data in case of a software bug in hadoop or user code. 
 The other advantage would be the ability to export a table to multiple locations. 
 Say I had a read-only table that must be online. I could take a snapshot of 
 it when needed, export it to a separate data center, and have it loaded 
 there; then I would have it online at multiple data centers for load 
 balancing and failover.
 I understand that hadoop removes the need for backups to protect 
 against failed servers, but this does not protect us from software bugs that 
 might delete or alter data in ways we did not plan. We should have a way to 
 roll back a dataset.




[jira] Updated: (HBASE-2707) Can't recover from a dead ROOT server if any exceptions happens during log splitting

2010-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2707:
-

Attachment: 2707-0.20.txt

Backport to 0.20.  Please review.  I'd like this to go into 0.20 since it hoses 
the cluster if it ever happens.

 Can't recover from a dead ROOT server if any exceptions happens during log 
 splitting
 

 Key: HBASE-2707
 URL: https://issues.apache.org/jira/browse/HBASE-2707
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 2707-0.20.txt, 2707-test.txt, HBASE-2707.patch


 There's an almost easy way to get stuck after a RS holding ROOT dies, usually 
 from a GC-like event. It happens frequently to my TestReplication in 
 HBASE-2223.
 Some logs:
 {code}
 2010-06-10 11:35:52,090 INFO  [master] wal.HLog(1175): Spliting is done. Removing old log dir hdfs://localhost:55814/user/jdcryans/.logs/10.10.1.63,55846,1276194933831
 2010-06-10 11:35:52,095 WARN  [master] master.RegionServerOperationQueue(183): Failed processing: ProcessServerShutdown of 10.10.1.63,55846,1276194933831; putting onto delayed todo queue
 java.io.IOException: Cannot delete: hdfs://localhost:55814/user/jdcryans/.logs/10.10.1.63,55846,1276194933831
     at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1179)
     at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:298)
     at org.apache.hadoop.hbase.master.RegionServerOperationQueue.process(RegionServerOperationQueue.java:149)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:456)
 Caused by: java.io.IOException: java.io.IOException: /user/jdcryans/.logs/10.10.1.63,55846,1276194933831 is non empty
 2010-06-10 11:35:52,097 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
 2010-06-10 11:35:53,098 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
 2010-06-10 11:35:53,523 INFO  [main.serverMonitor] master.ServerManager$ServerMonitor(131): 1 region servers, 1 dead, average load 14.0[10.10.1.63,55846,1276194933831]
 2010-06-10 11:35:54,099 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
 2010-06-10 11:35:55,101 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
 {code}
 The last lines are my own debug. Since we don't process the delayed todo if 
 ROOT isn't online, we'll never reassign the regions. 




[jira] Updated: (HBASE-2707) Can't recover from a dead ROOT server if any exceptions happens during log splitting

2010-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2707:
-

Fix Version/s: 0.20.6

Moving into 0.20.6.




[jira] Created: (HBASE-2795) On review HBASE-2707 has problem in that we'll get stuck in the delay queue and never come out

2010-06-26 Thread stack (JIRA)
On review HBASE-2707 has problem in that we'll get stuck in the delay queue and 
never come out
--

 Key: HBASE-2795
 URL: https://issues.apache.org/jira/browse/HBASE-2795
 Project: HBase
  Issue Type: Bug
Reporter: stack


I committed the hbase-2707 patch yesterday, but on second thought it has a 
flaw: if there is nothing in the todo queue, we then poll the delayedtodo queue.  
If we fall into the latter and it has no elements, then we'll never come out; 
there are no notifyAlls going on to wake us up.  Patch coming.
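
For readers unfamiliar with this failure mode, the sketch below shows one general way to avoid blocking forever on an empty secondary queue. It is not the patch for this issue and does not reproduce RegionServerOperationQueue; the class TwoQueuePoller, its method names and the use of String operations are all assumptions for illustration. The primary queue is polled with a timeout and the delayed queue is only ever checked without blocking, so the caller always regains control.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Illustrative two-queue poller that can never block indefinitely. */
class TwoQueuePoller {
    private final BlockingQueue<String> todo = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> delayed = new LinkedBlockingQueue<>();

    void addTodo(String op) {
        todo.add(op);
    }

    void addDelayed(String op) {
        delayed.add(op);
    }

    /**
     * Waits up to timeoutMs for a todo item; if none arrives, takes a delayed
     * item only if one is already available. Returns null when both are empty.
     */
    String next(long timeoutMs) {
        try {
            String op = todo.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (op != null) {
                return op;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
        return delayed.poll(); // non-blocking: an empty queue yields null immediately
    }
}
```

A caller would invoke next() in its main loop and treat a null result as "nothing to do this round", which keeps the loop responsive to shutdown requests and to work arriving on either queue.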




[jira] Created: (HBASE-2796) Backport of 2707 to 0.20 branch

2010-06-26 Thread stack (JIRA)
Backport of 2707 to 0.20 branch
---

 Key: HBASE-2796
 URL: https://issues.apache.org/jira/browse/HBASE-2796
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.20.6


Backport the hbase-2707 fix to the 0.20 branch.  If 2707 happens, it hoses 
the cluster.




[jira] Updated: (HBASE-2707) Can't recover from a dead ROOT server if any exceptions happens during log splitting

2010-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2707:
-

Fix Version/s: (was: 0.20.6)

Taking out of 0.20.6.  Open separate issue.  The attached 0.20 patch is not 
enough.  The change would be more major than this patch presumes.




[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882878#action_12882878
 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
--

Still need to look at the code some more. Thinking aloud, some options seem to 
be:

(As background: we are planning to add HBASE-2265, so it would be nice if the 
fix for this issue also took advantage of that optimization and avoided a full 
row scan.)

#1) Filter object with a list of the versions you are interested in. But it seems 
like in this approach you'll end up doing a full scan and checking against the 
filter for each row. There wouldn't be a way to exit early.

#2) Variant of #1. Additionally compute the min/max version from the passed-in 
set of versions; use the setTimeRange() code to trim down the set of columns we 
look at; and apply the filter against those columns. Still not a great approach 
if the versions passed are spread out too much.

#3) Do N point lookups (or 1-column scans), one version at a time (all in the 
same server roundtrip, of course). I think it is still important to preserve 
row-level consistency, i.e. we should do a consistent read of all the 
versions within a row. The stuff Ryan has done should probably make it easy, 
but I don't know this too well yet.

#4) Implement a Batch Get[] API. The app would need to pass a List of Get 
objects, all for the same row, and use setTimeStamp() to set the version 
explicitly in each Get object. The trouble, though, is that the general case of 
the Batch Get[] API doesn't have to support a consistent read across all Gets 
in a batch, but for this case a consistent read would be the desired semantics.

I think #3 might be best overall. If there are 1 versions of a cell, and 
you are interested in version 1 and 1 ones, then point lookups will be as 
good as it gets, and should fetch just the minimal blocks needed.  If the 
versions happen to be on the same block, even better: the blocks should be warm in 
the LRU cache. The case where this approach might not be as CPU efficient is if 
the versions are fairly densely packed together, and a range scan (#2) might 
have worked better. But for that case the app should probably be using the 
setTimeRange() API instead.
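
The trade-off between #2 and #3 can be sketched with a toy model in plain Java. Nothing here is HBase code: the class, the Result holder and the probes counter are hypothetical, and a TreeMap stands in for the stored versions of one cell. The counter is only a crude proxy for work done (versions visited by the range scan, lookups issued by the point reads), but it shows why widely spread versions favour point lookups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;

/** Toy comparison of option #2 (range + filter) and option #3 (point lookups). */
class VersionLookupStrategies {
    /** Values found, plus a rough count of the work needed to find them. */
    static final class Result {
        final List<String> values = new ArrayList<>();
        int probes;
    }

    /** Option #2: scan from the smallest to the largest wanted version, filtering. */
    static Result rangeThenFilter(NavigableMap<Long, String> versions, SortedSet<Long> wanted) {
        Result r = new Result();
        if (wanted.isEmpty()) {
            return r;
        }
        NavigableMap<Long, String> window =
                versions.subMap(wanted.first(), true, wanted.last(), true);
        for (Map.Entry<Long, String> e : window.entrySet()) {
            r.probes++; // every stored version inside the window is visited
            if (wanted.contains(e.getKey())) {
                r.values.add(e.getValue());
            }
        }
        return r;
    }

    /** Option #3: one independent lookup per wanted version. */
    static Result pointLookups(NavigableMap<Long, String> versions, SortedSet<Long> wanted) {
        Result r = new Result();
        for (Long ts : wanted) {
            r.probes++; // one lookup per requested version, present or not
            String value = versions.get(ts);
            if (value != null) {
                r.values.add(value);
            }
        }
        return r;
    }
}
```

In this model, asking for only the oldest and newest of many stored versions makes the range scan visit every version in between, while the point lookups do two probes; for a dense run of adjacent versions the two approaches do the same amount of work.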







[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882880#action_12882880
 ] 

ryan rawson commented on HBASE-2793:


#2 won't be so bad... filters are pretty deep and will be just as efficient as
hacking ScanQueryMatcher, I think.




[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882881#action_12882881
 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
--

With #3, in terms of API, what I had in mind was to add setTimeStamps() to the 
Get object, which takes a List of timestamps and stashes the list away in a 
new private field of the Get object. 

On a Get object, the client may apply setTimeStamp(), setTimeRange(), or 
setTimeStamps(), and these correspond to the equivalent notions in SQL

 WHERE time = ts

 WHERE time >= ts1 AND time < ts2

 WHERE time IN (ts1, ts2, ..., tsn)

respectively. 

If the client calls more than one of these APIs on the same Get object, we could 
simply have a "latest wins" rule (which is already the case for the existing two 
API calls).
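
A minimal sketch of the "latest wins" rule, in standalone Java. TimeSelection is a hypothetical stand-in for just the time-selection state of a Get; it is not the HBase Get class, and setTimeStamps() is the proposed method, not an existing one. Each setter overwrites whichever selection was made before it, and matches() plays the role of the three SQL predicates above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for a Get's time selection; the last setter called wins. */
class TimeSelection {
    private enum Kind { ALL, POINT, RANGE, SET }

    private Kind kind = Kind.ALL;
    private long min;
    private long max;
    private List<Long> stamps = new ArrayList<>();

    /** WHERE time = ts */
    TimeSelection setTimeStamp(long ts) {
        kind = Kind.POINT;
        min = ts;
        return this;
    }

    /** WHERE time >= minStamp AND time < maxStamp */
    TimeSelection setTimeRange(long minStamp, long maxStamp) {
        kind = Kind.RANGE;
        min = minStamp;
        max = maxStamp;
        return this;
    }

    /** WHERE time IN (ts1, ts2, ..., tsn); the proposed addition. */
    TimeSelection setTimeStamps(List<Long> timestamps) {
        kind = Kind.SET;
        stamps = new ArrayList<>(timestamps); // defensive copy of the caller's list
        return this;
    }

    /** True if a version with this timestamp is selected by the latest setter. */
    boolean matches(long ts) {
        switch (kind) {
            case POINT:
                return ts == min;
            case RANGE:
                return ts >= min && ts < max;
            case SET:
                return stamps.contains(ts);
            default:
                return true; // no restriction was ever set
        }
    }
}
```

Keeping a single kind field, rather than three independent optional fields, is what makes the rule unambiguous: there is never a question of how a range and a list would combine.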








[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882900#action_12882900
 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
--

Ryan: Ok, will explore a Filter based approach. 
