date:20100626

[jira] Created: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)

Add ability to extract a specified list of versions of a column in a single 
roundtrip
-

 Key: HBASE-2793
 URL: https://issues.apache.org/jira/browse/HBASE-2793
 Project: HBase
  Issue Type: New Feature
Reporter: Kannan Muthukkaruppan


In one of the use cases we were looking at, each row contains a single column, 
but with several versions (e.g., each version representing an event in a log), 
and we want to be able to extract a specific set of versions from the row in a 
single round-trip.

Currently, on a Get, one can retrieve a specific version of a column using 
setTimeStamp(ts), or a range of versions using setTimeRange(min, max), but not a 
set of specified versions. It would be useful to add this ability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan reassigned HBASE-2793:


Assignee: Kannan Muthukkaruppan


[jira] Created: (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)

ROWCOL bloom filter not used if multiple columns within same family are 
requested in a Get
--

 Key: HBASE-2794
 URL: https://issues.apache.org/jira/browse/HBASE-2794
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan


Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():

{code}
switch(bloomFilterType) {
  case ROW:
    key = row;
    break;
  case ROWCOL:
    if (columns.size() == 1) {
      byte[] col = columns.first();
      key = Bytes.add(row, col);
      break;
    }
    //$FALL-THROUGH$
  default:
    return true;
}
{code}

If columns.size() > 1, then we currently don't take advantage of the bloom 
filter.  We should optimize this to check the bloom for each of the columns and, 
if none of the columns are present in the bloom, avoid opening the file.
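
The proposed optimization can be sketched roughly as follows. This is a minimal, self-contained illustration and not the StoreFile code: `ToyBloom`, `RowColBloomCheck`, and the helper names are hypothetical stand-ins, and the two-hash filter is only there to make the example runnable. The one point it shows is probing the bloom once per requested column and skipping the file only when every probe says "definitely absent".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Toy Bloom filter; a hypothetical stand-in for the store file's real filter. */
class ToyBloom {
    private final BitSet bits;
    private final int size;

    ToyBloom(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    /** Two cheap hash slots derived from the key bytes. */
    private int[] slots(byte[] key) {
        int h = Arrays.hashCode(key);
        int h2 = 31 * h + 17;
        return new int[] { Math.floorMod(h, size), Math.floorMod(h2, size) };
    }

    void add(byte[] key) {
        for (int s : slots(key)) {
            bits.set(s);
        }
    }

    /** False means "definitely absent"; true means "possibly present". */
    boolean mightContain(byte[] key) {
        for (int s : slots(key)) {
            if (!bits.get(s)) {
                return false;
            }
        }
        return true;
    }
}

class RowColBloomCheck {
    /** Concatenates row and column, like Bytes.add(row, col) in the quoted snippet. */
    static byte[] concat(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /** Returns true if the file has to be opened for this row and column set. */
    static boolean shouldSeek(ToyBloom bloom, byte[] row, List<byte[]> columns) {
        if (columns.isEmpty()) {
            return true; // no explicit columns: a ROWCOL bloom cannot help
        }
        for (byte[] col : columns) {
            if (bloom.mightContain(concat(row, col))) {
                return true; // at least one requested column may be in the file
            }
        }
        return false; // every requested column is definitely absent
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyBloom bloom = new ToyBloom(1024);
        bloom.add(concat(b("row1"), b("colA")));
        List<byte[]> wanted = Arrays.asList(b("colA"), b("colB"));
        // colA was added, so the file cannot be skipped.
        System.out.println(shouldSeek(bloom, b("row1"), wanted));
    }
}
```

The cost is one bloom probe per requested column, which is presumably cheap next to opening and seeking a file, though that trade-off would need measuring for Gets that ask for many columns.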



[jira] Commented: (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882820#action_12882820
 ] 

Kannan Muthukkaruppan commented on HBASE-2794:
--

Perhaps a simple starter task for someone interested.


[jira] Updated: (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

2010-06-26 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-2794:
---

Tags: noob


[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882854#action_12882854
 ] 

stack commented on HBASE-2793:
--

What are you thinking, Kannan?  Passing a filter?


[jira] Commented: (HBASE-50) Snapshot of table

2010-06-26 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882867#action_12882867
]

stack commented on HBASE-50:

Thinking on it, Li, maybe it's best if you work up in github and just log here
when you do big pushes to your github repo? That way you are in charge of it
and not dependent on laggard hbase committers getting your work into the branch?

Snapshot of table
-

Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot
Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class
Diagram.png

Having an option to take a snapshot of a table would be very useful in
production.
What I would like to see this option do is a merge of all the data into
one or more files stored in the same folder on the dfs. This way we could
save data in case of a software bug in hadoop or user code.
The other advantage would be the ability to export a table to multiple locations.
Say I had a read_only table that must be online. I could take a snapshot of
it when needed and export it to a separate data center and have it loaded
there, and then I would have it online at multiple data centers for load
balancing and failover.
I understand that hadoop takes away the need for having backups to protect
from failed servers, but this does not protect us from software bugs that
might delete or alter data in ways we did not plan. We should have a way we
can roll back a dataset.


[jira] Updated: (HBASE-2707) Can't recover from a dead ROOT server if any exceptions happens during log splitting

2010-06-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2707:
-

Attachment: 2707-0.20.txt

Backport to 0.20.  Please review.  I'd like this to go into 0.20 since it hoses 
the cluster if it ever happens.

 Can't recover from a dead ROOT server if any exceptions happens during log 
 splitting
 

 Key: HBASE-2707
 URL: https://issues.apache.org/jira/browse/HBASE-2707
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 2707-0.20.txt, 2707-test.txt, HBASE-2707.patch


 There's an almost easy way to get stuck after a RS holding ROOT dies, usually 
 from a GC-like event. It happens frequently to my TestReplication in 
 HBASE-2223.
 Some logs:
 {code}
 2010-06-10 11:35:52,090 INFO  [master] wal.HLog(1175): Spliting is done. Removing old log dir hdfs://localhost:55814/user/jdcryans/.logs/10.10.1.63,55846,1276194933831
 2010-06-10 11:35:52,095 WARN  [master] master.RegionServerOperationQueue(183): Failed processing: ProcessServerShutdown of 10.10.1.63,55846,1276194933831; putting onto delayed todo queue
 java.io.IOException: Cannot delete: hdfs://localhost:55814/user/jdcryans/.logs/10.10.1.63,55846,1276194933831
     at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1179)
     at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:298)
     at org.apache.hadoop.hbase.master.RegionServerOperationQueue.process(RegionServerOperationQueue.java:149)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:456)
 Caused by: java.io.IOException: java.io.IOException: /user/jdcryans/.logs/10.10.1.63,55846,1276194933831 is non empty
 2010-06-10 11:35:52,097 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
 2010-06-10 11:35:53,098 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
 2010-06-10 11:35:53,523 INFO  [main.serverMonitor] master.ServerManager$ServerMonitor(131): 1 region servers, 1 dead, average load 14.0[10.10.1.63,55846,1276194933831]
 2010-06-10 11:35:54,099 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
 2010-06-10 11:35:55,101 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
 {code}
 The last lines are my own debug. Since we don't process the delayed todo if 
 ROOT isn't online, we'll never reassign the regions. 


[jira] Updated: (HBASE-2707) Can't recover from a dead ROOT server if any exceptions happens during log splitting

2010-06-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2707:
-

Fix Version/s: 0.20.6

Moving into 0.20.6.


[jira] Created: (HBASE-2795) On review HBASE-2707 has problem in that we'll get stuck in the delay queue and never come out

2010-06-26 Thread stack (JIRA)

On review HBASE-2707 has problem in that we'll get stuck in the delay queue and 
never come out
--

 Key: HBASE-2795
 URL: https://issues.apache.org/jira/browse/HBASE-2795
 Project: HBase
  Issue Type: Bug
Reporter: stack


I committed the hbase-2707 patch yesterday, but on second thought it has a 
flaw: if there is nothing in the todo queue, we then poll the delayedtodo queue.  
If we fall into the latter and it has no elements, then we'll never come out; 
there are no notifyAlls going on to wake us up.  Patch coming.
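
The issue does not say what the fix will be. One common way to avoid an unbounded wait of this kind is to give every poll a timeout, so that the loop always returns to the first queue. The sketch below assumes that approach. `TwoQueuePoller`, its fields, and the string payloads are hypothetical and are not taken from RegionServerOperationQueue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical two-queue poller: new work first, then delayed work, never blocking forever. */
class TwoQueuePoller {
    final BlockingQueue<String> todo = new LinkedBlockingQueue<>();
    final BlockingQueue<String> delayed = new LinkedBlockingQueue<>();

    /**
     * Returns the next operation, or null if neither queue produced one
     * within the timeouts. The caller is expected to loop and call again.
     */
    String next(long timeoutMs) {
        try {
            String op = todo.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (op != null) {
                return op;
            }
            // Bounded wait: an empty delayed queue cannot trap the caller here.
            return delayed.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    public static void main(String[] args) {
        TwoQueuePoller poller = new TwoQueuePoller();
        poller.delayed.add("retry ProcessServerShutdown");
        System.out.println(poller.next(50));
        System.out.println(poller.next(50)); // both queues empty: returns after the timeouts
    }
}
```

Because both waits are bounded, work that arrives on the todo queue while the delayed queue is empty is picked up on the next call, with no notifyAll needed.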


[jira] Created: (HBASE-2796) Backport of 2707 to 0.20 branch

2010-06-26 Thread stack (JIRA)

Backport of 2707 to 0.20 branch
---

 Key: HBASE-2796
 URL: https://issues.apache.org/jira/browse/HBASE-2796
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.20.6


Backport the hbase-2707 fix to the 0.20 branch.  If 2707 happens, it hoses the 
cluster...


[jira] Updated: (HBASE-2707) Can't recover from a dead ROOT server if any exceptions happens during log splitting

2010-06-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2707:
-

Fix Version/s: (was: 0.20.6)

Taking out of 0.20.6.  Open separate issue.  The attached 0.20 patch is not 
enough.  The change would be more major than this patch presumes.


[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882878#action_12882878
]

Kannan Muthukkaruppan commented on HBASE-2793:
--

Still need to look at the code some more. Thinking aloud, some options seem to
be:

(Note, as background, that we are planning to add HBASE-2265. So it would be
nice if the fix for this issue also takes advantage of that optimization and
avoids a full row scan.)

#1) Filter object with a list of versions you are interested in. But it seems
like in this approach, you'll end up doing a full scan-- and check against the
filter for each row. There wouldn't be a way to early exit.

#2) Variant of #1. Additionally compute the min/max version from the passed in
set of versions; use the setTimeRange() code to trim down the set of columns we
look at; and apply the filter against those columns. Still not a great approach
if the versions passed are spread out too much.

#3) Do N point lookups (or 1 column scans), one version at a time (all in the
same server roundtrip of course). I think it is still important to preserve
row-level consistency-- i.e. we should do a consistent read of all the
versions within a row. The stuff Ryan has done should probably make it easy.
But I don't know this too well yet.

#4) Implement Batch Get[] API. The app would need to pass a List of Get
objects, all for the same row, and use setTimeStamp() to set the version
explicitly in each Get object. The trouble though is that the general case of
the Batch Get[] API doesn't have to support a consistent read across all Gets
in a batch; but for this case a consistent read would be the desired semantics.

I think #3 might be best overall. If there are 1 versions of a cell, and
you are interested in version 1 and 1 ones, then point lookups will be as
good as it gets-- and should fetch just the minimal blocks needed. If the
versions happen to be on the same block, even better-- the blocks should be warm in
the LRU cache. The case where this approach might not be as CPU efficient is if
the versions are fairly densely packed together, and a range scan (#2) might
have worked better. But for that case the app should probably be using the
setTimeRange() API instead.
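
To make the trade-off between options #2 and #3 concrete, here is a small in-memory sketch. It models one cell's versions as a sorted map from timestamp to value. `VersionLookup` and both method names are hypothetical, and nothing here touches HBase or its block cache. Option #2 walks every version between the smallest and largest requested timestamp, while option #3 does one lookup per requested timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class VersionLookup {
    /** Option #2: trim to [min, max] of the wanted set, then filter inside that range. */
    static List<String> rangeThenFilter(NavigableMap<Long, String> versions, Set<Long> wanted) {
        List<String> out = new ArrayList<>();
        if (wanted.isEmpty()) {
            return out;
        }
        long min = Collections.min(wanted);
        long max = Collections.max(wanted);
        // Visits every stored version in the range, wanted or not.
        for (Map.Entry<Long, String> e : versions.subMap(min, true, max, true).entrySet()) {
            if (wanted.contains(e.getKey())) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    /** Option #3: one point lookup per wanted timestamp, in ascending order. */
    static List<String> pointLookups(NavigableMap<Long, String> versions, Set<Long> wanted) {
        List<String> out = new ArrayList<>();
        for (Long ts : new TreeSet<Long>(wanted)) {
            String value = versions.get(ts);
            if (value != null) {
                out.add(value); // timestamps with no stored version are skipped
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(1L, "a");
        versions.put(5L, "b");
        versions.put(9L, "c");
        versions.put(12L, "d");
        Set<Long> wanted = new HashSet<>(Arrays.asList(5L, 12L, 7L));
        System.out.println(rangeThenFilter(versions, wanted));
        System.out.println(pointLookups(versions, wanted));
    }
}
```

Both methods return the same values in ascending timestamp order. The difference is the amount of work: the range variant scales with the number of stored versions between min and max, the point-lookup variant with the number of requested timestamps.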


[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread ryan rawson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882880#action_12882880
]

ryan rawson commented on HBASE-2793:

#2 won't be so bad... filters are pretty deep and will be just as efficient as
hacking the ScanQueryMatcher, I think.


[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882881#action_12882881
 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
--

With #3, in terms of API, what I had in mind was to add setTimeStamps() to the 
Get object, which takes a List of timestamps and stashes away the list in a 
private (new) field of the Get object. 

On a Get object, the client may apply setTimeStamp(), setTimeRange(), or 
setTimeStamps(), and these correspond to the equivalent notions in SQL

 WHERE time = ts

 WHERE time >= ts1 AND time < ts2

 WHERE time IN (ts1, ts2, ..., tsn)

respectively. 

If the client calls more than one of these APIs on the same Get object, we could 
simply have a latest-wins rule (which is already the case for the existing two 
API calls).
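
The latest-wins rule can be sketched with a small stand-in class. `TimeSelector` is hypothetical and is not the HBase Get class. `setTimeStamps()` is the proposed method and did not exist when this comment was written. Treating the range as half-open (min inclusive, max exclusive) is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the time-selection state of a Get. */
class TimeSelector {
    private long min = 0L;
    private long max = Long.MAX_VALUE;
    private Set<Long> stamps = null; // non-null when an explicit timestamp set is active

    /** WHERE time = ts */
    TimeSelector setTimeStamp(long ts) {
        stamps = Collections.singleton(ts);
        return this;
    }

    /** WHERE time >= minTs AND time < maxTs (half-open range, assumed). */
    TimeSelector setTimeRange(long minTs, long maxTs) {
        stamps = null; // latest wins: discard any earlier timestamp set
        min = minTs;
        max = maxTs;
        return this;
    }

    /** WHERE time IN (ts1, ts2, ..., tsn) */
    TimeSelector setTimeStamps(Collection<Long> list) {
        stamps = new TreeSet<Long>(list); // latest wins: replaces any earlier range
        return this;
    }

    /** True if a version with this timestamp is selected by the most recent call. */
    boolean matches(long ts) {
        if (stamps != null) {
            return stamps.contains(ts);
        }
        return ts >= min && ts < max;
    }

    public static void main(String[] args) {
        TimeSelector sel = new TimeSelector()
                .setTimeRange(10L, 20L)
                .setTimeStamps(Arrays.asList(3L, 7L)); // overrides the range
        System.out.println(sel.matches(7L));
        System.out.println(sel.matches(15L));
    }
}
```

Keeping a single active selection means the three calls never have to be intersected, which matches the latest-wins behaviour described above.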






[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

2010-06-26 Thread Kannan Muthukkaruppan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882900#action_12882900
 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
--

Ryan: Ok, will explore a Filter based approach. 

