[jira] [Commented] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2012-03-21 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235031#comment-13235031
 ] 

Jonathan Gray commented on HBASE-4410:
--

Not working on this right now, punt it!  Thanks Lars

On Mar 21, 2012, at 3:31 PM, "Lars Hofhansl (Updated) (JIRA)"



> FilterList.filterKeyValue can return suboptimal ReturnCodes
> ---
>
> Key: HBASE-4410
> URL: https://issues.apache.org/jira/browse/HBASE-4410
> Project: HBase
>  Issue Type: Improvement
>  Components: filters
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: HBASE-4410-v1.patch
>
>
> FilterList.filterKeyValue does not always return the most optimal ReturnCode 
> in both the AND and OR conditions.
> For example, with F1 AND F2, if F1 returns SKIP, FilterList immediately 
> returns that SKIP.  However, if F2 would have returned NEXT_COL, NEXT_ROW, or 
> SEEK_NEXT_USING_HINT, we could have returned that more optimal 
> ReturnCode from F2 instead.
> For AND conditions, we can always pick the *most restrictive* return code.
> For OR conditions, we must always pick the *least restrictive* return code.
> This JIRA is to review the FilterList.filterKeyValue() method to try and make 
> it more optimal and to add a new unit test which verifies the correct 
> behavior.
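The AND/OR rule above can be sketched as follows. This is a minimal sketch that assumes a simplified, hypothetical `ReturnCode` enum whose declaration order runs from least to most restrictive. The real `Filter.ReturnCode` has other values, and SEEK_NEXT_USING_HINT does not fit a single linear order, so it is left out here.

```java
import java.util.Arrays;
import java.util.List;

final class FilterListCombine {
    // Hypothetical, simplified stand-in for Filter.ReturnCode, declared from
    // least restrictive (INCLUDE) to most restrictive (NEXT_ROW).
    enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

    // AND: a KeyValue must pass every filter, so the most restrictive code is safe.
    static ReturnCode combineAnd(List<ReturnCode> codes) {
        ReturnCode result = ReturnCode.INCLUDE;
        for (ReturnCode code : codes) {
            if (code.compareTo(result) > 0) {
                result = code;
            }
        }
        return result;
    }

    // OR: a KeyValue passes if any filter accepts it, so only the least
    // restrictive code is safe.
    static ReturnCode combineOr(List<ReturnCode> codes) {
        ReturnCode result = ReturnCode.NEXT_ROW;
        for (ReturnCode code : codes) {
            if (code.compareTo(result) < 0) {
                result = code;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // F1 returns SKIP, F2 would have returned NEXT_ROW.
        List<ReturnCode> codes = Arrays.asList(ReturnCode.SKIP, ReturnCode.NEXT_ROW);
        System.out.println("AND -> " + combineAnd(codes)); // AND -> NEXT_ROW
        System.out.println("OR  -> " + combineOr(codes));  // OR  -> SKIP
    }
}
```

Because Java enums compare by declaration order, the whole policy lives in the order of the enum constants; a real implementation would need explicit merge rules for codes that are not comparable this way.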

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3171) Drop ROOT and instead store META location(s) directly in ZooKeeper

2012-02-09 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204346#comment-13204346
 ] 

Jonathan Gray commented on HBASE-3171:
--

Thanks for taking a look.  Need to think about backward compatibility more.  
Might need to hold off until some future big client/server change?

Were you thinking that meta locations would be in both the META region(s) as well 
as up in ZK?  Or just in ZK?  If it was in both, then should be easier to 
provide backwards compatibility.

Which would be the source of truth, and which would be relied upon for 
persistence?  I suppose all the data in meta is (or should be) recoverable from 
the regions themselves across restarts, so we wouldn't have a hard requirement 
on zk persistence between restarts.  Doing the meta edits in zk might help suss 
out some of the trickier race conditions around region movement, splitting, 
meta updating, and crashing.

Was also thinking we should revisit the idea of more intelligent redirecting of 
clients along with NSREs while looking at this stuff.

> Drop ROOT and instead store META location(s) directly in ZooKeeper
> --
>
> Key: HBASE-3171
> URL: https://issues.apache.org/jira/browse/HBASE-3171
> Project: HBase
>  Issue Type: Improvement
>  Components: client, master, regionserver, zookeeper
>Reporter: Jonathan Gray
>
> Rather than storing the ROOT region location in ZooKeeper, going to ROOT, and 
> reading the META location, we should just store the META location directly in 
> ZooKeeper.
> The purpose of the root region from the bigtable paper was to support 
> multiple meta regions.  Currently, we explicitly only support a single meta 
> region, so the translation from our current code of a single root location to 
> a single meta location will be very simple.  Long-term, it seems reasonable 
> that we could store several meta region locations in ZK.  There's been some 
> discussion in HBASE-1755 about actually moving META into ZK, but I think this 
> jira is a good step towards taking some of the complexity out of how we have 
> to deal with catalog tables everywhere.
> As-is, a new client already requires ZK to get the root location, so this 
> would not change those requirements in any way.
> The primary motivation for this is to simplify things like CatalogTracker.  
> The way we handle root in that class is really simple, but the tracking of 
> meta is difficult and a bit hacky.  This hack in tracking the meta 
> location is what caused one of the bugs over in HBASE-3159.
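To make the lookup change concrete, here is a toy model in which plain maps stand in for ZooKeeper and for the ROOT region. All names here (the maps, the znode paths, the region and server names) are hypothetical and illustrative; this does not use the real ZooKeeper or CatalogTracker APIs and only shows the difference in hops.

```java
import java.util.HashMap;
import java.util.Map;

final class MetaLocationLookup {
    // Hypothetical in-memory stand-ins; paths and server names are illustrative only.
    private final Map<String, String> zk = new HashMap<>();
    private final Map<String, String> rootRegion = new HashMap<>();

    MetaLocationLookup() {
        zk.put("/hbase/root-region-server", "rs1:60020");
        rootRegion.put(".META.,,1", "rs2:60020");
        // Proposed layout: the META location is stored directly in ZooKeeper.
        zk.put("/hbase/meta-region-server", "rs2:60020");
    }

    // Current path: read ROOT location from ZK, then read ROOT to find META.
    String findMetaViaRoot() {
        String rootServer = zk.get("/hbase/root-region-server");
        if (rootServer == null) {
            throw new IllegalStateException("ROOT location not set");
        }
        // In a real cluster this second step would be an RPC to rootServer.
        return rootRegion.get(".META.,,1");
    }

    // Proposed path: a single ZK read returns the META location.
    String findMetaDirect() {
        return zk.get("/hbase/meta-region-server");
    }

    public static void main(String[] args) {
        MetaLocationLookup lookup = new MetaLocationLookup();
        System.out.println(lookup.findMetaViaRoot()); // rs2:60020
        System.out.println(lookup.findMetaDirect());  // rs2:60020
    }
}
```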





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2012-02-02 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199312#comment-13199312
 ] 

Jonathan Gray commented on HBASE-4528:
--

@Mubarek, since it's a performance optimization and new feature, it's not going 
to be committed into the 90/92 branches.  That being said, this patch could be 
backported if someone wanted to use it on a 92 branch (90 might be 
significantly more difficult, not sure).

> The put operation can release the rowlock before sync-ing the Hlog
> --
>
> Key: HBASE-4528
> URL: https://issues.apache.org/jira/browse/HBASE-4528
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4528-trunk-v9.txt, 4528-trunk.txt, 
> HBASE-4528-Trunk-FINAL.patch, appendNoSync5.txt, appendNoSyncPut1.txt, 
> appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, 
> appendNoSyncPut5.txt, appendNoSyncPut6.txt, appendNoSyncPut7.txt, 
> appendNoSyncPut8.txt
>
>
> This allows for better throughput when there are hot rows. A single row 
> update improves from 100 puts/sec/server to 5000 puts/sec/server.
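A minimal sketch of the reordering, assuming a toy log with separate `append` (buffer only) and `sync` (durable) steps. `ToyLog`, `put`, and the field names are hypothetical; this is not the HRegion/HLog code. The row lock covers only the in-memory work, and the slow sync runs after the lock is released, so other writers to the same hot row can proceed. A real implementation must also keep the edit invisible to readers until the sync completes, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class PutWithDeferredSync {
    // Hypothetical write-ahead log with separate append and sync steps.
    static final class ToyLog {
        private final List<String> pending = new ArrayList<>();
        private final List<String> durable = new ArrayList<>();

        // Cheap: buffer the edit in memory only.
        synchronized void append(String edit) {
            pending.add(edit);
        }

        // Expensive in a real log (a filesystem sync); here it just moves pending edits.
        synchronized void sync() {
            durable.addAll(pending);
            pending.clear();
        }

        synchronized int durableCount() {
            return durable.size();
        }
    }

    private final ReentrantLock rowLock = new ReentrantLock(); // lock for one hot row
    private final ToyLog log = new ToyLog();
    private final List<String> memstore = Collections.synchronizedList(new ArrayList<>());

    void put(String edit) {
        rowLock.lock();
        try {
            log.append(edit);   // 1. append to the log without syncing
            memstore.add(edit); // 2. apply the edit in memory
        } finally {
            rowLock.unlock();   // 3. release the row lock
        }
        log.sync();             // 4. sync outside the lock, before returning
    }

    int memstoreSize() { return memstore.size(); }
    int durableEdits() { return log.durableCount(); }

    public static void main(String[] args) throws InterruptedException {
        PutWithDeferredSync region = new PutWithDeferredSync();
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            final int id = t;
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    region.put("writer" + id + "-edit" + i);
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println(region.memstoreSize() + " " + region.durableEdits()); // 400 400
    }
}
```

Every `put` still syncs before returning, so durability per call is unchanged; only the time the row lock is held shrinks.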





[jira] [Commented] (HBASE-2947) MultiIncrement (MultiGet functionality for increments)

2012-01-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179220#comment-13179220
 ] 

Jonathan Gray commented on HBASE-2947:
--

Not working on it but no reason not to commit that I recall.




> MultiIncrement (MultiGet functionality for increments)
> --
>
> Key: HBASE-2947
> URL: https://issues.apache.org/jira/browse/HBASE-2947
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Minor
> Attachments: HBASE-2947-v1.patch
>
>
> HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
> operations.  We should add a way to do that with increments.
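One way to picture a multi-increment is the same grouping a batch get performs: bucket the per-row increments by the region that owns each row, then send one batch per region. A minimal sketch, assuming regions are identified by their start keys in a `TreeMap` and rows are plain strings; `Increment`, `groupByRegion`, and the region names are hypothetical and are not the HBase client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MultiIncrementSketch {
    // Hypothetical request: add `amount` to the counter stored at `row`.
    static final class Increment {
        final String row;
        final long amount;

        Increment(String row, long amount) {
            this.row = row;
            this.amount = amount;
        }
    }

    // Groups increments by the region whose start key is the greatest key <= row.
    static Map<String, List<Increment>> groupByRegion(
            TreeMap<String, String> regionsByStartKey, List<Increment> increments) {
        Map<String, List<Increment>> batches = new LinkedHashMap<>();
        for (Increment inc : increments) {
            Map.Entry<String, String> region = regionsByStartKey.floorEntry(inc.row);
            if (region == null) {
                throw new IllegalArgumentException("no region for row " + inc.row);
            }
            batches.computeIfAbsent(region.getValue(), k -> new ArrayList<>()).add(inc);
        }
        return batches;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-A");  // first region starts at the empty key
        regions.put("m", "region-B");
        List<Increment> incs = Arrays.asList(
                new Increment("apple", 1), new Increment("zebra", 5), new Increment("kiwi", 2));
        for (Map.Entry<String, List<Increment>> e : groupByRegion(regions, incs).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " increment(s)");
        }
        // region-A -> 2 increment(s)
        // region-B -> 1 increment(s)
    }
}
```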





[jira] [Commented] (HBASE-4752) Don't create an unnecessary LinkedList when evicting from the BlockCache

2011-11-07 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145693#comment-13145693
 ] 

Jonathan Gray commented on HBASE-4752:
--

Sorry I didn't chime in earlier, been traveling.

I'm actually -1 on this change at the moment because of the introduction of a 
Google class (now the block cache has this external dependency).  This class is 
actually used by other projects outside of HBase, so I'd hate to put in an 
unnecessary dependency.  Is there additional value we get out of using the 
MinMaxPQ?  We save a LinkedList allocation?

As for the change in behavior, I'm not sure I follow.  Seems like nothing 
actually changes?  (whether the PQ is cleared or not doesn't really matter, 
behavior-wise?)

The way I'm reading the code, it seems like we could actually just remove the 
LL completely and leave in place the regular PQ?  CachedBlock takes care of the 
sort order, no?

> Don't create an unnecessary LinkedList when evicting from the BlockCache
> 
>
> Key: HBASE-4752
> URL: https://issues.apache.org/jira/browse/HBASE-4752
> Project: HBase
>  Issue Type: Improvement
>  Components: performance, regionserver
>Affects Versions: 0.90.4
>Reporter: Benoit Sigoure
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 
> 0001-HBASE-4752-Don-t-create-an-unnecessary-LinkedList-wh.patch, 
> 4752-trunk-v2.txt, 4752-trunk.txt
>
>
> When evicting from the BlockCache, the code creates a LinkedList containing 
> every single block sorted by access time.  This list is created from a 
> PriorityQueue.  I don't believe it is necessary, as the PriorityQueue can be 
> used directly.
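A minimal sketch of evicting straight from the `PriorityQueue`, with no intermediate `LinkedList`: poll the least recently accessed block until enough bytes have been freed. `Block`, `accessTime`, and `evict` are hypothetical simplifications, not the LruBlockCache or CachedBlock classes, and for simplicity this sketch queues every block.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class PqEvictionSketch {
    // Hypothetical cached block: a name, its size in bytes, and its last access time.
    static final class Block {
        final String name;
        final long size;
        final long accessTime;

        Block(String name, long size, long accessTime) {
            this.name = name;
            this.size = size;
            this.accessTime = accessTime;
        }
    }

    // Evicts least recently used blocks until at least bytesToFree bytes are freed.
    // The queue head is always the oldest block, so no sorted copy is needed.
    static List<Block> evict(List<Block> blocks, long bytesToFree) {
        PriorityQueue<Block> queue =
                new PriorityQueue<>(Comparator.comparingLong((Block b) -> b.accessTime));
        queue.addAll(blocks);
        List<Block> evicted = new ArrayList<>();
        long freed = 0;
        while (freed < bytesToFree && !queue.isEmpty()) {
            Block oldest = queue.poll();
            evicted.add(oldest);
            freed += oldest.size;
        }
        return evicted;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("b1", 64, 30));
        blocks.add(new Block("b2", 64, 10));
        blocks.add(new Block("b3", 64, 20));
        for (Block b : evict(blocks, 100)) {
            System.out.println(b.name); // prints b2, then b3
        }
    }
}
```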





[jira] [Commented] (HBASE-4745) LRU Statistics thread should be daemon

2011-11-04 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144495#comment-13144495
 ] 

Jonathan Gray commented on HBASE-4745:
--

+1

> LRU Statistics thread should be daemon
> --
>
> Key: HBASE-4745
> URL: https://issues.apache.org/jira/browse/HBASE-4745
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Andrew Purtell
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4745.patch
>
>
> Here is an excerpt from the 'HBase 0.92/Hadoop 0.22 test results' discussion on dev@hbase:
> {code}
> "LRU Statistics #0" prio=10 tid=0x7f4edc7dd800 nid=0x211a waiting
> on condition [0x7f4e631e2000]
>   java.lang.Thread.State: TIMED_WAITING (parking)
>at sun.misc.Unsafe.park(Native Method)
>- parking to wait for  <0x7f4e88acc968> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
>at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
>at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
>at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
>at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
>at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>at java.lang.Thread.run(Thread.java:619)
> {code}
> We should make this thread a daemon thread.
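One standard-library way to do this is to give the `ScheduledExecutorService` a `ThreadFactory` that marks its threads as daemons, so a lingering statistics task cannot keep the JVM alive. A minimal sketch; the thread name mirrors the dump above, and the five-minute period and the printed message are assumptions, not the actual LruBlockCache settings.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DaemonStatsThread {
    // Creates threads named "<prefix> #N" that do not block JVM exit.
    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + " #" + counter.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler =
                Executors.newScheduledThreadPool(1, daemonFactory("LRU Statistics"));
        // Assumed period; the task is a placeholder for logging cache statistics.
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("logging cache stats"), 5, 5, TimeUnit.MINUTES);
        // main returns right away; because the worker is a daemon, the JVM can exit.
        System.out.println("main done");
    }
}
```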





[jira] [Commented] (HBASE-4298) Support to drain RS nodes through ZK

2011-11-01 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141478#comment-13141478
 ] 

Jonathan Gray commented on HBASE-4298:
--

I think this should be for 0.94 since it's a new feature.  I also think a 
pre-requisite to commit is a unit test.

> Support to drain RS nodes through ZK
> 
>
> Key: HBASE-4298
> URL: https://issues.apache.org/jira/browse/HBASE-4298
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.90.4
> Environment: all
>Reporter: Aravind Gottipati
>Priority: Critical
>  Labels: patch
> Fix For: 0.92.0, 0.90.5
>
> Attachments: 4298-trunk-v2.txt, 90_hbase.patch, trunk_hbase.patch
>
>
> HDFS currently has a way to exclude certain datanodes and prevent them from 
> getting new blocks.  HDFS goes one step further and even drains these nodes 
> for you.  This enhancement is a step in that direction.
> The idea is that we mark nodes in zookeeper as draining nodes.  This means 
> that they don't get any more new regions.  These draining nodes look exactly 
> the same as the corresponding nodes in /rs, except they live under /draining.
> Eventually, support for draining them can be added.  I am submitting two 
> patches for review - one for the 0.90 branch and one for trunk (in git).
> Here are the two patches
> 0.90 - 
> https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
> trunk - 
> https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
> I have tested both these patches and they work as advertised.
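On the assignment side, a draining node can be sketched as a set difference: region servers registered under `/rs` that are also listed under `/draining` are excluded from the candidates for new regions. A minimal sketch, assuming both znode listings are already available as sets of server names; `assignableServers` and the server names are hypothetical, and no ZooKeeper client is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class DrainingServersSketch {
    // Servers that may receive new regions: online (/rs) minus draining (/draining).
    static List<String> assignableServers(Set<String> onlineServers, Set<String> drainingServers) {
        return onlineServers.stream()
                .filter(server -> !drainingServers.contains(server))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> online = new TreeSet<>(Arrays.asList("rs1,60020,1", "rs2,60020,1", "rs3,60020,1"));
        Set<String> draining = new TreeSet<>(Arrays.asList("rs2,60020,1"));
        System.out.println(assignableServers(online, draining)); // [rs1,60020,1, rs3,60020,1]
    }
}
```

Regions already on a draining server stay where they are in this sketch; actively moving them off would be the later "support for draining" step the description mentions.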





[jira] [Commented] (HBASE-4717) More efficient age-off of old data during major compaction

2011-11-01 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141445#comment-13141445
 ] 

Jonathan Gray commented on HBASE-4717:
--

+1 on this general direction.

We've long talked of special compaction heuristics that would bucketize by time 
in some way (and you could really take advantage of the TimeRangeTracker file 
selection stuff for read perf).  We did as you describe and set a small 
max.size, so once a file reached a certain size, it would never be compacted 
again.  This allowed us to "age out" the data by keeping old stuff separate 
from new stuff in files.

We were not trying to actually wipe out the data, only separate it, because 
this was mostly a read-modify-write workload that needed access to recent data, 
but the old data still needed to be available for user read queries.  It would 
probably be simple to check each file's time range at compaction time and, if 
its max timestamp has expired, just wipe out that file.

> More efficient age-off of old data during major compaction
> --
>
> Key: HBASE-4717
> URL: https://issues.apache.org/jira/browse/HBASE-4717
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: Todd Lipcon
>
> Many applications need to implement efficient age-off of old data. We 
> currently only perform age-off during major compaction by scanning through 
> all of the KVs. Instead, we could implement the following:
> - Set hbase.hstore.compaction.max.size reasonably small. Thus, older store 
> files contain only smaller finite ranges of time.
> - Periodically run an "age-off compaction". This compaction would scan the 
> current list of storefiles. Any store file that falls entirely out of the TTL 
> time range would be dropped. Store files completely within the time range 
> would be un-altered. Those crossing the time-range boundary could either be 
> left alone or compacted using the existing compaction code.
> I don't have a design in mind for how exactly this would be implemented, but 
> hope to generate some discussion.
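The per-file decision in the proposed age-off compaction can be sketched from each store file's minimum and maximum timestamps. A minimal sketch, assuming a cell counts as expired when its timestamp is below a cutoff of now minus TTL; `StoreFileInfo`, `classify`, and `Action` are hypothetical names, not HBase classes.

```java
final class AgeOffSketch {
    // What an age-off pass would do with one store file.
    enum Action { DROP, KEEP, BOUNDARY }

    // Hypothetical summary of a store file's timestamp range.
    static final class StoreFileInfo {
        final String name;
        final long minTimestamp;
        final long maxTimestamp;

        StoreFileInfo(String name, long minTimestamp, long maxTimestamp) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    // Assumes a cell is expired when its timestamp is below cutoff (= now - TTL).
    static Action classify(StoreFileInfo file, long cutoff) {
        if (file.maxTimestamp < cutoff) {
            return Action.DROP;     // every cell has expired: drop the whole file
        }
        if (file.minTimestamp >= cutoff) {
            return Action.KEEP;     // every cell is live: leave the file alone
        }
        return Action.BOUNDARY;     // straddles the cutoff: leave alone or compact normally
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long ttl = 100_000L;
        long cutoff = now - ttl;
        StoreFileInfo[] files = {
            new StoreFileInfo("old", 100_000L, 800_000L),
            new StoreFileInfo("mixed", 850_000L, 950_000L),
            new StoreFileInfo("new", 950_000L, 999_000L),
        };
        for (StoreFileInfo f : files) {
            System.out.println(f.name + " -> " + classify(f, cutoff));
        }
        // old -> DROP, mixed -> BOUNDARY, new -> KEEP
    }
}
```

Only the two timestamps per file are read, which is why a small max compaction size helps: it keeps older files narrow in time so more of them land entirely in the DROP bucket.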





[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138831#comment-13138831
 ] 

Jonathan Gray commented on HBASE-4532:
--

I don't think JIRA being open/closed is the issue, it's more multiple commits.

But yeah, as a separate note, looks like there was no final comment and 
resolution after the commit.

> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---
>
> Key: HBASE-4532
> URL: https://issues.apache.org/jira/browse/HBASE-4532
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.94.0
>
> Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
> hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch
>
>
> The previous JIRA, HBASE-4469, avoids the top-row seek operation when the 
> row-col bloom filter is enabled. 
> This JIRA tries to avoid the top-row seek in all cases by creating a 
> dedicated bloom filter only for delete family markers.
> The only subtle case is when we are interested in the top row with an empty 
> column.
> For example, suppose we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM, and the delete family 
> bloom filter says there is NO delete family.
> We then skip the top-row seek and return a fake kv, which is the last 
> kv for this row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution is to disable this optimization when we are 
> trying to GET/SCAN a row with an empty column.
> Evaluation from TestSeekOptimization:
> Previously:
> bloom   compr  seeks (no opt.)  seeks (opt.)     savings
> NONE    NONE   2506             1714 (68.40%)    31.60%
> ROW     NONE   2506             1714 (68.40%)    31.60%
> ROWCOL  NONE   2506             1458 (58.18%)    41.82%
> NONE    GZ     2506             1714 (68.40%)    31.60%
> ROW     GZ     2506             1714 (68.40%)    31.60%
> ROWCOL  GZ     2506             1458 (58.18%)    41.82%
> So we get about 10 percentage points more seek savings ONLY if the ROWCOL 
> bloom filter is enabled [HBASE-4469].
> 
> After this change:
> bloom   compr  seeks (no opt.)  seeks (opt.)     savings
> NONE    NONE   2506             1458 (58.18%)    41.82%
> ROW     NONE   2506             1458 (58.18%)    41.82%
> ROWCOL  NONE   2506             1458 (58.18%)    41.82%
> NONE    GZ     2506             1458 (58.18%)    41.82%
> ROW     GZ     2506             1458 (58.18%)    41.82%
> ROWCOL  GZ     2506             1458 (58.18%)    41.82%
> So we get about 10 percentage points more seek savings for ALL kinds of 
> bloom filter.
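The skip decision can be sketched as one predicate: skip the top-row seek only when the delete-family bloom filter says the row has no delete-family marker and the request names a non-empty column. A minimal sketch, assuming a `Set` stands in for the bloom filter (a real bloom filter can give false positives but never false negatives, so a "no" answer is safe to trust); `canSkipTopRowSeek` and `savingsPercent` are hypothetical names. The savings helper uses 1 - (seeks with optimization / seeks without), which is consistent with the figures above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class DeleteFamilyBloomSketch {
    // Stand-in for the delete-family bloom filter: rows that may hold a delete-family marker.
    private final Set<String> rowsWithDeleteFamily = new HashSet<>();

    void addDeleteFamily(String row) {
        rowsWithDeleteFamily.add(row);
    }

    // Skip the top-row seek only if no delete family can exist for the row AND the
    // request asks for a non-empty column (the subtle empty-column case).
    boolean canSkipTopRowSeek(String row, String qualifier) {
        boolean mayHaveDeleteFamily = rowsWithDeleteFamily.contains(row);
        boolean emptyColumn = qualifier.isEmpty();
        return !mayHaveDeleteFamily && !emptyColumn;
    }

    // Percentage of seeks saved: 100 * (1 - withOpt / withoutOpt).
    static String savingsPercent(long seeksWithout, long seeksWith) {
        return String.format(Locale.ROOT, "%.2f%%",
                100.0 * (1.0 - (double) seeksWith / seeksWithout));
    }

    public static void main(String[] args) {
        DeleteFamilyBloomSketch bloom = new DeleteFamilyBloomSketch();
        bloom.addDeleteFamily("row2");
        System.out.println(bloom.canSkipTopRowSeek("row1", "q1")); // true
        System.out.println(bloom.canSkipTopRowSeek("row1", ""));   // false: empty column
        System.out.println(bloom.canSkipTopRowSeek("row2", "q1")); // false: may have delete family
        System.out.println(savingsPercent(2506, 1714));            // 31.60%
        System.out.println(savingsPercent(2506, 1458));            // 41.82%
    }
}
```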





[jira] [Commented] (HBASE-4687) regionserver may miss zk-heartbeats to master when replaying edits at region open

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138828#comment-13138828
 ] 

Jonathan Gray commented on HBASE-4687:
--

Thanks Prakash!

> regionserver may miss zk-heartbeats to master when replaying edits at region 
> open
> -
>
> Key: HBASE-4687
> URL: https://issues.apache.org/jira/browse/HBASE-4687
> Project: HBase
>  Issue Type: Bug
>Reporter: Prakash Khemani
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4687-regionserver-may-miss-zk-heartbeats-to-ma.patch
>
>
> replayRecoveredEdits() should do another reporter.progress() before returning.
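A minimal sketch of the pattern, assuming a hypothetical `Progressable`-style callback and an assumed reporting interval. Hadoop's `org.apache.hadoop.util.Progressable` has a `progress()` method, but this sketch declares its own interface so it runs without Hadoop. The extra `progress()` after the loop means a report is sent when the replay finishes, not only every `interval` edits.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplayProgressSketch {
    // Hypothetical stand-in for a progress reporter.
    interface Progressable {
        void progress();
    }

    // Replays edits, reporting progress every `interval` edits and once more at the end.
    static int replay(List<String> edits, int interval, Progressable reporter) {
        int applied = 0;
        for (String edit : edits) {
            // ... apply the edit to the region (omitted in this sketch) ...
            applied++;
            if (applied % interval == 0) {
                reporter.progress();
            }
        }
        reporter.progress(); // final report before returning
        return applied;
    }

    public static void main(String[] args) {
        List<String> edits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            edits.add("edit" + i);
        }
        int[] reports = {0};
        replay(edits, 10, () -> reports[0]++);
        System.out.println(reports[0]); // 3: after edits 10 and 20, plus the final call
    }
}
```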





[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138822#comment-13138822
 ] 

Jonathan Gray commented on HBASE-4532:
--

Please stop doing multiple commits on the same JIRA! :)  I thought we agreed on 
this, or no?





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138809#comment-13138809
 ] 

Jonathan Gray commented on HBASE-4641:
--

Opened HBASE-4697 to deal with "real" solution.

> Block cache can be mistakenly instantiated on Master
> 
>
> Key: HBASE-4641
> URL: https://issues.apache.org/jira/browse/HBASE-4641
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4641-suggestion-v3.txt, 4641-v4.txt, 
> HBASE-4641-v1.patch, HBASE-4641-v2.patch
>
>
> After changes in the block cache instantiation over in HBASE-4422, it looks 
> like the HMaster can now end up with a block cache instantiated.  Not a huge 
> deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138792#comment-13138792
 ] 

Jonathan Gray commented on HBASE-4641:
--

I like v4 much less than the other changes.  My v1 patch makes it so we could 
potentially break something because it's expecting to be able to manipulate the 
conf after construction (an easy assumption to document / test for).  The v4 
patch now takes the conf passed in by reference and modifies it.  It then 
modifies the same conf reference later in Store.  Seems like this could have 
some bad side-effects in the opposite direction.

At this point, I vote for the v1 hack until we make the cache non-static, as 
long as unit tests still pass.

> Block cache can be mistakenly instantiated on Master
> 
>
> Key: HBASE-4641
> URL: https://issues.apache.org/jira/browse/HBASE-4641
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Gray
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4641-suggestion-v3.txt, 4641-v4.txt, 
> HBASE-4641-v1.patch, HBASE-4641-v2.patch
>
>
> After changes in the block cache instantiation over in HBASE-4422, it looks 
> like the HMaster can now end up with a block cache instantiated.  Not a huge 
> deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138740#comment-13138740
 ] 

Jonathan Gray commented on HBASE-1744:
--

One more requested change.  Over in HBASE-4658 the map of attributes was added 
to the available APIs in thrift.  Could we add this to the new TScan, TGet, 
etc. structs?

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Tim Sell
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 
> 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 
> HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
> HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
> HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
> thriftexperiment.patch
>
>
> The mutateRows, etc. API is a little confusing compared to the new, cleaner Java 
> client.
> I'm thinking of ways to make a Thrift client that is just as elegant, something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map values
> }
> This creates a more verbose RPC than if the columns in TPut were just 
> map>, but that is harder to fit timestamps into while keeping it 
> intuitive from, say, Python.
> Presumably the goal of a Thrift gateway is to be easy first.





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138061#comment-13138061
 ] 

Jonathan Gray commented on HBASE-4641:
--

The real fix is for the block cache to be instantiated in HRS and not be static.

This slightly complicates things but is possible.
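A minimal sketch of that direction, with hypothetical `BlockCache`, `Store`, `RegionServer`, and `Master` classes standing in for the real ones: the region server constructs the cache and hands it to each store, so a process that never creates a region server (such as the master) never allocates one. This is an illustration of replacing a static singleton with an instance owned by HRS, not the HBase code.

```java
import java.util.ArrayList;
import java.util.List;

final class NonStaticCacheSketch {
    // Hypothetical cache; in the static design this would be a global singleton.
    static final class BlockCache {
        final long maxBytes;

        BlockCache(long maxBytes) {
            this.maxBytes = maxBytes;
        }
    }

    // Stores receive the cache from their owner instead of reading a static field.
    static final class Store {
        final BlockCache cache;

        Store(BlockCache cache) {
            this.cache = cache;
        }
    }

    // Only the region server creates a cache; it owns it and shares it with its stores.
    static final class RegionServer {
        final BlockCache cache;
        final List<Store> stores = new ArrayList<>();

        RegionServer(long cacheBytes) {
            this.cache = new BlockCache(cacheBytes);
        }

        Store openStore() {
            Store s = new Store(cache);
            stores.add(s);
            return s;
        }
    }

    // The master has no cache field, so it cannot accidentally allocate one.
    static final class Master { }

    public static void main(String[] args) {
        RegionServer rs = new RegionServer(64L * 1024 * 1024);
        Store a = rs.openStore();
        Store b = rs.openStore();
        System.out.println(a.cache == b.cache); // true: one cache per region server
        new Master();                           // no block cache created here
    }
}
```

The extra complication is the plumbing: every code path that used to reach for the static cache now needs a reference passed in from the region server.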




> Block cache can be mistakenly instantiated on Master
> 
>
> Key: HBASE-4641
> URL: https://issues.apache.org/jira/browse/HBASE-4641
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4641-suggestion-v3.txt, HBASE-4641-v1.patch, 
> HBASE-4641-v2.patch
>
>
> After changes in the block cache instantiation over in HBASE-4422, it looks 
> like the HMaster can now end up with a block cache instantiated.  Not a huge 
> deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138053#comment-13138053
 ] 

Jonathan Gray commented on HBASE-4641:
--

Stack, that's what I had in v1.  I felt like it was an ugly hack and might 
affect unit tests that modify a conf after HMaster is instantiated.

I can just try that again and run the unit tests to see if they do all pass.




> Block cache can be mistakenly instantiated on Master
> 
>
> Key: HBASE-4641
> URL: https://issues.apache.org/jira/browse/HBASE-4641
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-4641-v1.patch, HBASE-4641-v2.patch
>
>
> After changes in the block cache instantiation over in HBASE-4422, it looks 
> like the HMaster can now end up with a block cache instantiated.  Not a huge 
> deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137719#comment-13137719
 ] 

Jonathan Gray commented on HBASE-4528:
--

Is it safe to ignore this close?  Should it be a WARN not DEBUG?  I'm a little 
confused why this is happening in the test.  Is the FS being closed before this 
finishes or what?

> The put operation can release the rowlock before sync-ing the Hlog
> --
>
> Key: HBASE-4528
> URL: https://issues.apache.org/jira/browse/HBASE-4528
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4528-trunk.txt, HBASE-4528-Trunk-FINAL.patch, 
> appendNoSync5.txt, appendNoSyncPut1.txt, appendNoSyncPut2.txt, 
> appendNoSyncPut3.txt, appendNoSyncPut4.txt, appendNoSyncPut5.txt, 
> appendNoSyncPut6.txt, appendNoSyncPut7.txt, appendNoSyncPut8.txt
>
>
> This allows for better throughput when there are hot rows. A single row 
> update improves from 100 puts/sec/server to 5000 puts/sec/server.
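The ordering described in the issue title can be sketched like this. Everything here is hypothetical: `FakeLog`, `HotRowWriter`, and a single `ReentrantLock` standing in for the row lock. The sketch only shows the shape of the change. The edit is appended to the log while the row lock is held. The lock is then released, and the comparatively slow sync happens afterwards, so other writers to the same hot row are not blocked behind it. The put is still only acknowledged after `sync` returns. Visibility of unsynced edits to readers, and rollback when a sync fails, are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical write-ahead log: append buffers an edit and returns its txid;
// sync makes everything appended so far durable (a simple group commit).
final class FakeLog {
  private final List<String> pending = new ArrayList<>();
  private final List<String> durable = new ArrayList<>();
  private long appended = 0;
  private long synced = 0;

  synchronized long append(String edit) {
    pending.add(edit);
    return ++appended;
  }

  synchronized long sync(long txid) {
    if (txid > synced) {            // one sync covers every edit appended so far
      durable.addAll(pending);
      pending.clear();
      synced = appended;
    }
    return synced;
  }

  synchronized List<String> durableEdits() { return new ArrayList<>(durable); }
}

// Hypothetical writer for a single hot row.
final class HotRowWriter {
  private final ReentrantLock rowLock = new ReentrantLock();
  private final List<String> memstore = new ArrayList<>();
  private final FakeLog log;

  HotRowWriter(FakeLog log) { this.log = log; }

  void put(String value) {
    long txid;
    rowLock.lock();
    try {
      txid = log.append(value);     // append only: no sync while holding the lock
      memstore.add(value);
    } finally {
      rowLock.unlock();             // release the row lock before syncing
    }
    log.sync(txid);                 // wait for durability outside the row lock
    // only at this point would the put be acknowledged to the client
  }

  List<String> memstoreSnapshot() {
    rowLock.lock();
    try {
      return new ArrayList<>(memstore);
    } finally {
      rowLock.unlock();
    }
  }
}
```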





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137570#comment-13137570
 ] 

Jonathan Gray commented on HBASE-4528:
--

+1 on adding the log line, Ted.  Will do.

I will try to spend time looking at the unit test tonight.




> The put operation can release the rowlock before sync-ing the Hlog
> --
>
> Key: HBASE-4528
> URL: https://issues.apache.org/jira/browse/HBASE-4528
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: HBASE-4528-Trunk-FINAL.patch, appendNoSync5.txt, 
> appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, 
> appendNoSyncPut4.txt, appendNoSyncPut5.txt, appendNoSyncPut6.txt, 
> appendNoSyncPut7.txt, appendNoSyncPut8.txt
>
>
> This allows for better throughput when there are hot rows. A single row 
> update improves from 100 puts/sec/server to 5000 puts/sec/server.





[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137291#comment-13137291
 ] 

Jonathan Gray commented on HBASE-4658:
--

Dhruba, it is thrift 0.7.0 (or at least it was last time I generated).  If you 
don't have time today I can regenerate Hbase.java and commit this.

Re: HBASE-1744, will this change apply after that goes in?  It seems like this 
change could be added on top of that change but that your current patch is 
based on the current thrift API?

> Put attributes are not exposed via the ThriftServer
> ---
>
> Key: HBASE-4658
> URL: https://issues.apache.org/jira/browse/HBASE-4658
> Project: HBase
>  Issue Type: Bug
>  Components: thrift
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: ThriftPutAttributes1.txt
>
>
> The Put api also takes in a bunch of arbitrary attributes that an application 
> can use to associate metadata with each put operation. This is not exposed 
> via Thrift.





[jira] [Commented] (HBASE-4683) Create config option to only cache index blocks

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137285#comment-13137285
 ] 

Jonathan Gray commented on HBASE-4683:
--

+1 on both ideas, Lars.

> Create config option to only cache index blocks
> ---
>
> Key: HBASE-4683
> URL: https://issues.apache.org/jira/browse/HBASE-4683
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
>
> This would add a new boolean config option: hfile.block.cache.datablocks
> Default would be true.
> Setting this to false allows HBase to run in a mode where only index blocks are 
> cached, which is useful for analytical scenarios where a useful working set 
> of the data cannot be expected to fit into the (aggregate) cache.
> This is the equivalent of setting cacheBlocks to false on all scans 
> (including scans on behalf of gets).
> I would like to get a general feeling about what folks think about this.
> The change itself would be simple.
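Below is a minimal sketch of how such a flag could gate caching. Only the property name `hfile.block.cache.datablocks` and its default of true come from the proposal. The `BlockKind` enum and `CachePolicy` class are hypothetical, and `java.util.Properties` stands in for HBase's configuration object. Following the proposal's equivalence with turning off block caching on every scan, index blocks stay cacheable while data blocks are skipped.

```java
import java.util.Properties;

// Hypothetical classification of HFile blocks for this sketch.
enum BlockKind { DATA, INDEX }

final class CachePolicy {
  // Property name and default are from the proposal; the rest is illustrative.
  static final String CACHE_DATA_BLOCKS_KEY = "hfile.block.cache.datablocks";

  private final boolean cacheDataBlocks;

  CachePolicy(Properties conf) {
    this.cacheDataBlocks =
        Boolean.parseBoolean(conf.getProperty(CACHE_DATA_BLOCKS_KEY, "true"));
  }

  // In this sketch index blocks are always cached; data blocks are cached only
  // if the global flag allows it and the individual request did not opt out.
  boolean shouldCache(BlockKind kind, boolean requestCacheBlocks) {
    if (kind == BlockKind.INDEX) {
      return true;
    }
    return cacheDataBlocks && requestCacheBlocks;
  }
}
```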





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136412#comment-13136412
 ] 

Jonathan Gray commented on HBASE-4528:
--

Sorry Ted, I'm not clear on what exactly you're pointing out.  Is something 
broken there?

> The put operation can release the rowlock before sync-ing the Hlog
> --
>
> Key: HBASE-4528
> URL: https://issues.apache.org/jira/browse/HBASE-4528
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: HBASE-4528-Trunk-FINAL.patch, appendNoSync5.txt, 
> appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, 
> appendNoSyncPut4.txt, appendNoSyncPut5.txt, appendNoSyncPut6.txt, 
> appendNoSyncPut7.txt, appendNoSyncPut8.txt
>
>
> This allows for better throughput when there are hot rows. A single row 
> update improves from 100 puts/sec/server to 5000 puts/sec/server.





[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer

2011-10-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136383#comment-13136383
 ] 

Jonathan Gray commented on HBASE-4658:
--

How does this relate to HBASE-1744?  That's slated for 0.94, should we just put 
this in 0.92?  And I guess we should ensure that attributes are supported over 
there.

I'm +1 on putting this in 0.92 since it makes it possible to add whatever we 
want without changing the API in 92 minor releases.

> Put attributes are not exposed via the ThriftServer
> ---
>
> Key: HBASE-4658
> URL: https://issues.apache.org/jira/browse/HBASE-4658
> Project: HBase
>  Issue Type: Bug
>  Components: thrift
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: ThriftPutAttributes1.txt
>
>
> The Put api also takes in a bunch of arbitrary attributes that an application 
> can use to associate metadata with each put operation. This is not exposed 
> via Thrift.





[jira] [Commented] (HBASE-4447) Allow hbase.version to be passed in as command-line argument

2011-10-23 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133780#comment-13133780
 ] 

Jonathan Gray commented on HBASE-4447:
--

Shouldn't the resolution be something other than Fixed?  Invalid, or some other?

Thanks for all the cleanup, stack!

> Allow hbase.version to be passed in as command-line argument
> 
>
> Key: HBASE-4447
> URL: https://issues.apache.org/jira/browse/HBASE-4447
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.92.0
>Reporter: Joep Rottinghuis
>Assignee: Joep Rottinghuis
> Fix For: 0.92.0
>
> Attachments: HBASE-4447-0.92.patch
>
>
> Currently the build always produces the jars and tarball according to the 
> version baked into the POM.
> When we modify this to allow the version to be passed in as a command-line 
> argument, it can still default to the same behavior, yet give the flexibility 
> for an internal build to tag on its own version.





[jira] [Commented] (HBASE-4643) Consider reverting HBASE-451 (change HRI to remove HTD) in 0.92

2011-10-21 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132865#comment-13132865
 ] 

Jonathan Gray commented on HBASE-4643:
--

I've had a few pretty horrible experiences moving an 0.90 cluster to 0.92 so 
far, so I agree that this is definitely the most unbaked part of 0.92.

Now that I've got 0.92 clusters, I'm going to have to figure out a reverting plan
for them if we back this out now.  It will also become a barrier between 0.92 
and 0.94 which will make my life difficult as well (since we have been pulling 
94 changes into a local 92 branch).

I'd like to see if Stack's next changes do the trick before abandoning this.

> Consider reverting HBASE-451 (change HRI to remove HTD) in 0.92
> ---
>
> Key: HBASE-4643
> URL: https://issues.apache.org/jira/browse/HBASE-4643
> Project: HBase
>  Issue Type: Brainstorming
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
> Attachments: revert.txt
>
>
> I've been chatting with some folks recently about this thought: it seems 
> like, if you enumerate the larger changes in 0.92, this is probably the one 
> that is the most destabilizing that hasn't been through a lot of "baking" 
> yet. You can see this evidenced by the very high number of followup commits 
> it generated: looks like somewhere around 15 of them, plus some bugs still 
> open.
> I've done a patch to revert this and the related followup changes on the 0.92 
> branch. Do we want to consider doing this?





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-21 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132854#comment-13132854
 ] 

Jonathan Gray commented on HBASE-4641:
--

Thanks Ted.  You see any others?

> Block cache can be mistakenly instantiated on Master
> 
>
> Key: HBASE-4641
> URL: https://issues.apache.org/jira/browse/HBASE-4641
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-4641-v1.patch, HBASE-4641-v2.patch
>
>
> After changes in the block cache instantiation over in HBASE-4422, it looks 
> like the HMaster can now end up with a block cache instantiated.  Not a huge 
> deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4636) Refactor catalog MetaReader and MetaEditor so one class only

2011-10-20 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131952#comment-13131952
 ] 

Jonathan Gray commented on HBASE-4636:
--

is this really a noob task?  i'm +1 on revisiting the structure here, but 
shouldn't it be part of the larger CatalogTracker / retry facilities / etc?

> Refactor catalog MetaReader and MetaEditor so one class only
> 
>
> Key: HBASE-4636
> URL: https://issues.apache.org/jira/browse/HBASE-4636
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>  Labels: noob
>
> I suggest we collapse MetaReader and MetaEditor.  Setters are in one class 
> while Getters are in another which is a little disorientating.  The Editor 
> class uses facility from the Reader class to do edits which seems a little 
> off.





[jira] [Commented] (HBASE-4608) HLog Compression

2011-10-20 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131898#comment-13131898
 ] 

Jonathan Gray commented on HBASE-4608:
--

I think the idea is a custom compression where we can do stuff like start the 
HLog with a dictionary of some known repetitive stuff.  It's very similar to 
the delta encoding work.




> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.
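The dictionary idea can be sketched as follows. The first time a value such as a table name, region name, or column family is written, it goes out as a literal and is assigned an index. Later occurrences are written as that short index. `WalDictionary` and its one-byte tag scheme are hypothetical, not the format this issue settled on. The reader rebuilds the same dictionary as it reads, so nothing extra has to be shipped. Pre-seeding both sides with known repetitive values, as the comment suggests, would work the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WalDictionary {
  private static final byte LITERAL = 0;
  private static final byte INDEXED = 1;

  private final Map<String, Integer> ids = new HashMap<>();
  private final List<String> entries = new ArrayList<>();

  // Writer: emit a 2-byte index if the value was seen before (this sketch
  // assumes fewer than 65536 distinct values), else a literal that is remembered.
  void write(DataOutputStream out, String value) throws IOException {
    Integer id = ids.get(value);
    if (id != null) {
      out.writeByte(INDEXED);
      out.writeShort(id);
    } else {
      out.writeByte(LITERAL);
      out.writeUTF(value);
      ids.put(value, entries.size());
      entries.add(value);
    }
  }

  // Reader: must consume values in the same order the writer produced them.
  String read(DataInputStream in) throws IOException {
    byte tag = in.readByte();
    if (tag == INDEXED) {
      return entries.get(in.readUnsignedShort());
    }
    if (tag != LITERAL) {
      throw new IOException("unknown tag " + tag);
    }
    String value = in.readUTF();
    ids.put(value, entries.size());
    entries.add(value);
    return value;
  }

  static byte[] encodeAll(List<String> values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    WalDictionary dict = new WalDictionary();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (String v : values) {
        dict.write(out, v);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static List<String> decodeAll(byte[] data, int count) {
    WalDictionary dict = new WalDictionary();
    List<String> result = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      for (int i = 0; i < count; i++) {
        result.add(dict.read(in));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }
}
```

Take the values `usertable` (three times) and `cf`. This encodes to 23 bytes: 12 for the first literal, 3 for each repeat, 5 for `cf`. Writing every value as a literal would take 41 bytes.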





[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131403#comment-13131403
 ] 

Jonathan Gray commented on HBASE-4536:
--

+1 to v16 for commit to trunk.  You are a good man, Lars.  Well done.  And 
thanks for being patient.

> Allow CF to retain deleted rows
> ---
>
> Key: HBASE-4536
> URL: https://issues.apache.org/jira/browse/HBASE-4536
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 4536-v15.txt, 4536-v16.txt
>
>
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
> of versions.
> However, if a client deletes a row, all versions older than the delete 
> tombstone will be removed at the next major compaction (and even at memstore 
> flush - see HBASE-4241).
> There should be a way to retain those versions to guard against software error.
> I see two options here:
> 1. Add a new flag to HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Fold this into the parent change. I.e. keep minimum-number-of-versions of 
> versions even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
> (from a user viewpoint)
> Comments? Any other options?
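Option 1 can be sketched as a major-compaction decision for the versions of a single column, newest first. `Cell`, `RetainDeletedSketch`, `retainDeleted` and `maxVersions` are hypothetical names; the issue only suggests a flag "something like RETAIN_DELETED". Without the flag, every put covered by a delete marker is purged. With the flag, covered puts and the marker itself survive, so reads still honor the delete but the data remains recoverable. This simplified model caps puts at `maxVersions` and ignores TTL and the parent issue's minimum-versions rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal cell: one version of one column, or a delete marker.
final class Cell {
  final long timestamp;
  final boolean deleteMarker;

  Cell(long timestamp, boolean deleteMarker) {
    this.timestamp = timestamp;
    this.deleteMarker = deleteMarker;
  }

  @Override
  public String toString() {
    return (deleteMarker ? "del@" : "put@") + timestamp;
  }
}

final class RetainDeletedSketch {
  // cellsNewestFirst: every cell of one column, sorted by descending timestamp.
  // Returns the cells a major compaction would keep under this simplified model.
  static List<Cell> survivors(List<Cell> cellsNewestFirst, boolean retainDeleted, int maxVersions) {
    List<Cell> kept = new ArrayList<>();
    long newestDelete = Long.MIN_VALUE;  // a marker covers puts with timestamp <= its own
    int keptPuts = 0;
    for (Cell c : cellsNewestFirst) {
      if (c.deleteMarker) {
        newestDelete = Math.max(newestDelete, c.timestamp);
        if (retainDeleted) {
          kept.add(c);                   // keep the marker so reads still honor the delete
        }
        continue;
      }
      boolean covered = c.timestamp <= newestDelete;
      if (covered && !retainDeleted) {
        continue;                        // default behavior: covered versions are purged
      }
      if (keptPuts < maxVersions) {
        kept.add(c);
        keptPuts++;
      }
    }
    return kept;
  }
}
```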





[jira] [Commented] (HBASE-4626) Filters unnecessarily copy byte arrays...

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131378#comment-13131378
 ] 

Jonathan Gray commented on HBASE-4626:
--

I'm okay with this in 92 but would prefer it goes to 94.  Put the perf work in 
the next release so we can get 92 released soon.

> Filters unnecessarily copy byte arrays...
> -
>
> Key: HBASE-4626
> URL: https://issues.apache.org/jira/browse/HBASE-4626
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4626-v2.txt, 4626.txt
>
>
> Just looked at SingleCol and ValueFilter... And on every column compared they 
> create a copy of the column and/or value portion of the KV.
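The copy comes from extracting the qualifier or value into a fresh array before comparing it. The alternative compares directly against the KeyValue's backing buffer, using an offset and length. The class and method names below are hypothetical. HBase's `Bytes.compareTo(byte[], int, int, byte[], int, int)` is an existing helper for the same offset/length comparison.

```java
import java.util.Arrays;

final class RegionCompare {
  // Copying approach: allocates a new array for every cell examined.
  static boolean equalsByCopy(byte[] backing, int offset, int length, byte[] expected) {
    byte[] copy = Arrays.copyOfRange(backing, offset, offset + length);
    return Arrays.equals(copy, expected);
  }

  // Copy-free approach: compare the bytes where they already live.
  static boolean equalsInPlace(byte[] backing, int offset, int length, byte[] expected) {
    if (length != expected.length) {
      return false;
    }
    for (int i = 0; i < length; i++) {
      if (backing[offset + i] != expected[i]) {
        return false;
      }
    }
    return true;
  }

  // Unsigned lexicographic comparison of two regions, also without copying.
  static int compareInPlace(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int x = a[aOff + i] & 0xff;
      int y = b[bOff + i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return aLen - bLen;
  }
}
```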





[jira] [Commented] (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131264#comment-13131264
 ] 

Jonathan Gray commented on HBASE-1183:
--

To clarify, I meant that the code seems like you don't need to prepend the 
{1,0} but I have some vague memory of needing it.

> New MR splitting algorithm and other new features need a way to split a key 
> range in N chunks
> -
>
> Key: HBASE-1183
> URL: https://issues.apache.org/jira/browse/HBASE-1183
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Minor
> Fix For: 0.20.0
>
> Attachments: hbase-1183-v1.patch, hbase-1183-v2.patch, 
> hbase-1183-v3.patch, hbase-1183-v4.patch
>
>
> For HBASE-1172 and other functionality coming soon, we need to be able to 
> take a [start,stop) range and divide it into chunks.
> For example, we have 10 regions but want to run 30 maps.  We need to divide 
> each region into three key ranges for the start/stop of each scanner.
> Implementing using java.math.BigInteger
> Will also include a couple additional helpers in Bytes to make life easy.
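The signed/unsigned question in these comments comes from how `java.math.BigInteger` reads a byte array. The `BigInteger(byte[])` constructor treats the array as a two's-complement number, so a row key whose first byte is 0x80 or higher becomes negative. Prepending a positive header such as the `{1,0}` mentioned in the comment avoids that. So does the sign-magnitude constructor `BigInteger(1, bytes)`. A fixed header has one further property the sketch demonstrates: `toByteArray()` drops leading zero bytes, and a header stops those bytes from being lost on the way back. This is an illustration of the JDK behavior, not a claim about why the original code prepends `{1,0}`.

```java
import java.math.BigInteger;

final class UnsignedKeys {
  // Two's-complement reading: a leading byte >= 0x80 makes the value negative.
  static BigInteger signed(byte[] key) {
    return new BigInteger(key);
  }

  // Sign-magnitude reading with signum 1: always non-negative, but
  // toByteArray() on the result drops any leading zero bytes of the key.
  static BigInteger unsigned(byte[] key) {
    return new BigInteger(1, key);
  }

  // Header {1, 0}: the leading 0x01 keeps the value positive and pins the
  // length, so leading zero bytes of the key survive a round trip.
  static BigInteger withHeader(byte[] key) {
    byte[] buf = new byte[key.length + 2];
    buf[0] = 1;
    buf[1] = 0;
    System.arraycopy(key, 0, buf, 2, key.length);
    return new BigInteger(buf);
  }

  // Inverse of withHeader: drop the two header bytes.
  static byte[] stripHeader(BigInteger value) {
    byte[] buf = value.toByteArray();
    byte[] key = new byte[buf.length - 2];
    System.arraycopy(buf, 2, key, 0, key.length);
    return key;
  }
}
```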





[jira] [Commented] (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131261#comment-13131261
 ] 

Jonathan Gray commented on HBASE-1183:
--

Wow, really got me thinking back.  I honestly don't remember exactly why.

We convert them to BigInteger so we can do:  (stop - start) / numsplits = 
interval

Something related to signed/unsigned?  Reading the code it does seem okay.  
Good thing I didn't write a unit test.

Are you seeing that it's broken in some way?  I can spend a little more time 
looking at it if necessary.

> New MR splitting algorithm and other new features need a way to split a key 
> range in N chunks
> -
>
> Key: HBASE-1183
> URL: https://issues.apache.org/jira/browse/HBASE-1183
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Minor
> Fix For: 0.20.0
>
> Attachments: hbase-1183-v1.patch, hbase-1183-v2.patch, 
> hbase-1183-v3.patch, hbase-1183-v4.patch
>
>
> For HBASE-1172 and other functionality coming soon, we need to be able to 
> take a [start,stop) range and divide it into chunks.
> For example, we have 10 regions but want to run 30 maps.  We need to divide 
> each region into three key ranges for the start/stop of each scanner.
> Implementing using java.math.BigInteger
> Will also include a couple additional helpers in Bytes to make life easy.
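The `(stop - start) / numsplits = interval` arithmetic from the comment can be sketched with `BigInteger`. This is a simplified illustration under stated assumptions, not HBase's `Bytes.split`. Keys are right-padded with zero bytes to a common length and read as unsigned numbers. The range is then cut into `chunks` equal intervals, and the last interval absorbs any remainder. For the issue's example of 10 regions and 30 maps, each region's range would be split with `chunks = 3`. The method returns `chunks + 1` fixed-length boundaries, starting at the padded start key and ending at the padded stop key.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class KeyRangeSplitter {
  // Returns chunks + 1 boundary keys: start, chunks - 1 interior split points, stop.
  static byte[][] split(byte[] start, byte[] stop, int chunks) {
    if (chunks < 1) {
      throw new IllegalArgumentException("chunks must be >= 1");
    }
    int len = Math.max(start.length, stop.length);
    // Arrays.copyOf right-pads with zero bytes; signum 1 reads the bytes as unsigned.
    BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
    BigInteger hi = new BigInteger(1, Arrays.copyOf(stop, len));
    if (lo.compareTo(hi) >= 0) {
      throw new IllegalArgumentException("start must sort before stop");
    }
    BigInteger interval = hi.subtract(lo).divide(BigInteger.valueOf(chunks));
    if (interval.signum() == 0) {
      throw new IllegalArgumentException("range too small for " + chunks + " chunks");
    }
    byte[][] bounds = new byte[chunks + 1][];
    for (int i = 0; i < chunks; i++) {
      bounds[i] = toFixedLength(lo.add(interval.multiply(BigInteger.valueOf(i))), len);
    }
    bounds[chunks] = toFixedLength(hi, len);
    return bounds;
  }

  // toByteArray() may add a leading sign byte or drop leading zeros;
  // right-align the magnitude into exactly len bytes.
  private static byte[] toFixedLength(BigInteger v, int len) {
    byte[] raw = v.toByteArray();
    byte[] out = new byte[len];
    int copy = Math.min(raw.length, len);
    System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
    return out;
  }
}
```

For example, splitting `{0x00}` to `{0x90}` into 3 chunks yields boundaries 0, 48, 96 and 144 (0x90).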





[jira] [Commented] (HBASE-4630) If you shutdown all RS an active master is never able to recover when RS come back online

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131210#comment-13131210
 ] 

Jonathan Gray commented on HBASE-4630:
--

The stuff I'm seeing in the logs is different but it's probably the same or a 
related issue.  I'm going to try and dig on this and will figure out whether to 
close this as a dupe or not.  Thanks for the pointer, Ted.

> If you shutdown all RS an active master is never able to recover when RS come 
> back online
> -
>
> Key: HBASE-4630
> URL: https://issues.apache.org/jira/browse/HBASE-4630
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Gray
> Fix For: 0.92.1
>
>
> I've been doing some isolated benchmarking of a single RS and can repeatedly 
> trigger some craziness in the master if I shut down the RS.  It is never able 
> to recover after bringing RSs back online.  I seem to see different behavior 
> across different branches / revisions of the 92 branch, but there does seem 
> to be an issue in several of them.
> Putting against 0.92.1 so we don't hold up the release of 0.92.  Should not 
> be a blocker.
> Working on a unit test now.





[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131188#comment-13131188
 ] 

Jonathan Gray commented on HBASE-4536:
--

I'm at +0.5

Add just a bit more high-level, config-level doc somewhere and I'm a strong 
+1...

:)

> Allow CF to retain deleted rows
> ---
>
> Key: HBASE-4536
> URL: https://issues.apache.org/jira/browse/HBASE-4536
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 4536-v15.txt
>
>
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
> of versions.
> However, if a client deletes a row, all versions older than the delete 
> tombstone will be removed at the next major compaction (and even at memstore 
> flush - see HBASE-4241).
> There should be a way to retain those versions to guard against software error.
> I see two options here:
> 1. Add a new flag to HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Fold this into the parent change. I.e. keep minimum-number-of-versions of 
> versions even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
> (from a user viewpoint)
> Comments? Any other options?





[jira] [Commented] (HBASE-4620) I broke the build when I submitted HBASE-3581 (Send length of the rpc response)

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130833#comment-13130833
 ] 

Jonathan Gray commented on HBASE-4620:
--

or this is meant to combine the two so the | is actually the right behavior for 
'and'?  hmm

> I broke the build when I submitted HBASE-3581 (Send length of the rpc 
> response)
> ---
>
> Key: HBASE-4620
> URL: https://issues.apache.org/jira/browse/HBASE-4620
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 4620.txt
>
>
> Thanks to Ted, Ram and Gao for figuring out my mess-up.
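The `|` versus "and" question comes down to two meanings of "and". Suppose the response flag is a byte with one bit for "error" and one for "length present", as the discussion suggests. (The names `ERROR_BIT`, `LENGTH_BIT` and the methods below are hypothetical for this sketch.) Then the state "error and length are both set" is built with bitwise OR. Bitwise AND of two different single-bit masks is always zero. AND is the right operator only for testing whether an individual bit is set.

```java
final class ResponseFlagSketch {
  static final byte ERROR_BIT = 0x1;
  static final byte LENGTH_BIT = 0x2;

  // Building a flag with both bits set requires OR: 0x1 | 0x2 == 0x3.
  static byte errorAndLengthSet() {
    return (byte) (ERROR_BIT | LENGTH_BIT);
  }

  // The tempting but wrong version: 0x1 & 0x2 == 0x0, i.e. no bit set at all.
  static byte errorAndLengthSetWrong() {
    return (byte) (ERROR_BIT & LENGTH_BIT);
  }

  // Testing a single bit uses AND against that bit's mask.
  static boolean isError(byte flag) { return (flag & ERROR_BIT) != 0; }

  static boolean isLength(byte flag) { return (flag & LENGTH_BIT) != 0; }
}
```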





[jira] [Commented] (HBASE-4620) I broke the build when I submitted HBASE-3581 (Send length of the rpc response)

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130832#comment-13130832
 ] 

Jonathan Gray commented on HBASE-4620:
--

stack, doesn't the method name imply the existing behavior?  should change the 
method name?

> I broke the build when I submitted HBASE-3581 (Send length of the rpc 
> response)
> ---
>
> Key: HBASE-4620
> URL: https://issues.apache.org/jira/browse/HBASE-4620
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 4620.txt
>
>
> Thanks to Ted, Ram and Gao for figuring out my mess-up.





[jira] [Commented] (HBASE-3581) hbase rpc should send size of response

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130416#comment-13130416
 ] 

Jonathan Gray commented on HBASE-3581:
--

should rename method to getErrorOrLengthSet()?

> hbase rpc should send size of response
> --
>
> Key: HBASE-3581
> URL: https://issues.apache.org/jira/browse/HBASE-3581
> Project: HBase
>  Issue Type: Improvement
>Reporter: ryan rawson
>Assignee: stack
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3581-v2.txt, 3581-v3.txt, 3581-v4.txt, 
> HBASE-rpc-response.txt
>
>
> The RPC reply from Server->Client does not include the size of the payload, 
> it is framed like so:
>  callId
>  errorFlag
>  data
> The data segment would contain enough info about how big the response is so 
> that it could be decoded by a writable reader.
> This makes it difficult to write buffering clients, who might read the entire 
> 'data' then pass it to a decoder. While less memory efficient, if you want to 
> easily write block read clients (eg: nio) it would be necessary to send the 
> size along so that the client could snarf into a local buf.
> The new proposal is:
>  callId
>  size
>  errorFlag
>  data
> the size being sizeof(data) + sizeof(errorFlag).
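The proposed framing can be sketched with `DataOutputStream` and `DataInputStream`. The field widths are assumptions made for this illustration: a 4-byte int `callId`, a 4-byte int `size`, and a 1-byte `errorFlag`. The only rule taken from the proposal is that `size` counts `errorFlag` plus `data`. With the size up front, a client can read the whole frame into a local buffer before decoding it. That makes buffering or NIO-style clients straightforward.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class ResponseFrame {
  final int callId;
  final boolean error;
  final byte[] data;

  ResponseFrame(int callId, boolean error, byte[] data) {
    this.callId = callId;
    this.error = error;
    this.data = data;
  }

  // Layout: callId (int) | size (int) | errorFlag (1 byte) | data,
  // where size = sizeof(errorFlag) + sizeof(data).
  void writeTo(DataOutputStream out) throws IOException {
    out.writeInt(callId);
    out.writeInt(1 + data.length);
    out.writeBoolean(error);
    out.write(data);
  }

  // Buffering reader: learn the size, pull the whole frame into a local
  // buffer with one readFully, and only then decode it.
  static ResponseFrame readFrom(DataInputStream in) throws IOException {
    int callId = in.readInt();
    int size = in.readInt();
    if (size < 1) {
      throw new IOException("bad frame size " + size);
    }
    byte[] frame = new byte[size];
    in.readFully(frame);
    boolean error = frame[0] != 0;
    return new ResponseFrame(callId, error, Arrays.copyOfRange(frame, 1, size));
  }

  static byte[] encode(ResponseFrame frame) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      frame.writeTo(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static ResponseFrame decode(byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      return readFrom(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Encoding call 7 with a 3-byte payload produces 12 bytes on the wire: 4 for `callId`, 4 for `size` (value 4), 1 for the flag and 3 for the data.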





[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-10-18 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130139#comment-13130139
 ] 

Jonathan Gray commented on HBASE-4611:
--

@Marek, we could file an INFRA task.  Or we could create a new account?  Also, 
there seems to be something with URL translation (JIRA is treating the  tag 
as escaped so actually showing it, and then converting straight text URLs to 
hyperlinks).

> Add support for Phabricator/Differential as an alternative code review tool
> ---
>
> Key: HBASE-4611
> URL: https://issues.apache.org/jira/browse/HBASE-4611
> Project: HBase
>  Issue Type: Task
>Reporter: Jonathan Gray
> Attachments: D21.1.patch, D21.1.patch
>
>
> From http://phabricator.org/ : "Phabricator is a open source collection of 
> web applications which make it easier to write, review, and share source 
> code. It is currently available as an early release. Phabricator was 
> developed at Facebook."
> It's open source so pretty much anyone could host an instance of this 
> software.
> To begin with, there will be a public-facing instance located at 
> http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
> http://osuosl.org).
> We will use this JIRA to deal with adding (and ensuring) Apache-friendly 
> support that will allow us to do code reviews with Phabricator for HBase.





[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-10-18 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130140#comment-13130140
 ] 

Jonathan Gray commented on HBASE-4611:
--

Looks like you're one step ahead on the  tags, thanks!






[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-10-18 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130121#comment-13130121
 ] 

Jonathan Gray commented on HBASE-4611:
--

Should we change the display name of the reviews.facebook.net account from John 
Sichi? :)






[jira] [Commented] (HBASE-4612) Allow ColumnPrefixFilter to support multiple prefixes

2011-10-18 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129955#comment-13129955
 ] 

Jonathan Gray commented on HBASE-4612:
--

Hey Eran.  Thanks for the contribution!  A few comments:

- There's no explanation of the behavior anywhere.  In the constructors and 
addPrefix() methods, you should document that this creates an OR condition 
across all of the prefixes, correct?
- No need to instantiate a new comparator every time (use 
Bytes.BYTES_COMPARATOR).
- It seems odd to keep adding to the end of a List and then sort it.  
How about a TreeSet?  You can easily ignore dupes that way.
- There's no input verification, so, for example, you could pass a null or an 
empty byte[][] to the constructor and get some strange behavior: it will 
instantiate okay, but then you'll get server-side NPEs or IOOBs.
- this.prefixes.size() == 0 -> this.prefixes.isEmpty()
- Your comment at the top of filterColumn: I wouldn't exactly call it a 
workaround, but it's a good comment.  Looking at the logic, it seems like 
correct behavior would be that it can be called with current == size(), but it 
would be a bug if current > size(), right?  Should you add an assert or throw 
an exception?
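A rough Java sketch of what the review points add up to, assuming a hypothetical MultiPrefixMatcher helper rather than the patch's actual ColumnPrefixFilter changes. A locally defined unsigned comparator stands in for Bytes.BYTES_COMPARATOR so the example compiles without HBase on the classpath.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper (not the patch's code) showing the review suggestions:
// OR semantics across prefixes, a sorted de-duplicating set, and input checks.
final class MultiPrefixMatcher {
    // Unsigned lexicographic byte[] order; in HBase, Bytes.BYTES_COMPARATOR
    // would be used instead of defining one here.
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    };

    // Kept sorted on insert and duplicates are ignored, so no separate sort step.
    private final NavigableSet<byte[]> prefixes = new TreeSet<>(UNSIGNED);

    MultiPrefixMatcher(byte[]... initial) {
        if (initial == null || initial.length == 0) {
            throw new IllegalArgumentException("at least one prefix is required");
        }
        for (byte[] p : initial) addPrefix(p);
    }

    void addPrefix(byte[] prefix) {
        if (prefix == null) throw new IllegalArgumentException("null prefix");
        prefixes.add(prefix);
    }

    int size() { return prefixes.size(); }

    // OR condition: true if the qualifier starts with any configured prefix.
    boolean matches(byte[] qualifier) {
        for (int len = 0; len <= qualifier.length; len++) {
            if (prefixes.contains(Arrays.copyOf(qualifier, len))) return true;
        }
        return false;
    }

    // For a non-matching qualifier: the next prefix to seek to, or null if no
    // later qualifier in this row can match.
    byte[] seekHint(byte[] qualifier) {
        return prefixes.ceiling(qualifier);
    }
}
```

A TreeSet keeps the prefixes ordered as they are added and drops duplicates, and ceiling() gives a natural seek hint when a qualifier matches no prefix.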

> Allow ColumnPrefixFilter to support multiple prefixes
> -
>
> Key: HBASE-4612
> URL: https://issues.apache.org/jira/browse/HBASE-4612
> Project: HBase
>  Issue Type: Improvement
>  Components: filters
>Affects Versions: 0.90.4
>Reporter: Eran Kutner
>Priority: Minor
> Attachments: HBASE-4612-0.90.patch
>
>
> When having a lot of columns grouped by name, I've found that it would be very 
> useful to be able to scan them using multiple prefixes, allowing one to fetch 
> specific groups in one scan without fetching the entire row. This is 
> impossible to achieve using a FilterList, so I've added such support to the 
> existing ColumnPrefixFilter while keeping backward compatibility.
> The attached patch is based on 0.90.4. I noticed that the 0.92 branch has a 
> new method to support instantiating filters using Thrift. I'm not sure how 
> the serialization works there, so I didn't implement that, but the rest of my 
> code should work in 0.92 as well.





[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129397#comment-13129397
 ] 

Jonathan Gray commented on HBASE-4611:
--

In addition to being a (better) code review tool, the Phabricator suite also 
includes stuff like repo/revision browsing, nice command-line tools, pastebin, 
etc. which should be available for the HBase repos.






[jira] [Commented] (HBASE-3380) Master failover can split logs of live servers

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129344#comment-13129344
 ] 

Jonathan Gray commented on HBASE-3380:
--

Heartbeats still exist, so I'm not sure much is different in 92 since we tackled 
this, right?

I will open a new JIRA though.

> Master failover can split logs of live servers
> --
>
> Key: HBASE-3380
> URL: https://issues.apache.org/jira/browse/HBASE-3380
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Daniel Cryans
>Assignee: Jonathan Gray
>Priority: Blocker
> Fix For: 0.90.0
>
> Attachments: HBASE-3380-v1.patch, HBASE-3380-v2.patch
>
>
> The reason why TestMasterFailover fails is that when it does the master 
> failover, the new master doesn't wait long enough for all region servers to 
> check in, so it goes ahead and splits logs... which doesn't work because of the 
> way lease timeouts work:
> {noformat}
> 2010-12-21 07:30:36,977 DEBUG [Master:0;vesta.apache.org:33170] wal.HLogSplitter(256): Splitting hlog 1 of 1: hdfs://localhost:49187/user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204, length=0
> 2010-12-21 07:30:36,977 DEBUG [WriterThread-1] wal.HLogSplitter$WriterThread(619): Writer thread Thread[WriterThread-1,5,main]: starting
> 2010-12-21 07:30:36,977 DEBUG [WriterThread-2] wal.HLogSplitter$WriterThread(619): Writer thread Thread[WriterThread-2,5,main]: starting
> 2010-12-21 07:30:36,977 INFO  [Master:0;vesta.apache.org:33170] util.FSUtils(625): Recovering file hdfs://localhost:49187/user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204
> 2010-12-21 07:30:36,979 WARN  [IPC Server handler 8 on 49187] namenode.FSNamesystem(1122): DIR* NameSystem.startFile: failed to create file /user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204 for DFSClient_hb_m_vesta.apache.org:33170_1292916630791 on client 127.0.0.1, because this file is already being created by DFSClient_hb_rs_vesta.apache.org,38743,1292916616340_1292916617166 on 127.0.0.1
> ...
> 2010-12-21 07:33:44,332 WARN  [Master:0;vesta.apache.org:33170] util.FSUtils(644): Waited 187354ms for lease recovery on hdfs://localhost:49187/user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204 for DFSClient_hb_m_vesta.apache.org:33170_1292916630791 on client 127.0.0.1, because this file is already being created by DFSClient_hb_rs_vesta.apache.org,38743,1292916616340_1292916617166 on 127.0.0.1
> {noformat}
> I think that we should always check in ZK the number of live region servers 
> before waiting for them to check in; this way we know how many we should 
> expect during failover. There's also a case where we still want to time out, 
> since an RS can die during that time, but we should wait a bit longer.
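A minimal sketch of the proposed wait, in Java. The class, method, and parameter names are hypothetical, ZooKeeper is not modelled (the live-server count read from ZK is simply passed in), and the clock and sleep are injected so the loop can be exercised without real delays.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch of the proposal: learn from ZooKeeper how many region
// servers are live, then wait until that many have checked in, but still give
// up after a timeout because a region server may die while we wait.
final class FailoverWait {
    interface Sleeper { void sleep(long millis); }

    /** Returns true if all expected servers checked in, false on timeout. */
    static boolean awaitCheckIns(int expectedFromZk, IntSupplier checkedIn,
                                 long timeoutMs, long pollMs,
                                 LongSupplier clockMs, Sleeper sleeper) {
        long deadline = clockMs.getAsLong() + timeoutMs;
        while (checkedIn.getAsInt() < expectedFromZk) {
            if (clockMs.getAsLong() >= deadline) {
                return false; // caller decides how to treat servers that never checked in
            }
            sleeper.sleep(pollMs);
        }
        return true;
    }
}
```

In a real master the sleeper would wrap Thread.sleep and handle interruption.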





[jira] [Commented] (HBASE-3380) Master failover can split logs of live servers

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129339#comment-13129339
 ] 

Jonathan Gray commented on HBASE-3380:
--

What's the best practice here?  Should I just commit this to 92 and trunk and 
make a note here?  Should I open a new jira since this is so old?

(Thanks for input guys)






[jira] [Commented] (HBASE-3380) Master failover can split logs of live servers

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129293#comment-13129293
 ] 

Jonathan Gray commented on HBASE-3380:
--

So it looks like we thought we'd do a proper fix for 0.92, but do we have one?  
There are some good config params that were committed to 0.90 as part of this 
JIRA that are now not available in 0.92.

Should this be committed to 0.92 and trunk?  I'd like to at least bring these 
config params over since they are pretty nice (and will make a more elegant 
solution to stuff like HBASE-4603).






[jira] [Commented] (HBASE-4603) Uneeded sleep time for tests in hbase.master.ServerManager#waitForRegionServers

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129295#comment-13129295
 ] 

Jonathan Gray commented on HBASE-4603:
--

There was a nice param in HBASE-3380 that is in 90 but not in 92/trunk.  I'm going 
to see if we can get that brought into the active branches; then we can just 
set the maxServers config to the # of RS set to start, and it will 
work instantly w/o having to wait for this interval/sleep loop.

> Uneeded sleep time for tests in 
> hbase.master.ServerManager#waitForRegionServers
> ---
>
> Key: HBASE-4603
> URL: https://issues.apache.org/jira/browse/HBASE-4603
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.92.0
> Environment: all.
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Attachments: 20111017_4603_MiniHBaseCluster.patch
>
>
> This function waits for at least 2 times 
> hbase.master.wait.on.regionservers.interval (default 3 seconds), i.e. 6 
> seconds for every mini HBase cluster start.
> In the context of a mini cluster, this is not useful, as the region servers are 
> created locally.
> Changing this to a lower value such as 100ms saves 5.8 seconds per HBase 
> cluster start. It should lower the build time on the Apache server by more 
> than 8%.
> Being more aggressive (removing all the wait time) could be possible as 
> well. To be studied later.
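The numbers above follow from the minimum wait: 2 × 3 s = 6 s per start, versus 2 × 100 ms = 0.2 s, a saving of 5.8 s. The Java sketch below is a hypothetical approximation of such a loop, not the actual ServerManager code: it waits for at least two intervals and a stable server count, but returns immediately once a maxServers threshold (the kind of setting mentioned in the comment above) is reached.

```java
import java.util.function.IntSupplier;

// Hypothetical approximation of a "wait for region servers" loop; names and
// exit rules are assumptions, not HBase's ServerManager implementation.
final class RegionServerWait {
    interface Sleeper { void sleep(long millis); }

    /**
     * Returns the number of milliseconds spent waiting. Exits at once when
     * maxServers have checked in; otherwise waits until at least two intervals
     * have elapsed and the count did not change over the last interval.
     */
    static long waitForRegionServers(IntSupplier count, int maxServers,
                                     long intervalMs, Sleeper sleeper) {
        long waited = 0;
        int last = -1;
        while (true) {
            int now = count.getAsInt();
            if (now >= maxServers) return waited;                 // early exit, no sleep
            if (waited >= 2 * intervalMs && now == last) return waited;
            last = now;
            sleeper.sleep(intervalMs);
            waited += intervalMs;
        }
    }
}
```

With a stable count this returns 6000 ms for a 3000 ms interval and 200 ms for a 100 ms interval; setting maxServers to the number of servers started makes it return 0.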





[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-15 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128317#comment-13128317
 ] 

Jonathan Gray commented on HBASE-4536:
--

bq. I think this new feature should not be the default behavior.

+1

> Allow CF to retain deleted rows
> ---
>
> Key: HBASE-4536
> URL: https://issues.apache.org/jira/browse/HBASE-4536
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
>
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
> of versions.
> However, if a client deletes a row, all versions older than the delete tombstone 
> will be removed at the next major compaction (and even at memstore flush 
> - see HBASE-4241).
> There should be a way to retain those versions to guard against software errors.
> I see two options here:
> 1. Add a new flag to HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Fold this into the parent change, i.e. keep minimum-number-of-versions 
> versions even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with the parent 
> (from a user viewpoint).
> Comments? Any other options?
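To make option #2 concrete, here is a hypothetical Java sketch of the per-column decision a major compaction might make, assuming timestamps are sorted newest first and a delete-family marker at deleteTs masks every version with timestamp <= deleteTs. With retainDeleted off (the default the comment argues for), masked versions are dropped; with it on, the newest minVersions versions survive even past the marker. Max-versions and TTL handling are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one column's versions at major compaction time.
// A delete-family marker at deleteTs masks every version with ts <= deleteTs.
final class DeleteRetention {
    static List<Long> survivors(List<Long> versionsNewestFirst, long deleteTs,
                                int minVersions, boolean retainDeleted) {
        List<Long> kept = new ArrayList<>();
        for (long ts : versionsNewestFirst) {
            boolean masked = ts <= deleteTs;
            if (!masked) {
                kept.add(ts);   // newer than the marker (max versions/TTL ignored here)
            } else if (retainDeleted && kept.size() < minVersions) {
                kept.add(ts);   // option #2: keep up to minVersions past the marker
            }
            // otherwise the version is masked and dropped, as happens today
        }
        return kept;
    }
}
```

For versions at 30, 20 and 10 with a marker at 25 and minVersions = 2, the default keeps only 30, while retaining deleted versions keeps 30 and 20.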





[jira] [Commented] (HBASE-4593) Design and document the official procedure for posting patches, commits, commit messages, etc. to smooth process and make integration with tools easier

2011-10-14 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128012#comment-13128012
 ] 

Jonathan Gray commented on HBASE-4593:
--

BTW, once we nail down the formatting and everything, I will toss reposync up 
on a github repo or something.

> Design and document the official procedure for posting patches, commits, 
> commit messages, etc. to smooth process and make integration with tools easier
> ---
>
> Key: HBASE-4593
> URL: https://issues.apache.org/jira/browse/HBASE-4593
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Reporter: Jonathan Gray
>
> I have been building a tool (currently called reposync) to help me keep the 
> internal FB hbase-92-based branch up-to-date with the public branches.
> Various inconsistencies in our process have made it difficult to automate a 
> lot of this stuff.
> I'd like to work with everyone to come up with the official best practices 
> and stick to them.
> I welcome all suggestions.  Among some of the things I'd like to nail down:
> - Commit message format
> - Best practice and commit message format for multiple commits
> - Multiple commits per JIRA vs. one JIRA per commit: what are the exceptions, 
> and when do they apply?
> - Affects vs. Fix versions
> - Potential usage of [tags] in commit messages for things like book, scripts, 
> shell... maybe even whatever is in the components field?
> - Increased usage of JIRA tags or labels to mark exactly which repos a JIRA 
> has been committed to (potentially even internal repos?  ways for a tool to 
> keep track in JIRA?)
> We also need to be more strict about some things if we want to follow Apache 
> guidelines.  For example, all final versions of a patch must be attached to 
> JIRA so that the author properly assigns it to Apache.





[jira] [Commented] (HBASE-4591) TTL for old HLogs should be calculated from last modification time.

2011-10-14 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127859#comment-13127859
 ] 

Jonathan Gray commented on HBASE-4591:
--

+1

> TTL for old HLogs should be calculated from last modification time.
> ---
>
> Key: HBASE-4591
> URL: https://issues.apache.org/jira/browse/HBASE-4591
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.89.20100621
>Reporter: Madhuwanti Vaidya
>Assignee: Madhuwanti Vaidya
>Priority: Minor
>
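The issue has no description here, but the title suggests an expiry check like the Java sketch below, which measures age from the file's last modification time (via java.nio.file.Files.getLastModifiedTime) rather than from when the log was created. The class and method names are hypothetical, not HBase's log cleaner API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical TTL check for old logs based on last modification time.
final class OldLogTtl {
    /** Expired only when the log has not been modified for longer than ttlMs. */
    static boolean isExpired(long lastModifiedMs, long nowMs, long ttlMs) {
        return nowMs - lastModifiedMs > ttlMs;
    }

    static boolean isExpired(Path log, long nowMs, long ttlMs) throws IOException {
        return isExpired(Files.getLastModifiedTime(log).toMillis(), nowMs, ttlMs);
    }
}
```

Under this rule a log created long ago but written until recently is not yet eligible for deletion.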






[jira] [Commented] (HBASE-4078) Silent Data Offlining During HDFS Flakiness

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127281#comment-13127281
 ] 

Jonathan Gray commented on HBASE-4078:
--

This seems to have somehow broken cache-on-write again.  I think that's because 
the verification step does a closeReader(), which could trigger evict-on-close.

I'm going to need to extend the close API to take evictOnClose as an argument.  
I think there's actually a JIRA for this already.
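A hypothetical Java sketch of the API change being described: close takes an explicit evictOnClose argument, and the post-write verification path passes false so the blocks added by cache-on-write stay cached. The Reader and BlockCache classes here are simplified stand-ins, not HBase's HFile reader or block cache.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the proposed API change; these are not HBase's
// real HFile reader or block cache classes.
final class EvictOnCloseSketch {
    // Stand-in block cache: keys look like "<file>#<blockIndex>".
    static final class BlockCache {
        final Set<String> blocks = new HashSet<>();

        void evictFile(String file) {
            blocks.removeIf(key -> key.startsWith(file + "#"));
        }
    }

    static final class Reader {
        final String file;
        final BlockCache cache;

        Reader(String file, BlockCache cache) {
            this.file = file;
            this.cache = cache;
        }

        // The caller, not a global setting, decides whether closing this
        // reader drops the file's cached blocks.
        void close(boolean evictOnClose) {
            if (evictOnClose) cache.evictFile(file);
        }
    }

    // Verifying a newly written store file opens and closes a reader, but it
    // must not throw away the blocks that cache-on-write just populated.
    static void verify(Reader reader) {
        // ... read and check the file here ...
        reader.close(false);
    }
}
```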

> Silent Data Offlining During HDFS Flakiness
> ---
>
> Key: HBASE-4078
> URL: https://issues.apache.org/jira/browse/HBASE-4078
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.89.20100924, 0.90.3, 0.92.0
>Reporter: Nicolas Spiegelberg
>Assignee: Pritam Damania
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 
> 0001-Validate-store-files-after-compactions-flushes.patch, 
> 0001-Validate-store-files.patch
>
>
> See HBASE-1436.  The bug fix for this JIRA is a temporary workaround for 
> improperly moving partially-written files from TMP into the region directory 
> when an FS error occurs.  Unfortunately, the fix is to ignore all IO 
> exceptions, which masks off-lining due to FS flakiness.  We need to 
> permanently fix the problem that created HBASE-1436 and then at least have the 
> option to not open a region during times of a flaky FS.
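One way to read the attached patch names (validating store files after flushes and compactions) is sketched below in Java: validate the file while it is still in the tmp directory, move it into the store directory only if validation passes, and let any IOException propagate instead of being swallowed. The Validator interface and class names are hypothetical, and the file system calls are plain java.nio.file, not HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical "validate, then commit" step for a freshly written store file.
final class CommitStoreFile {
    /** Hypothetical check, e.g. open the file and read its trailer/index. */
    interface Validator { void validate(Path file) throws IOException; }

    /**
     * Validates tmpFile and only then moves it into storeDir. Any IOException
     * propagates instead of being ignored, so a flaky FS surfaces as a failed
     * flush/compaction rather than a silently broken region.
     */
    static Path commit(Path tmpFile, Path storeDir, Validator validator) throws IOException {
        validator.validate(tmpFile); // throws if partially written or unreadable
        Path dest = storeDir.resolve(tmpFile.getFileName());
        return Files.move(tmpFile, dest, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

ATOMIC_MOVE requires source and destination on the same file system, which holds when the tmp directory lives alongside the store directory.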





[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127122#comment-13127122
 ] 

Jonathan Gray commented on HBASE-4335:
--

@LarsH, in the future, please have your svn commit message use the same 
format as the CHANGES.txt update (i.e. HBASE-  The title description (author 
[via committer])).
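A small Java sketch of a checker for that format, assuming the shape HBASE-<number> <title> (<author>[ via <committer>]); the class is hypothetical and not part of any HBase tooling. Excluding parentheses from the author and committer fields lets titles that contain parentheses still parse.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for the requested format:
//   HBASE-<number> <title> (<author>[ via <committer>])
final class CommitMessage {
    private static final Pattern FORMAT = Pattern.compile(
            "^(HBASE-\\d+)\\s+(.+?)\\s+\\(([^()]+?)(?:\\s+via\\s+([^()]+?))?\\)$");

    final String issue;
    final String title;
    final String author;
    final String committer; // null when there is no "via <committer>" part

    private CommitMessage(String issue, String title, String author, String committer) {
        this.issue = issue;
        this.title = title;
        this.author = author;
        this.committer = committer;
    }

    static CommitMessage parse(String line) {
        Matcher m = FORMAT.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not in CHANGES.txt format: " + line);
        }
        return new CommitMessage(m.group(1), m.group(2), m.group(3), m.group(4));
    }
}
```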

> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335-v3.txt, 4335-v4.txt, 4335-v5.txt, 
> 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than (or equal to) the split, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.
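The ordering problem can be reproduced with a simplified Java model in which .META. is a TreeMap keyed by region start key and a client lookup takes the floor entry for a row. This is an assumption-laden sketch: real .META. row keys also include the table name and region id, and the class names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of .META. lookups during a split. Regions are
// keyed by start key only, so a daughter with the parent's start key replaces
// (hides) the parent, as in the description above.
final class MetaHoleDemo {
    static final class Region {
        final String name;
        final String startKey;
        final String endKey; // "" means end of table
        final boolean offline;

        Region(String name, String startKey, String endKey, boolean offline) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, Region> meta = new TreeMap<>();

    void put(Region r) {
        meta.put(r.startKey, r);
    }

    // Client-style lookup: the region with the greatest start key <= row.
    Region locate(String row) {
        Map.Entry<String, Region> e = meta.floorEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

With the parent at start key "" and a split at "m", adding the first daughter first makes a lookup for "q" return a region that does not contain "q" (the hole). Adding the second daughter first instead sends "q" to the second daughter and "c" to the offline parent, which triggers a retry.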





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127090#comment-13127090
 ] 

Jonathan Gray commented on HBASE-4469:
--

(I'm not putting this in the 92 branch because it is a feature.)

> Avoid top row seek by looking up bloomfilter
> 
>
> Key: HBASE-4469
> URL: https://issues.apache.org/jira/browse/HBASE-4469
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.94.0
>
> Attachments: HBASE-4469_1.patch
>
>
> The problem is that when seeking for the row/col in the hfile, we go to the 
> top of the row in order to check for a row delete marker (delete family). 
> However, if the bloomfilter is enabled for the column family and a 
> delete family operation is done on a row, the row has already been added to the 
> bloomfilter. We can take advantage of this fact to avoid seeking to the top 
> of the row.
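A minimal Java sketch of the idea, using a small hypothetical row-level Bloom filter (double hashing over a BitSet) rather than HBase's own classes. It relies on the property stated above: a delete-family on a row also adds that row to the Bloom filter, so a negative answer means this file cannot hold a delete-family marker for the row.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical row-level Bloom filter, not HBase's implementation.
final class RowBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowBloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: probe i is h1 + i * h2 (mod numBits).
    private int probe(byte[] row, int i) {
        int h1 = Arrays.hashCode(row);
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(byte[] row) {
        for (int i = 0; i < numHashes; i++) bits.set(probe(row, i));
    }

    // false = definitely not in this file; true = possibly present.
    boolean mightContain(byte[] row) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(probe(row, i))) return false;
        }
        return true;
    }

    // If the row is definitely absent, there is no delete-family marker for it
    // in this file, so the seek to the top of the row can be skipped. A false
    // positive only costs the seek that would have happened anyway.
    boolean needTopOfRowSeek(byte[] row) {
        return mightContain(row);
    }
}
```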





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127089#comment-13127089
 ] 

Jonathan Gray commented on HBASE-4469:
--

What is the protocol now?  This needs to go into the fb-89 branch, so do I keep 
this JIRA open until that happens, or should we just add some fb-89-pending tag 
or something?

> Avoid top row seek by looking up bloomfilter
> 
>
> Key: HBASE-4469
> URL: https://issues.apache.org/jira/browse/HBASE-4469
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.94.0
>
> Attachments: HBASE-4469_1.patch
>
>
> The problem is that when seeking for the row/col in the hfile, we go to the 
> top of the row in order to check for a row delete marker (delete family). 
> However, if the bloom filter is enabled for the column family, then when a 
> delete family operation is done on a row, the row has already been added to 
> the bloom filter. We can take advantage of this fact to avoid seeking to the 
> top of the row.





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127071#comment-13127071
 ] 

Jonathan Gray commented on HBASE-4469:
--

Thanks Liyin.  Unfortunately, because the RB integration isn't very tight, to 
follow Apache protocol you need to attach the patch to the JIRA and select the 
radio button that grants the license to Apache.

This also helps to ensure that there's no confusion about which version was 
committed and that we don't have a hard dependency on RB in any way.

It'll all be second nature before you know it :)

> Avoid top row seek by looking up bloomfilter
> 
>
> Key: HBASE-4469
> URL: https://issues.apache.org/jira/browse/HBASE-4469
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: HBASE-4469_1.patch
>
>
> The problem is that when seeking for the row/col in the hfile, we go to the 
> top of the row in order to check for a row delete marker (delete family). 
> However, if the bloom filter is enabled for the column family, then when a 
> delete family operation is done on a row, the row has already been added to 
> the bloom filter. We can take advantage of this fact to avoid seeking to the 
> top of the row.





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127059#comment-13127059
 ] 

Jonathan Gray commented on HBASE-4469:
--

Liyin, can you post the final patch to this JIRA?  I will commit.  Thanks!

> Avoid top row seek by looking up bloomfilter
> 
>
> Key: HBASE-4469
> URL: https://issues.apache.org/jira/browse/HBASE-4469
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> The problem is that when seeking for the row/col in the hfile, we go to the 
> top of the row in order to check for a row delete marker (delete family). 
> However, if the bloom filter is enabled for the column family, then when a 
> delete family operation is done on a row, the row has already been added to 
> the bloom filter. We can take advantage of this fact to avoid seeking to the 
> top of the row.





[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127054#comment-13127054
 ] 

Jonathan Gray commented on HBASE-3417:
--

In StoreFile.java:
{code}
   private static final Pattern REF_NAME_PARSER =
-Pattern.compile("^(\\d+)(?:\\.(.+))?$");
+Pattern.compile("^([0-9a-f]+)(?:\\.(.+))?$");
{code}

This is the change to apply if you ever need to go backwards from 92 to a previous version.

> CacheOnWrite is using the temporary output path for block names, need to use 
> a more consistent block naming scheme
> --
>
> Key: HBASE-3417
> URL: https://issues.apache.org/jira/browse/HBASE-3417
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.92.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, 
> HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the 
> filesystem path.  However, for cache on write, the path is a temporary output 
> file.
> The original COW patch actually made some modifications to block naming to 
> make it more consistent but did not go far enough.  We should add a separate 
> method for generating block names using a scheme that is easier to mock 
> (rather than just the raw path, since we generate a random unique file name 
> twice: once for the tmp file and again when it is moved into place).
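One way to read the suggestion above, in the spirit of the hex file names implied by the REF_NAME_PARSER change quoted in the next comment: generate the unique file name once, reuse it for both the tmp and the final location, and key cached blocks on that name plus the block offset instead of the full path. Everything here (BlockNames, newFileName, tmpPath, finalPath, blockName, and the path layout) is hypothetical and only shows that such a key survives the move into place.

```java
import java.util.UUID;

final class BlockNames {
    // Hypothetical: generate the store file name once, as lowercase hex without dashes.
    static String newFileName() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    // Hypothetical path layouts; only the final path component matters below.
    static String tmpPath(String regionDir, String fileName) {
        return regionDir + "/.tmp/" + fileName;
    }

    static String finalPath(String regionDir, String family, String fileName) {
        return regionDir + "/" + family + "/" + fileName;
    }

    // Last path component, i.e. the file name.
    static String fileNameOf(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Cache key built from the file name and block offset, not the directory,
    // so blocks cached on write still match after the file is moved.
    static String blockName(String fileName, long offset) {
        return fileName + "_" + offset;
    }
}
```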





[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127051#comment-13127051
 ] 

Jonathan Gray commented on HBASE-3417:
--

I didn't mark it as incompatible, but it is only one-way compatible.

There is actually a very trivial change that can be made in the 0.90 branch (or 
any other branch) to make this change compatible in all directions.  You just 
need to update the REF_NAME_PARSER regex to match the one in this change 
(tolerant of [a-f] in addition to digits).  That's it.
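The one-way compatibility can be checked directly against the two patterns from the StoreFile.java diff in the earlier comment. The file names in the usage are made-up examples: a digits-only name of the older kind and a hex name containing a-f, each followed by an arbitrary suffix after the dot. RefNameCompat and parse are hypothetical helper names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RefNameCompat {
    // Pattern before the change: the file-name part must be all digits.
    static final Pattern OLD_REF_NAME_PARSER = Pattern.compile("^(\\d+)(?:\\.(.+))?$");
    // Pattern after the change: lowercase hex digits are accepted as well.
    static final Pattern NEW_REF_NAME_PARSER = Pattern.compile("^([0-9a-f]+)(?:\\.(.+))?$");

    // Returns the file-name part if the pattern accepts the name, else null.
    static String parse(Pattern p, String name) {
        Matcher m = p.matcher(name);
        return m.matches() ? m.group(1) : null;
    }
}
```

Old-style names parse under both patterns, while a hex name parses only under the new one, which is why backporting the regex alone restores compatibility in both directions.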

> CacheOnWrite is using the temporary output path for block names, need to use 
> a more consistent block naming scheme
> --
>
> Key: HBASE-3417
> URL: https://issues.apache.org/jira/browse/HBASE-3417
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.92.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, 
> HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the 
> filesystem path.  However, for cache on write, the path is a temporary output 
> file.
> The original COW patch actually made some modifications to block naming to 
> make it more consistent but did not go far enough.  We should add a separate 
> method for generating block names using a scheme that is easier to mock 
> (rather than just the raw path, since we generate a random unique file name 
> twice: once for the tmp file and again when it is moved into place).





[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127018#comment-13127018
 ] 

Jonathan Gray commented on HBASE-3417:
--

Stack, I was going to open a new JIRA, but it is the exact same issue and a 
nearly identical patch (the primary difference is pre/post HFile v2).  It was 
simply incorrect to close this issue following the commit of HFile v2, which was 
unrelated to this bug.  Nothing was ever committed under this JIRA, so I just 
reopened it with an updated patch.

I think things get confusing when there is more than one commit per branch per 
JIRA.  We should probably ban that practice, or at least institute some kind 
of standardized commit message (HBASE-3417, HBASE-3417-B, HBASE-3417-C, etc.).

> CacheOnWrite is using the temporary output path for block names, need to use 
> a more consistent block naming scheme
> --
>
> Key: HBASE-3417
> URL: https://issues.apache.org/jira/browse/HBASE-3417
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.92.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, 
> HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the 
> filesystem path.  However, for cache on write, the path is a temporary output 
> file.
> The original COW patch actually made some modifications to block naming to 
> make it more consistent but did not go far enough.  We should add a separate 
> method for generating block names using a scheme that is easier to mock 
> (rather than just the raw path, since we generate a random unique file name 
> twice: once for the tmp file and again when it is moved into place).





[jira] [Commented] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126854#comment-13126854
 ] 

Jonathan Gray commented on HBASE-4459:
--

- Why is Queue added within the scope of this JIRA?  It seems unrelated.

- Can you remove the unnecessary import re-org at the top?

- Can we have a unit test which shows the backwards compatibility of this?

Thanks for working on this Ram.

> HbaseObjectWritable code is a byte, we will eventually run out of codes
> ---
>
> Key: HBASE-4459
> URL: https://issues.apache.org/jira/browse/HBASE-4459
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Reporter: Jonathan Gray
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4459.txt
>
>
> There are about 90 classes/codes in HbaseObjectWritable currently and 
> Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes 
> without breaking compatibility might want to leave a gap before using codes, 
> and that's difficult in such a limited space.
> Eventually we should get rid of this pattern that makes compatibility 
> difficult (via a better client/server protocol handshake), but we should probably at 
> least bump this to a short for 0.94.
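A quick illustration of the limit described above, using only java.io: a code written with writeByte round-trips only up to Byte.MAX_VALUE (127) when read back as a signed byte, while writeShort leaves room up to Short.MAX_VALUE (32767) at the cost of one extra byte per object on the wire. This is a sketch of the space trade-off under those assumptions, not the HbaseObjectWritable wire format; CodeWidth and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CodeWidth {
    // Write a class code as one byte and read it back as a signed byte.
    static int roundTripByte(int code) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeByte(code); // only the low 8 bits are written
            return new DataInputStream(new ByteArrayInputStream(bos.toByteArray())).readByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a class code as two bytes and read it back as a signed short.
    static int roundTripShort(int code) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeShort(code); // low 16 bits, 2 bytes on the wire
            return new DataInputStream(new ByteArrayInputStream(bos.toByteArray())).readShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With about 90 codes already taken, a byte leaves fewer than 40 free values before codes wrap to negative numbers, whereas a short leaves tens of thousands.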





[jira] [Commented] (HBASE-4558) Refactor TestOpenedRegionHandler and TestOpenRegionHandler.

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126373#comment-13126373
 ] 

Jonathan Gray commented on HBASE-4558:
--

-  metaRegion, regionServer);
+  metaRegion, regionServer.getServerName());

?

> Refactor TestOpenedRegionHandler and TestOpenRegionHandler.
> ---
>
> Key: HBASE-4558
> URL: https://issues.apache.org/jira/browse/HBASE-4558
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4558_1.patch, HBASE-4558_2.patch, 
> HBASE-4558_3.patch
>
>
> This is an improvement task to refactor TestOpenedRegionHandler and 
> TestOpenRegionHandler so that MockServer and MockRegionServerServices can be 
> accessed from a common utility package.
> If we do this, one of the test cases in TestOpenedRegionHandler need not 
> start up a cluster, and moving the mocks into a common package will also help 
> with mocking the server in future test cases.





[jira] [Commented] (HBASE-4558) Refactor TestOpenedRegionHandler and TestOpenRegionHandler.

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126372#comment-13126372
 ] 

Jonathan Gray commented on HBASE-4558:
--

Did this break the build?  TestMasterFailover is not compiling for me.

> Refactor TestOpenedRegionHandler and TestOpenRegionHandler.
> ---
>
> Key: HBASE-4558
> URL: https://issues.apache.org/jira/browse/HBASE-4558
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4558_1.patch, HBASE-4558_2.patch, 
> HBASE-4558_3.patch
>
>
> This is an improvement task to refactor TestOpenedRegionHandler and 
> TestOpenRegionHandler so that MockServer and MockRegionServerServices can be 
> accessed from a common utility package.
> If we do this, one of the test cases in TestOpenedRegionHandler need not 
> start up a cluster, and moving the mocks into a common package will also help 
> with mocking the server in future test cases.





[jira] [Commented] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126332#comment-13126332
 ] 

Jonathan Gray commented on HBASE-4459:
--

I'm fine with pulling into 0.92 since it doesn't break any compatibility.

> HbaseObjectWritable code is a byte, we will eventually run out of codes
> ---
>
> Key: HBASE-4459
> URL: https://issues.apache.org/jira/browse/HBASE-4459
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Reporter: Jonathan Gray
>Priority: Critical
> Fix For: 0.92.0
>
>
> There are about 90 classes/codes in HbaseObjectWritable currently and 
> Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes 
> without breaking compatibility might want to leave a gap before using codes, 
> and that's difficult in such a limited space.
> Eventually we should get rid of this pattern that makes compatibility 
> difficult (via a better client/server protocol handshake), but we should probably at 
> least bump this to a short for 0.94.





[jira] [Commented] (HBASE-1621) merge tool should work on online cluster, but disabled table

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126288#comment-13126288
 ] 

Jonathan Gray commented on HBASE-1621:
--

Punt to 0.92.1 or 0.94.0?

> merge tool should work on online cluster, but disabled table
> 
>
> Key: HBASE-1621
> URL: https://issues.apache.org/jira/browse/HBASE-1621
> Project: HBase
>  Issue Type: Bug
>Reporter: ryan rawson
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 1621-trunk.txt, HBASE-1621-v2.patch, HBASE-1621.patch, 
> hbase-onlinemerge.patch, online_merge.rb
>
>
> Taking down the entire cluster to merge 2 regions is a pain; I don't see why 
> the table or the specific regions couldn't be taken offline, merged, and then 
> brought back up.
> This might need a new API on the regionservers so they can take direction 
> from sources other than the master.





[jira] [Commented] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126286#comment-13126286
 ] 

Jonathan Gray commented on HBASE-4410:
--

My comeback is that Lars is right and I f-ed it up.  I was supposed to make a 
new patch but forgot about this.  I was a bit angry I came up with such a nice 
elegant solution that was fundamentally broken.  ;)

Will try to get to this next week.

> FilterList.filterKeyValue can return suboptimal ReturnCodes
> ---
>
> Key: HBASE-4410
> URL: https://issues.apache.org/jira/browse/HBASE-4410
> Project: HBase
>  Issue Type: Improvement
>  Components: filters
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4410-v1.patch
>
>
> FilterList.filterKeyValue does not always return the optimal ReturnCode 
> in either the AND or the OR case.
> For example, with F1 AND F2, if F1 returns SKIP, the list immediately returns 
> SKIP.  However, if F2 would have returned NEXT_COL, NEXT_ROW, or 
> SEEK_NEXT_USING_HINT, we could instead return F2's more optimal 
> ReturnCode.
> For AND conditions, we can always pick the *most restrictive* return code.
> For OR conditions, we must always pick the *least restrictive* return code.
> This JIRA is to review the FilterList.filterKeyValue() method to make it 
> more optimal and to add a new unit test that verifies the correct 
> behavior.
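The AND/OR rule stated in the description can be sketched as a merge over a restrictiveness ordering. Caveats: the enum below is a hypothetical stand-in for Filter.ReturnCode that mirrors only four of its constants; it deliberately leaves out SEEK_NEXT_USING_HINT, whose hint cannot be ranked against the others by a simple ordering; and the comment above notes that the first attempt at this fix turned out to be fundamentally broken. Treat this as an illustration of the stated rule, not a correct FilterList implementation.

```java
import java.util.List;

final class ReturnCodeMerge {
    // Hypothetical stand-in for Filter.ReturnCode, ordered least to most restrictive.
    // SEEK_NEXT_USING_HINT is omitted on purpose: its hint is not comparable by ordinal.
    enum Code { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

    // AND: every filter must accept the KeyValue, so the most restrictive code wins.
    static Code mergeAnd(List<Code> codes) {
        requireNonEmpty(codes);
        Code result = Code.INCLUDE;
        for (Code c : codes) {
            if (c.compareTo(result) > 0) result = c;
        }
        return result;
    }

    // OR: any single filter may accept the KeyValue, so the least restrictive code wins.
    static Code mergeOr(List<Code> codes) {
        requireNonEmpty(codes);
        Code result = Code.NEXT_ROW;
        for (Code c : codes) {
            if (c.compareTo(result) < 0) result = c;
        }
        return result;
    }

    private static void requireNonEmpty(List<Code> codes) {
        if (codes.isEmpty()) throw new IllegalArgumentException("need at least one filter result");
    }
}
```

In the F1 AND F2 example, SKIP from F1 and NEXT_COL from F2 merge to NEXT_COL rather than stopping at the first SKIP.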





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126282#comment-13126282
 ] 

Jonathan Gray commented on HBASE-4489:
--

Historically, ASCII has proven to be a bad choice for key design.  If the key is 
always fixed length, it's less of a big deal and really does come down to space 
savings vs. readability.  In many applications, row keys are composite keys made 
up of many different parts.  Often, the key is prefixed with some fixed-length 
random hash.

I almost always want to build these composite keys from fixed-length binary 
ints/longs and such, rather than fixed-length ASCII characters.

If we are talking about a straightforward key-value situation with a string-like 
key, then the usability of ASCII would make sense.
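A minimal sketch of the kind of key described above, built with java.nio.ByteBuffer: a 4-byte hash prefix followed by fixed-width binary fields. The layout (an int hash of a user ID, the 8-byte user ID, then an 8-byte timestamp) and the CompositeKey helper are made-up examples. Big-endian encoding of non-negative values sorts correctly under unsigned byte-wise comparison; negative values would need a sign-bit flip, which this sketch rejects instead of handling.

```java
import java.nio.ByteBuffer;

final class CompositeKey {
    static final int LENGTH = 4 + 8 + 8;

    // Hypothetical layout: [hash(userId) : 4][userId : 8][timestamp : 8], big-endian.
    static byte[] build(long userId, long timestamp) {
        if (userId < 0 || timestamp < 0) {
            throw new IllegalArgumentException("sketch assumes non-negative fields");
        }
        int prefix = Long.hashCode(userId); // stand-in for a real hash function
        return ByteBuffer.allocate(LENGTH).putInt(prefix).putLong(userId).putLong(timestamp).array();
    }

    // Read the timestamp back from its fixed offset (after the 4 + 8 prefix bytes).
    static long timestampOf(byte[] key) {
        return ByteBuffer.wrap(key).getLong(12);
    }

    // Lexicographic comparison treating bytes as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

Every key is exactly 20 bytes, and fields are read back by offset rather than by parsing delimiters.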

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, 
> HBASE-4489-branch0.90-v2.patch, HBASE-4489-branch0.90-v3.patch, 
> HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, 
> HBASE-4489-trunk-v3.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised 
> that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.
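Dividing the space of all bytes evenly can be sketched with java.math.BigInteger, treating keys as 8-byte unsigned integers. This is an illustration of the idea, not the patch's split algorithm; the 8-byte key width and the UniformSplitSketch names are assumptions. In this sketch, n regions get n - 1 interior split keys, and the last region runs to the end of the key space.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class UniformSplitSketch {
    static final int KEY_BYTES = 8; // assumed key width
    static final BigInteger SPACE = BigInteger.ONE.shiftLeft(8 * KEY_BYTES); // 2^64 keys

    // Return numRegions - 1 split keys dividing [0, 2^64) into equal ranges.
    static List<byte[]> splitKeys(int numRegions) {
        if (numRegions < 2) throw new IllegalArgumentException("need at least two regions");
        List<byte[]> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = SPACE.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(numRegions));
            keys.add(toFixedWidth(point));
        }
        return keys;
    }

    // Left-pad, or drop BigInteger's leading sign byte, to get exactly KEY_BYTES bytes.
    static byte[] toFixedWidth(BigInteger v) {
        byte[] raw = v.toByteArray();
        byte[] out = new byte[KEY_BYTES];
        int n = Math.min(raw.length, KEY_BYTES);
        System.arraycopy(raw, raw.length - n, out, KEY_BYTES - n, n);
        return out;
    }
}
```

For five regions this yields the split keys 0x3333333333333333, 0x6666666666666666, 0x9999999999999999 and 0xCCCCCCCCCCCCCCCC.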





[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126224#comment-13126224
 ] 

Jonathan Gray commented on HBASE-4583:
--

We likely won't be able to do in-place modifications or direct KV removal from 
MemStore.  A simple way would be to also introduce a delete marker that removes 
the previous value, but the marker will have the rwcc of the new edit, so 
you'll have the right consistency.

This will lead to a buildup of unnecessary KVs in the MemStore.  Periodically 
cleaning that up would be possible but unnecessarily complex, I think.

Another option would be to remove the previous KVs after you roll rwcc forward 
and release the row lock, but before dropping the region-level lock.  That should 
definitely be possible.  It will obviously require reworking upsert, but that 
code is kinda dirty anyway.
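A toy model of the first option above (a marker carrying the new edit's RWCC write number) shows why readers stay consistent. Everything here is hypothetical: ToyMemStore, its single-cell layout, and the explicit rollForward call are simplifications, not HBase's MemStore or RWCC API. Readers whose read point is below the new write number see neither the new value nor its marker; readers at or above it see the marker hide the old value.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyMemStore {
    // A single-cell entry: a value, or (value == null) a marker hiding older values.
    static final class Entry {
        final long writeNumber;
        final Long value;
        Entry(long writeNumber, Long value) { this.writeNumber = writeNumber; this.value = value; }
    }

    private final List<Entry> entries = new ArrayList<>();
    private long readPoint = 0;        // highest write number visible to readers
    private long nextWriteNumber = 1;

    // Upsert: append the new value and a marker, both tagged with the same write number.
    long upsert(long newValue) {
        long w = nextWriteNumber++;
        entries.add(new Entry(w, newValue));
        entries.add(new Entry(w, null));
        return w;
    }

    // Make edits up to write number w visible.
    void rollForward(long w) { readPoint = Math.max(readPoint, w); }

    // Values visible at the current read point after applying the newest visible marker.
    List<Long> read() {
        long hideBelow = 0;
        for (Entry e : entries) {
            if (e.value == null && e.writeNumber <= readPoint) hideBelow = Math.max(hideBelow, e.writeNumber);
        }
        List<Long> visible = new ArrayList<>();
        for (Entry e : entries) {
            if (e.value != null && e.writeNumber <= readPoint && e.writeNumber >= hideBelow) visible.add(e.value);
        }
        return visible;
    }
}
```

Note that the entries list grows with every upsert, which is the buildup of unnecessary KVs the comment warns about.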

> Integrate RWCC with Append and Increment operations
> ---
>
> Key: HBASE-4583
> URL: https://issues.apache.org/jira/browse/HBASE-4583
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Fix For: 0.94.0
>
>
> Currently Increment and Append operations do not work with RWCC, and hence a 
> client could see the results of multiple such operations mixed in the same 
> Get/Scan.
> The semantics might be a bit more interesting here as upsert adds and removes 
> to and from the memstore.





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126175#comment-13126175
 ] 

Jonathan Gray commented on HBASE-4469:
--

@stack, yeah, this version only works if you have rowcol blooms enabled.  The 
generic version is going to be implemented over in HBASE-4532.

> Avoid top row seek by looking up bloomfilter
> 
>
> Key: HBASE-4469
> URL: https://issues.apache.org/jira/browse/HBASE-4469
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> The problem is that when seeking for the row/col in the hfile, we go to the 
> top of the row in order to check for a row delete marker (delete family). 
> However, if the bloom filter is enabled for the column family, then when a 
> delete family operation is done on a row, the row has already been added to 
> the bloom filter. We can take advantage of this fact to avoid seeking to the 
> top of the row.





[jira] [Commented] (HBASE-4102) atomicAppend: A put that appends to the latest version of a cell; i.e. reads current value then adds the bytes offered by the client to the tail and writes out a new en

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126141#comment-13126141
 ] 

Jonathan Gray commented on HBASE-4102:
--

I think unifying Put and Append is not super important.  It would be good to 
unify Increment and Append, maybe even CheckAndPut/Delete?  A generic atomic op 
thing.

For the attributes, I think we just need a convention for system attributes, 
for example, prefixing them with an underscore (_).  Then we can put all the 
used attributes into HConstants for easy tracking.
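A minimal sketch of that convention. The SystemAttributes class, the PREFIX check, and the example attribute name RETURN_RESULTS are all hypothetical. The point is only that a reserved prefix lets the server tell system attributes apart from user ones, and that keeping the names in one constants holder makes them easy to track.

```java
import java.util.Map;

final class SystemAttributes {
    // Reserved prefix for attributes interpreted by HBase itself (hypothetical convention).
    static final String PREFIX = "_";
    // Example system attribute name (hypothetical).
    static final String RETURN_RESULTS = PREFIX + "return_results";

    static boolean isSystemAttribute(String name) {
        return name.startsWith(PREFIX);
    }

    // Reject user-supplied attributes that collide with the reserved namespace.
    static void validateUserAttributes(Map<String, byte[]> attrs) {
        for (String name : attrs.keySet()) {
            if (isSystemAttribute(name)) {
                throw new IllegalArgumentException("reserved attribute name: " + name);
            }
        }
    }
}
```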

Let's open another JIRA to integrate RWCC w/ Append and possibly Increment as 
well.  We can discuss there.

> atomicAppend: A put that appends to the latest version of a cell; i.e. reads 
> current value then adds the bytes offered by the client to the tail and 
> writes out a new entry
> ---
>
> Key: HBASE-4102
> URL: https://issues.apache.org/jira/browse/HBASE-4102
> Project: HBase
>  Issue Type: New Feature
>Reporter: stack
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 4102-v1.txt, 4102.txt
>
>
> It's come up a few times that clients want to add to an existing cell rather 
> than make a new cell each time.  At our place, the frontend keeps a list of 
> urls a user has visited -- their md5s -- and updates it as the user progresses.  
> Rather than read, modify client-side, then write the new value back to hbase, it 
> would be sweet if we could do it all in one operation in the hbase server.  TSDB 
> aims to be space efficient.  Rather than pay the cost of the KV wrapper per 
> metric, it would rather have a KV for an interval and in this KV have a value 
> that is all the metrics for the period.
> It could be done as a coprocessor but this feels more like a fundamental 
> feature.
> Benoît suggests that atomicAppend take a flag to indicate whether or not the 
> client wants to see the resulting cell; often a client won't want to see the 
> result and in this case, why pay the price of formulating and delivering a 
> response that the client just drops.
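To make the server-side idea concrete, here is a toy in-memory sketch. It is not the HBase client or RegionServer API: ToyRegion, append, and the returnResult flag are hypothetical names that model the description. A single monitor stands in for the row lock, so concurrent appends to the same cell cannot interleave. The flag mirrors Benoît's suggestion: when it is false, no result is built or returned.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

final class ToyRegion {
    private final Map<String, byte[]> cells = new HashMap<>();

    // Atomically append bytes to the cell's current value (empty if absent).
    // Returns a copy of the new value only when returnResult is true, else null.
    synchronized byte[] append(String cell, byte[] suffix, boolean returnResult) {
        byte[] current = cells.getOrDefault(cell, new byte[0]);
        ByteArrayOutputStream out = new ByteArrayOutputStream(current.length + suffix.length);
        out.write(current, 0, current.length);
        out.write(suffix, 0, suffix.length);
        byte[] updated = out.toByteArray();
        cells.put(cell, updated);
        return returnResult ? updated.clone() : null;
    }

    synchronized byte[] get(String cell) {
        byte[] v = cells.get(cell);
        return v == null ? null : v.clone();
    }
}
```

Compared with a client-side read, modify, and write, the read and the write happen under one lock on the server, so two clients appending at the same time cannot lose each other's bytes.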





[jira] [Commented] (HBASE-4102) atomicAppend: A put that appends to the latest version of a cell; i.e. reads current value then adds the bytes offered by the client to the tail and writes out a new en

2011-10-11 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125225#comment-13125225
 ] 

Jonathan Gray commented on HBASE-4102:
--

This is really nice Lars.  I'd love to see integration with RWCC and to somehow 
unify the code with Increment.  But I'm okay with committing this and filing a 
follow-up JIRA.

I'm also going to backport this into my local 92 branch but I think it should 
only be committed to trunk.  Let's put all the polish on before putting it in 
an official release.

Nice work!

> atomicAppend: A put that appends to the latest version of a cell; i.e. reads 
> current value then adds the bytes offered by the client to the tail and 
> writes out a new entry
> ---
>
> Key: HBASE-4102
> URL: https://issues.apache.org/jira/browse/HBASE-4102
> Project: HBase
>  Issue Type: New Feature
>Reporter: stack
>Assignee: Lars Hofhansl
> Attachments: 4102-v1.txt, 4102.txt
>
>
> It's come up a few times that clients want to add to an existing cell rather 
> than make a new cell each time.  At our place, the frontend keeps a list of 
> urls a user has visited -- their md5s -- and updates it as the user progresses.  
> Rather than read, modify client-side, then write the new value back to hbase, it 
> would be sweet if we could do it all in one operation in the hbase server.  TSDB 
> aims to be space efficient.  Rather than pay the cost of the KV wrapper per 
> metric, it would rather have a KV for an interval and in this KV have a value 
> that is all the metrics for the period.
> It could be done as a coprocessor but this feels more like a fundamental 
> feature.
> Benoît suggests that atomicAppend take a flag to indicate whether or not the 
> client wants to see the resulting cell; often a client won't want to see the 
> result and in this case, why pay the price of formulating and delivering a 
> response that the client just drops.





[jira] [Commented] (HBASE-4556) Fix all incorrect uses of InternalScanner.next(...)

2011-10-10 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124349#comment-13124349
 ] 

Jonathan Gray commented on HBASE-4556:
--

Why don't we see bugs because of this?  Should the contract be changed to match 
how we actually use it, since that seems to work?

> Fix all incorrect uses of InternalScanner.next(...)
> ---
>
> Key: HBASE-4556
> URL: https://issues.apache.org/jira/browse/HBASE-4556
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>
> There are cases all over the code where InternalScanner.next(...) is not used 
> correctly.
> I see this a lot:
> {code}
> while(scanner.next(...)) {
> }
> {code}
> The correct pattern is:
> {code}
> boolean more = false;
> do {
>more = scanner.next(...);
> } while (more);
> {code}
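A self-contained way to see the difference, assuming the contract implied by the description: next fills the results list with the next batch and returns whether more remain, so the final call returns false and also delivers a non-empty batch. ToyScanner is a hypothetical stand-in, not InternalScanner, and the loop bodies process and clear each batch so the effect is visible.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ScannerLoopDemo {
    // Hypothetical stand-in: adds the next row to results and returns true
    // only if more rows remain after this one.
    static final class ToyScanner {
        private final Iterator<String> rows;
        ToyScanner(List<String> rows) { this.rows = rows.iterator(); }

        boolean next(List<String> results) {
            if (rows.hasNext()) results.add(rows.next());
            return rows.hasNext();
        }
    }

    // Incorrect pattern: the batch filled by the final call (which returns false) is dropped.
    static List<String> whileLoop(List<String> data) {
        ToyScanner scanner = new ToyScanner(data);
        List<String> seen = new ArrayList<>();
        List<String> results = new ArrayList<>();
        while (scanner.next(results)) {
            seen.addAll(results);
            results.clear();
        }
        return seen;
    }

    // Correct pattern: process whatever each call returned, then check "more".
    static List<String> doWhileLoop(List<String> data) {
        ToyScanner scanner = new ToyScanner(data);
        List<String> seen = new ArrayList<>();
        List<String> results = new ArrayList<>();
        boolean more;
        do {
            more = scanner.next(results);
            seen.addAll(results);
            results.clear();
        } while (more);
        return seen;
    }
}
```

With rows a, b, c, the while loop processes only a and b, while the do/while loop processes all three.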





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122436#comment-13122436
 ] 

Jonathan Gray commented on HBASE-4528:
--

Dhruba and I just talked about this.  I also like the MemStore rollback.  It 
should not be that difficult: we just need to remove the List that we added.

> The put operation can release the rowlock before sync-ing the Hlog
> --
>
> Key: HBASE-4528
> URL: https://issues.apache.org/jira/browse/HBASE-4528
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, 
> appendNoSyncPut3.txt
>
>
> This allows for better throughput when there are hot rows. A single row 
> update improves from 100 puts/sec/server to 5000 puts/sec/server.
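A minimal sketch of the idea discussed here, under simplifying assumptions. A hypothetical `Wal` interface has a cheap `append` and an expensive, durable `sync`; a sorted set stands in for the MemStore; one lock stands in for the row lock. The lock is released before `sync`, so other writers to a hot row are not blocked on the log, and if the sync fails the put removes exactly the edits it added (the MemStore rollback). The sketch ignores read visibility: a real region server would also need to keep readers from seeing edits that have not been synced yet.

```java
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of a put that releases the row lock before syncing the log. */
public class DeferredSyncPutSketch {
    /** Hypothetical write-ahead log: append is cheap, sync makes appended edits durable. */
    public interface Wal {
        long append(List<String> edits);        // returns a transaction id
        void sync(long txid) throws Exception;  // blocks until txid is durable
    }

    // Edits are modeled as unique strings; a sorted concurrent set stands in for the MemStore.
    public final ConcurrentSkipListSet<String> memstore = new ConcurrentSkipListSet<>();
    private final ReentrantLock rowLock = new ReentrantLock();
    private final Wal wal;

    public DeferredSyncPutSketch(Wal wal) {
        this.wal = wal;
    }

    /** Returns true if the put is durable; on sync failure the MemStore edits are rolled back. */
    public boolean put(List<String> edits) {
        long txid;
        rowLock.lock();
        try {
            txid = wal.append(edits);   // append to the log without syncing
            memstore.addAll(edits);     // apply the edits to the MemStore
        } finally {
            rowLock.unlock();           // release the row lock before the slow sync
        }
        try {
            wal.sync(txid);             // other puts to the same row can proceed meanwhile
            return true;
        } catch (Exception e) {
            memstore.removeAll(edits);  // MemStore rollback: remove exactly what this put added
            return false;
        }
    }
}
```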




[jira] [Commented] (HBASE-4549) Add thrift API to read version and build date of HBase

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122373#comment-13122373
 ] 

Jonathan Gray commented on HBASE-4549:
--

+1

> Add thrift API to read version and build date of HBase 
> ---
>
> Key: HBASE-4549
> URL: https://issues.apache.org/jira/browse/HBASE-4549
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Song Liu
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Adding an API to get the HBase server version and build date would help the 
> client communicate appropriately with different versions of the server. 
> The class VersionInfo can be reused to provide the required information. 




[jira] [Commented] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122370#comment-13122370
 ] 

Jonathan Gray commented on HBASE-4547:
--

Post-commit +1.

Stack, should we open another JIRA to deal with your TODO?

> TestAdmin failing in 0.92 because .tableinfo not found
> --
>
> Key: HBASE-4547
> URL: https://issues.apache.org/jira/browse/HBASE-4547
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4547.txt
>
>
> I've been running tests before commit and found that the following failures 
> happen with some regularity; they are sporadic, of course, but fairly frequent:
> {code}
> Failed tests:   
> testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
>   testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but 
> was:<1>
>   testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): 
> expected:<2> but was:<1>
> {code}
> Looking into it, it seems we fail to find .tableinfo in the tests that modify 
> the table schema while the table is online.
> Updating a table schema just overwrites the file.  In the tests we sometimes 
> fail to find the newly written file, or we get an EOFException reading it.




[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122311#comment-13122311
 ] 

Jonathan Gray commented on HBASE-4536:
--

Lars, I agree that this is an important feature.  I also agree that we should 
take the time to do it right and not push it into 0.92.

Could we just support some kind of "raw scanner" along with a TTKAKV config 
(Time To Keep All Key Values)?
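One possible reading of the TTKAKV proposal, sketched under assumptions: a single column, cells sorted newest-first (HBase orders versions by descending timestamp), and hypothetical names throughout. During a major compaction every cell younger than the keep-all window survives, including delete markers and the puts they mask, while older cells follow normal delete semantics. A "raw scanner" would then be the read path that returns these retained cells and markers as-is.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical compaction rule for a "Time To Keep All Key Values" window. */
public class KeepAllWindowSketch {
    /** Simplified cell for one row/column: a timestamp and whether it is a delete marker. */
    public static final class Cell {
        public final long ts;
        public final boolean isDelete;
        public Cell(long ts, boolean isDelete) { this.ts = ts; this.isDelete = isDelete; }
    }

    /**
     * Returns the cells of one column that survive a major compaction.
     * cells must be sorted newest-first. Anything younger than keepAllMillis is kept,
     * including delete markers and the puts they mask; older masked cells are dropped.
     */
    public static List<Cell> compact(List<Cell> cells, long now, long keepAllMillis) {
        List<Cell> kept = new ArrayList<>();
        boolean masked = false;  // true once we have passed a delete marker
        for (Cell c : cells) {
            boolean inWindow = now - c.ts < keepAllMillis;
            if (c.isDelete) {
                masked = true;
                if (inWindow) {
                    kept.add(c);  // keep the marker so a raw read can still show it
                }
            } else if (!masked || inWindow) {
                kept.add(c);      // unmasked puts always survive; masked ones only in the window
            }
        }
        return kept;
    }
}
```

With a window of zero this degenerates to today's behavior (the marker and everything behind it disappear), which is why it could be offered as an opt-in family setting rather than a change to the default.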

> Allow CF to retain deleted rows
> ---
>
> Key: HBASE-4536
> URL: https://issues.apache.org/jira/browse/HBASE-4536
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
>
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
> of versions.
> However, if a client deletes a row, all versions older than the delete 
> tombstone will be removed at the next major compaction (and even at memstore 
> flush - see HBASE-4241).
> There should be a way to retain those versions to guard against software errors.
> I see two options here:
> 1. Add a new flag to HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Fold this into the parent change, i.e. keep the minimum number of versions 
> even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with the parent 
> (from a user viewpoint).
> Comments? Any other options?




[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122034#comment-13122034
 ] 

Jonathan Gray commented on HBASE-4482:
--

+1 on keeping this in 0.92 regardless of stability, and on marking it as experimental.

> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt, hbase-4482v4.2.txt
>
>





[jira] [Commented] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121508#comment-13121508
 ] 

Jonathan Gray commented on HBASE-4465:
--

Nice work Liyin and Mikhail!

> Lazy-seek optimization for StoreFile scanners
> -
>
> Key: HBASE-4465
> URL: https://issues.apache.org/jira/browse/HBASE-4465
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>  Labels: optimization, seek
> Fix For: 0.89.20100924, 0.94.0
>
> Attachments: 
> HBASE-4465_Lazy-seek_optimization_for_St-20111005121052-b2ea8753.patch
>
>
> Previously, if we had several StoreFiles for a column family in a region, we 
> would seek in each of them and only then merge the results, even though the 
> row/column we are looking for might only be in the most recent (and the 
> smallest) file. Now we prioritize our reads from those files so that we check 
> the most recent file first. This is done by doing a "lazy seek" which 
> pretends that the next value in the StoreFile is (seekRow, seekColumn, 
> lastTimestampInStoreFile), which is earlier in the KV order than anything 
> that might actually occur in the file. So if we don't find the result in 
> earlier files, that fake KV will bubble up to the top of the KV heap and a 
> real seek will be done. This is expected to significantly reduce the amount 
> of disk IO (as of 09/22/2011 we are doing dark launch testing and 
> measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful 
> discussions on this and the idea of putting fake KVs with the highest 
> timestamp of the StoreFile in the scanner priority queue.
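A toy model of the lazy-seek idea described above, assuming a single column per row and a query that only wants the newest version. All class names are hypothetical, not HBase's scanner classes. Each file first enters the heap as a fake cell (seekRow, maxTimestampInFile); because timestamps sort in descending order, that fake cell sorts no later than any real cell the file could hold for the row. A real seek happens only when a fake cell reaches the top of the heap, so a hit in a newer file can be returned without seeking older files.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of lazy seeks across store files for a single-column, newest-version lookup. */
public class LazySeekSketch {
    /** A cell: row key plus timestamp (one column only). */
    public static final class Cell {
        public final String row;
        public final long ts;
        public Cell(String row, long ts) { this.row = row; this.ts = ts; }
    }

    /** Hypothetical immutable store file; cells sorted by row ascending, then timestamp descending. */
    public static final class StoreFile {
        final List<Cell> cells;
        final long maxTs;
        public StoreFile(List<Cell> sortedCells) {
            this.cells = sortedCells;
            long m = Long.MIN_VALUE;
            for (Cell c : sortedCells) m = Math.max(m, c.ts);
            this.maxTs = m;
        }
    }

    /** Heap entry: a real cell, or a fake placeholder standing in for a deferred seek. */
    private static final class Entry {
        final StoreFile file;
        final Cell cell;
        final boolean fake;
        Entry(StoreFile file, Cell cell, boolean fake) {
            this.file = file; this.cell = cell; this.fake = fake;
        }
    }

    /** Row ascending, timestamp descending; on a tie a real cell sorts before a fake one. */
    private static final Comparator<Entry> ORDER = (a, b) -> {
        int c = a.cell.row.compareTo(b.cell.row);
        if (c != 0) return c;
        c = Long.compare(b.cell.ts, a.cell.ts);
        if (c != 0) return c;
        return Boolean.compare(a.fake, b.fake);
    };

    /** Number of real seeks performed, to show which files were actually touched. */
    public int realSeeks = 0;

    /** Returns the newest cell for row across all files, or null if no file has it. */
    public Cell getLatest(List<StoreFile> files, String row) {
        PriorityQueue<Entry> heap = new PriorityQueue<>(ORDER);
        for (StoreFile f : files) {
            if (!f.cells.isEmpty()) {
                // Lazy seek: pretend the next cell is (row, maxTs) instead of seeking now.
                heap.add(new Entry(f, new Cell(row, f.maxTs), true));
            }
        }
        while (!heap.isEmpty()) {
            Entry top = heap.poll();
            if (top.fake) {
                // The placeholder reached the top, so this file must really be sought.
                realSeeks++;
                Cell real = seek(top.file, row);
                if (real != null) heap.add(new Entry(top.file, real, false));
                continue;
            }
            // Smallest real cell: it is the answer only if it belongs to the requested row.
            return top.cell.row.equals(row) ? top.cell : null;
        }
        return null;
    }

    /** Real seek: the first cell whose row is >= the requested row (linear scan for brevity). */
    private static Cell seek(StoreFile f, String row) {
        for (Cell c : f.cells) {
            if (c.row.compareTo(row) >= 0) return c;
        }
        return null;
    }
}
```

For example, with a newer file holding `a@30, b@25` and an older one holding `a@10, b@5`, looking up row `b` returns `b@25` after a single real seek: the older file's fake cell `(b, 10)` never reaches the top of the heap.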




[jira] [Commented] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121482#comment-13121482
 ] 

Jonathan Gray commented on HBASE-4465:
--

Committed to trunk.  What's the status on the 89 branch?  Should we keep this 
open?

> Lazy-seek optimization for StoreFile scanners
> -
>
> Key: HBASE-4465
> URL: https://issues.apache.org/jira/browse/HBASE-4465
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>  Labels: optimization, seek
> Fix For: 0.89.20100924, 0.94.0
>
> Attachments: 
> HBASE-4465_Lazy-seek_optimization_for_St-20111005121052-b2ea8753.patch
>
>
> Previously, if we had several StoreFiles for a column family in a region, we 
> would seek in each of them and only then merge the results, even though the 
> row/column we are looking for might only be in the most recent (and the 
> smallest) file. Now we prioritize our reads from those files so that we check 
> the most recent file first. This is done by doing a "lazy seek" which 
> pretends that the next value in the StoreFile is (seekRow, seekColumn, 
> lastTimestampInStoreFile), which is earlier in the KV order than anything 
> that might actually occur in the file. So if we don't find the result in 
> earlier files, that fake KV will bubble up to the top of the KV heap and a 
> real seek will be done. This is expected to significantly reduce the amount 
> of disk IO (as of 09/22/2011 we are doing dark launch testing and 
> measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful 
> discussions on this and the idea of putting fake KVs with the highest 
> timestamp of the StoreFile in the scanner priority queue.




[jira] [Commented] (HBASE-4544) Rename RWCC to MVCC

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121467#comment-13121467
 ] 

Jonathan Gray commented on HBASE-4544:
--

Nice!

Do you want to get this in before/after HBASE-2856?

> Rename RWCC to MVCC
> ---
>
> Key: HBASE-4544
> URL: https://issues.apache.org/jira/browse/HBASE-4544
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
> Fix For: 0.94.0
>
> Attachments: 4544-v1.txt
>
>
> ReadWriteConcurrencyControl should be called MultiVersionConcurrencyControl.




[jira] [Commented] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121461#comment-13121461
 ] 

Jonathan Gray commented on HBASE-4465:
--

Please attach the final patch to JIRA.

> Lazy-seek optimization for StoreFile scanners
> -
>
> Key: HBASE-4465
> URL: https://issues.apache.org/jira/browse/HBASE-4465
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>  Labels: optimization, seek
> Fix For: 0.89.20100924, 0.94.0
>
>
> Previously, if we had several StoreFiles for a column family in a region, we 
> would seek in each of them and only then merge the results, even though the 
> row/column we are looking for might only be in the most recent (and the 
> smallest) file. Now we prioritize our reads from those files so that we check 
> the most recent file first. This is done by doing a "lazy seek" which 
> pretends that the next value in the StoreFile is (seekRow, seekColumn, 
> lastTimestampInStoreFile), which is earlier in the KV order than anything 
> that might actually occur in the file. So if we don't find the result in 
> earlier files, that fake KV will bubble up to the top of the KV heap and a 
> real seek will be done. This is expected to significantly reduce the amount 
> of disk IO (as of 09/22/2011 we are doing dark launch testing and 
> measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful 
> discussions on this and the idea of putting fake KVs with the highest 
> timestamp of the StoreFile in the scanner priority queue.




[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121384#comment-13121384
 ] 

Jonathan Gray commented on HBASE-4540:
--

Looks pretty good.  Once you get the unit tests passing, want to put it up on 
RB?

Also, it'd be really good if you could start thinking about how to mock these 
scenarios better in our unit tests.  You are finding lots of great bugs but 
without tests it will be hard to prevent regressions.

> OpenedRegionHandler is not enforcing atomicity of the operation it is 
> performing
> 
>
> Key: HBASE-4540
> URL: https://issues.apache.org/jira/browse/HBASE-4540
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4540_1.patch
>
>
> -> OpenedRegionHandler has not yet deleted the znode of the region R1 opened 
> by RS1.
> -> RS1 goes down.
> -> ServerShutdownHandler assigns the region R1 to RS2.
> -> The znode of R1 is moved to OFFLINE state by the master, or to OPENING state 
> by RS2 if RS2 has started opening the region.
> -> Now the first OpenedRegionHandler tries to delete the znode, thinking it's 
> in OPENED state, but fails.
> -> Though it fails, it removes the node from RIT and adds RS1 as the owner of 
> R1 in the master's memory.
> -> Now when RS2 completes opening the region, the master is not able to open 
> the region because the region has already been deleted from RIT.
> {code}
> Master
> ==
> 2011-10-05 20:49:45,301 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished 
> processing of shutdown of linux146,60020,1317827727647
> 2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because 1 region(s) in transition: 
> {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9.
>  state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847}
> 2011-10-05 20:49:57,720 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, 
> region=3e69d628a8bd8e9b7c5e7a2a6e03aad9
> 2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:6-0x132d3dc13090023 Deleting existing unassigned node for 
> 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED
> 2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:6-0x132d3dc13090023 Attempting to delete unassigned node 
> 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in 
> RS_ZK_REGION_OPENING state
> After the region is opened in RS2
> =
> 2011-10-05 20:50:48,066 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, 
> region=3e69d628a8bd8e9b7c5e7a2a6e03aad9, which is more than 15 seconds late
> 2011-10-05 20:50:48,290 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 
> 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but 
> region was in  the state null and not in expected PENDING_OPEN or OPENING 
> states
> 2011-10-05 20:50:53,743 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, 
> region=3e69d628a8bd8e9b7c5e7a2a6e03aad9
> 2011-10-05 20:50:54,182 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: 
> Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s)
> 2011-10-05 20:50:54,397 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 
> 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but 
> region was in  the state null and not in expected PENDING_OPEN or OPENING 
> states
> {code}
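A simplified model of the race, with hypothetical names and an in-memory map standing in for ZooKeeper. The handler deletes the region's node only if it is still in the OPENED state, and updates the master's in-memory state (regions in transition and region owner) only when that conditional delete succeeds. Real ZooKeeper code would more likely make the delete conditional on the znode version, and this single-process model does not capture cross-process atomicity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the master-side OPENED handling; a map stands in for ZooKeeper. */
public class OpenedRegionSketch {
    public enum ZkState { OFFLINE, OPENING, OPENED }

    public final Map<String, ZkState> znodes = new HashMap<>();            // region -> node state
    public final Map<String, String> regionsInTransition = new HashMap<>(); // region -> target server
    public final Map<String, String> regionOwners = new HashMap<>();        // region -> hosting server

    /** Deletes the region's node only if it is still in the expected state. */
    public synchronized boolean deleteIfInState(String region, ZkState expected) {
        if (znodes.get(region) != expected) {
            return false;
        }
        znodes.remove(region);
        return true;
    }

    /** Handles OPENED from a server; master memory changes only if the conditional delete succeeded. */
    public synchronized boolean handleOpened(String region, String server) {
        if (!deleteIfInState(region, ZkState.OPENED)) {
            return false;  // node was reassigned meanwhile: leave RIT and ownership untouched
        }
        regionsInTransition.remove(region);
        regionOwners.put(region, server);
        return true;
    }
}
```

Because the logic is isolated from ZooKeeper, the interleaving in the description (RS1's stale OPENED handled after the node moved to OPENING for RS2) can be replayed in a plain unit test, which is the kind of scenario mocking the comment asks for.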




[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121139#comment-13121139
 ] 

Jonathan Gray commented on HBASE-4536:
--

Does this change the default behavior now?  I disagree that the expected behavior 
should ever be to uncover previously deleted data.  I'm okay with this as an option.


> Allow CF to retain deleted rows
> ---
>
> Key: HBASE-4536
> URL: https://issues.apache.org/jira/browse/HBASE-4536
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0, 0.94.0
>
>
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
> of versions.
> However, if a client deletes a row, all versions older than the delete 
> tombstone will be removed at the next major compaction (and even at memstore 
> flush - see HBASE-4241).
> There should be a way to retain those versions to guard against software errors.
> I see two options here:
> 1. Add a new flag to HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Fold this into the parent change, i.e. keep the minimum number of versions 
> even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with the parent 
> (from a user viewpoint).
> Comments? Any other options?




[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-10-04 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120711#comment-13120711
 ] 

Jonathan Gray commented on HBASE-3446:
--

I've grasped most of the change and this is clearly a significant improvement.  
Let's get it in!

+1 on latest patch up on RB if tests are passing.  TestMergeTool also fails on 
occasion for me.

Nice work stack!

You're thinking CatalogTracker follow-up in 0.94 w/ ROOT removal perhaps?

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> 
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.0
>Reporter: Todd Lipcon
>Assignee: stack
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 
> 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 
> 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and 
> afterwards had LOTS of regions left orphaned. The issue appears to be that 
> ProcessServerShutdown failed because the server hosting META was restarted 
> around the same time as another server was being processed.




[jira] [Commented] (HBASE-4422) Move block cache parameters and references into single CacheConf class

2011-10-04 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120692#comment-13120692
 ] 

Jonathan Gray commented on HBASE-4422:
--

I have looked at 3446 more.  I'm happy with it and confident it makes things 
better.  Will give the +1.

Re: getting the cache instance from CacheConf, I'm open to other designs, but 
this seems best in that we only need one argument for all the caching stuff vs. 
a separate reference for the cache itself.  What did you have in mind?

Maybe CacheConfig + BlockCache should be somehow combined?  Dunno.

> Move block cache parameters and references into single CacheConf class
> --
>
> Key: HBASE-4422
> URL: https://issues.apache.org/jira/browse/HBASE-4422
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Fix For: 0.92.0
>
> Attachments: CacheConfig92-v8.patch
>
>
> From StoreFile down to HFile, we currently use a boolean argument for each of 
> the various block cache configuration parameters that exist.  The number of 
> parameters is going to continue to increase as we look at compressed cache, 
> delta encoding, and more specific L1/L2 configuration.  Every new config 
> currently requires changing many constructors because it introduces a new 
> boolean.
> We should move everything into a single class so that modifications are much 
> less disruptive.
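A sketch of the single-class approach described above. All names are hypothetical, not the class from the attached patch. The settings object carries both the flags and the cache reference (the "one argument for all the caching stuff" design from the comment), so adding an option means adding a field and a getter rather than threading a new boolean through every constructor between StoreFile and HFile.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bundle of block cache settings plus the cache reference itself. */
public final class CacheSettings {
    /** Minimal hypothetical block cache interface; not HBase's BlockCache API. */
    public interface BlockCache {
        void cacheBlock(String key, byte[] block);
        byte[] getBlock(String key);
    }

    /** Trivial in-memory cache used only for this illustration. */
    public static final class MapCache implements BlockCache {
        private final Map<String, byte[]> blocks = new HashMap<>();
        @Override public void cacheBlock(String key, byte[] block) { blocks.put(key, block); }
        @Override public byte[] getBlock(String key) { return blocks.get(key); }
    }

    private final BlockCache cache;  // null means caching is disabled
    private final boolean cacheDataOnRead;
    private final boolean cacheOnWrite;

    public CacheSettings(BlockCache cache, boolean cacheDataOnRead, boolean cacheOnWrite) {
        this.cache = cache;
        this.cacheDataOnRead = cacheDataOnRead;
        this.cacheOnWrite = cacheOnWrite;
    }

    public BlockCache getBlockCache() { return cache; }
    public boolean shouldCacheDataOnRead() { return cache != null && cacheDataOnRead; }
    public boolean shouldCacheOnWrite() { return cache != null && cacheOnWrite; }

    /** A reader takes one settings object instead of one boolean per cache option. */
    public static final class Reader {
        private final CacheSettings cacheConf;

        public Reader(CacheSettings cacheConf) { this.cacheConf = cacheConf; }

        public byte[] readBlock(String key, byte[] onDisk) {
            BlockCache bc = cacheConf.getBlockCache();
            if (bc != null) {
                byte[] hit = bc.getBlock(key);
                if (hit != null) return hit;
            }
            if (cacheConf.shouldCacheDataOnRead()) {
                bc.cacheBlock(key, onDisk);  // non-null: shouldCacheDataOnRead() checked it
            }
            return onDisk;
        }
    }
}
```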




[jira] [Commented] (HBASE-4422) Move block cache parameters and references into single CacheConf class

2011-10-04 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120637#comment-13120637
 ] 

Jonathan Gray commented on HBASE-4422:
--

Yeah, if this goes into trunk but not 0.92, then the fun of rebasing patches 
for each branch begins, because it changes so many constructors in and around HFile.

> Move block cache parameters and references into single CacheConf class
> --
>
> Key: HBASE-4422
> URL: https://issues.apache.org/jira/browse/HBASE-4422
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Fix For: 0.92.0
>
> Attachments: CacheConfig92-v8.patch
>
>
> From StoreFile down to HFile, we currently use a boolean argument for each of 
> the various block cache configuration parameters that exist.  The number of 
> parameters is going to continue to increase as we look at compressed cache, 
> delta encoding, and more specific L1/L2 configuration.  Every new config 
> currently requires changing many constructors because it introduces a new 
> boolean.
> We should move everything into a single class so that modifications are much 
> less disruptive.




[jira] [Commented] (HBASE-4534) A new unit test for lazy seek and StoreScanner in general

2011-10-04 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120400#comment-13120400
 ] 

Jonathan Gray commented on HBASE-4534:
--

Are we thinking these read optimizations are only going into 0.94?  
(That seems reasonable to me, but I will be pulling them into our internal 0.92 branch.)

> A new unit test for lazy seek and StoreScanner in general
> -
>
> Key: HBASE-4534
> URL: https://issues.apache.org/jira/browse/HBASE-4534
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.94.0
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>
> A randomized unit test for Gets/Scans (all-row, single-row, multi-row, 
> all-column, single-column, and multi-column). Also all combinations of Bloom 
> filters and compression (NONE vs GZIP) are tested. The unit test flushes 
> multiple StoreFiles with disjoint timestamp ranges and runs various types of 
> queries against them. Currently we are not testing overlapping timestamp 
> ranges.




[jira] [Commented] (HBASE-4534) A new unit test for lazy seek and StoreScanner in general

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119869#comment-13119869
 ] 

Jonathan Gray commented on HBASE-4534:
--

LUCENE-3408 is just a wrapper around two counter implementations.  One is 
thread-safe and uses an AtomicLong; the other is not and uses a plain long.  It 
looks like they were just trying to improve performance when the counter is only 
used in a single thread.
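Based only on the description above (not on Lucene's actual source), the pattern looks roughly like this. `SketchCounter` is a hypothetical name: callers code against one abstract counter, and a factory picks an `AtomicLong`-backed implementation when the counter is shared between threads, or a plain `long` when it stays in one thread, avoiding atomic operations on the single-threaded path.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Counter abstraction with a thread-safe and a single-threaded implementation. */
public abstract class SketchCounter {
    /** Adds delta and returns the new value. */
    public abstract long addAndGet(long delta);

    /** Returns the current value. */
    public abstract long get();

    /** Factory: use the AtomicLong-backed version only when the counter is shared across threads. */
    public static SketchCounter newCounter(boolean threadSafe) {
        return threadSafe ? new AtomicCounter() : new SerialCounter();
    }

    /** Plain long: cheapest, but only correct while confined to one thread. */
    private static final class SerialCounter extends SketchCounter {
        private long count;
        @Override public long addAndGet(long delta) { return count += delta; }
        @Override public long get() { return count; }
    }

    /** AtomicLong: safe for concurrent updates at the cost of atomic operations. */
    private static final class AtomicCounter extends SketchCounter {
        private final AtomicLong count = new AtomicLong();
        @Override public long addAndGet(long delta) { return count.addAndGet(delta); }
        @Override public long get() { return count.get(); }
    }
}
```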

+1 that we should deal with changing AtomicLong to something else in another 
jira.

> A new unit test for lazy seek and StoreScanner in general
> -
>
> Key: HBASE-4534
> URL: https://issues.apache.org/jira/browse/HBASE-4534
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.94.0
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>
> A randomized unit test for Gets/Scans (all-row, single-row, multi-row, 
> all-column, single-column, and multi-column). Also all combinations of Bloom 
> filters and compression (NONE vs GZIP) are tested. The unit test flushes 
> multiple StoreFiles with disjoint timestamp ranges and runs various types of 
> queries against them. Currently we are not testing overlapping timestamp 
> ranges.




[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119754#comment-13119754
 ] 

Jonathan Gray commented on HBASE-4536:
--

We were also discussing today that there are situations (especially in 
multi-master setups) where you want to retain the delete markers for some 
period of time as well.

I would think this would require both a family-level setting (a new one or the 
existing one) and also a read-time option, correct?  As of now, deletes are 
never returned to the client.  You'd have to return them in this case; otherwise 
how would the user know what is actually there?  I'm not sure it's fair to 
ask a user to understand how our delete tombstones work :)

> Allow CF to retain deleted rows
> ---
>
> Key: HBASE-4536
> URL: https://issues.apache.org/jira/browse/HBASE-4536
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0, 0.94.0
>
>
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
> of versions.
> However, if a client deletes a row, all versions older than the delete 
> tombstone will be removed at the next major compaction (and even at memstore 
> flush; see HBASE-4241).
> There should be a way to retain those versions to guard against software error.
> I see two options here:
> 1. Add a new flag to HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Fold this into the parent change, i.e. keep minimum-number-of-versions of 
> versions even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
> (from a user viewpoint).
> Comments? Any other options?
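To make the two options concrete, the following toy model (plain Java, not HBase code) shows what a major compaction does to the versions of one cell that sit behind a delete tombstone. The retainDeleted flag is a hypothetical stand-in for option #1. Real delete semantics (column vs. family markers, TTLs, minimum versions) are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the versions of a single cell; this is NOT HBase's compaction code.
final class CellVersions {
    /** One stored version: its timestamp and whether it is a delete tombstone. */
    record Version(long ts, boolean tombstone) {}

    /**
     * Simulates a major compaction of one cell. By default, every version whose
     * timestamp is at or below the newest tombstone is dropped, and so are the
     * tombstones. With the hypothetical retainDeleted flag, nothing is dropped.
     */
    static List<Version> majorCompact(List<Version> versions, boolean retainDeleted) {
        if (retainDeleted) {
            return new ArrayList<>(versions);
        }
        long deleteTs = Long.MIN_VALUE;
        for (Version v : versions) {
            if (v.tombstone()) {
                deleteTs = Math.max(deleteTs, v.ts());
            }
        }
        List<Version> kept = new ArrayList<>();
        for (Version v : versions) {
            if (!v.tombstone() && v.ts() > deleteTs) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Version> cell = List.of(new Version(1, false), new Version(2, false),
                new Version(3, true), new Version(4, false));
        System.out.println("default:       " + majorCompact(cell, false));
        System.out.println("retainDeleted: " + majorCompact(cell, true));
    }
}
```

With the flag off, only the version written after the tombstone survives. With it on, the tombstone and the versions it masks are kept. That is why a read-time option would also be needed to show them to a client.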





[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119748#comment-13119748
 ] 

Jonathan Gray commented on HBASE-4532:
--

Whoo!  +1

> Avoid top row seek by dedicated bloom filter for delete family
> --
>
> Key: HBASE-4532
> URL: https://issues.apache.org/jira/browse/HBASE-4532
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> HBASE-4469 avoids the top-row seek operation if the row-col Bloom filter is 
> enabled. 
> This JIRA tries to avoid the top-row seek in all cases by creating a 
> dedicated Bloom filter only for delete-family markers.
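The idea can be sketched with a toy Bloom filter that holds only rows carrying a delete-family marker. A negative answer means the row definitely has no such marker, so the reader can skip the seek to the top of the row. Class and method names are hypothetical. The hashing (double hashing over Arrays.hashCode and FNV-1a) is a simple choice for illustration, not the hash HBase uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Toy Bloom filter over row keys that carry a delete-family marker (illustrative only). */
final class DeleteFamilyBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    DeleteFamilyBloom(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used as the second base hash for double hashing.
    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // i-th probe position: h1 + i * h2 (mod numBits); h2 forced odd so probes differ.
    private int position(byte[] row, int i) {
        int h1 = Arrays.hashCode(row);
        int h2 = fnv1a(row) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Called while writing a store file, once per row that has a delete-family marker. */
    void addDeleteFamilyRow(byte[] row) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(row, i));
        }
    }

    /** false means "definitely no delete-family marker", so the top-row seek can be skipped. */
    boolean mightHaveDeleteFamily(byte[] row) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(row, i))) {
                return false;
            }
        }
        return true;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        DeleteFamilyBloom bloom = new DeleteFamilyBloom(1024, 3);
        bloom.addDeleteFamilyRow(key("row-with-delete"));
        for (String row : new String[] {"row-with-delete", "plain-row"}) {
            boolean seek = bloom.mightHaveDeleteFamily(key(row));
            System.out.println(row + " -> " + (seek ? "seek to top of row" : "skip top-row seek"));
        }
    }
}
```

A Bloom filter never gives false negatives, so skipping the seek on a negative answer is safe. A false positive only costs the seek that would have happened anyway.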





[jira] [Commented] (HBASE-4527) Fix versioning such that every update is unique

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119756#comment-13119756
 ] 

Jonathan Gray commented on HBASE-4527:
--

Agreed.

> Fix versioning such that every update is unique
> ---
>
> Key: HBASE-4527
> URL: https://issues.apache.org/jira/browse/HBASE-4527
> Project: HBase
>  Issue Type: Wish
>Reporter: stack
>
> I wanted to use checkAndPut but there is a case where the check will not fail 
> though the cell has been updated: if a cell is updated with exactly the value 
> it had before, we won't know it's been changed.  HBASE-4507 did a checkAndPut 
> where you could pass a timestamp as part of the check so we'd check the cell 
> value AND that the timestamp was the same.
> This would work in all regards but one: an update done in the same 
> millisecond.  This is highly unlikely, but in a distributed system where 
> clocks drift and a region can be moved to a server whose clock is retarded, 
> it is within the realm of possibilities that it could happen.  So we should 
> deal with it.
> One thought is to make the version guaranteed unique.  We could make the 
> timestamp wider still, so that two edits colliding within the same 
> microsecond (or whatever it is that a double gives you) would require us 
> to run through a couple of billion universe expand/contract cycles; or we 
> could have a monotonically increasing sequence id per millisecond.
> There could be some overlap between this issue and the persisting of RWCC to 
> the filesystem (though not as RWCC is currently implemented).
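The "monotonically increasing sequence id per millisecond" option can be sketched as a generator that packs milliseconds into the high bits and a counter into the low bits. It never returns a value at or below the last one, even if the clock stalls or moves backwards. The 20-bit split, the class name, and the injectable clock are assumptions for illustration. This is not how HBase assigns cell versions.

```java
import java.util.function.LongSupplier;

/**
 * Sketch of a version generator that never hands out the same value twice, even if
 * two edits land in the same millisecond or the clock moves backwards.
 * Assumed layout: high bits = milliseconds, low SEQ_BITS bits = per-millisecond sequence.
 */
final class UniqueVersionGenerator {
    private static final int SEQ_BITS = 20;
    private final LongSupplier clockMillis;
    private long last = Long.MIN_VALUE;

    UniqueVersionGenerator(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /**
     * Returns a strictly increasing version. If the clock has not advanced (or went
     * backwards), the previous value is bumped by one. More than 2^SEQ_BITS versions in
     * one millisecond carry into the millisecond bits; values stay unique and increasing.
     */
    synchronized long next() {
        long candidate = clockMillis.getAsLong() << SEQ_BITS;
        last = Math.max(candidate, last + 1);
        return last;
    }

    /** Recovers the (approximate) wall-clock millisecond encoded in a version. */
    static long millisOf(long version) {
        return version >>> SEQ_BITS;
    }

    public static void main(String[] args) {
        UniqueVersionGenerator gen = new UniqueVersionGenerator(System::currentTimeMillis);
        long a = gen.next();
        long b = gen.next();
        System.out.println(a + " < " + b + " (ms " + millisOf(a) + " / " + millisOf(b) + ")");
    }
}
```

Uniqueness here holds only within one generator instance. Surviving a region move to a server with a retarded clock would also require carrying the last issued value along with the region, which this sketch does not attempt.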





[jira] [Commented] (HBASE-4533) ops_mgt.xml - tweaks to backup section

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119750#comment-13119750
 ] 

Jonathan Gray commented on HBASE-4533:
--

Doug, do you think you could add [book] to the front of your commits or 
something?  I'm doing a lot of repository management and that'd be super 
helpful :)

> ops_mgt.xml - tweaks to backup section
> --
>
> Key: HBASE-4533
> URL: https://issues.apache.org/jira/browse/HBASE-4533
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: ops_mgt_HBASE_4533.xml.patch, 
> ops_mgt_HBASE_4533_v2.xml.patch
>
>
> Minor tweaks to backup section.





[jira] [Commented] (HBASE-4527) Fix versioning such that every update is unique

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119747#comment-13119747
 ] 

Jonathan Gray commented on HBASE-4527:
--

Yeah, it seems like we could utilize the memstoreTS once we solve those issues, 
and then we wouldn't care about the timestamp being the same.  But then we'd 
have to expose RWCC in the API, because checkAndPut would need to specify it?  Ugh.
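The memstoreTS idea amounts to checking a per-write sequence number rather than the value. Below is a toy in-memory sketch of that. SeqCheckedStore and its checkAndPut(key, expectedSeq, value) signature are hypothetical, not the HBase client API. Having to pass expectedSeq is exactly the API exposure the comment worries about.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory store where every write gets a fresh sequence number (hypothetical API). */
final class SeqCheckedStore {
    record Cell(String value, long writeSeq) {}

    private final Map<String, Cell> cells = new HashMap<>();
    private long nextSeq = 1;

    synchronized Cell get(String key) {
        return cells.get(key);
    }

    /** Writes the value and returns the sequence number assigned to this write. */
    synchronized long put(String key, String value) {
        long seq = nextSeq++;
        cells.put(key, new Cell(value, seq));
        return seq;
    }

    /**
     * Applies the put only if the cell's write sequence still equals expectedSeq.
     * Unlike a value comparison, this fails when someone re-wrote the same value in between.
     */
    synchronized boolean checkAndPut(String key, long expectedSeq, String newValue) {
        Cell current = cells.get(key);
        if (current == null || current.writeSeq() != expectedSeq) {
            return false;
        }
        put(key, newValue);
        return true;
    }

    public static void main(String[] args) {
        SeqCheckedStore store = new SeqCheckedStore();
        long seen = store.put("row1", "v1");
        store.put("row1", "v1"); // another writer re-writes the identical value
        System.out.println("stale check succeeds? " + store.checkAndPut("row1", seen, "v2"));
    }
}
```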





[jira] [Commented] (HBASE-4524) support for more than one region in .META. table

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119745#comment-13119745
 ] 

Jonathan Gray commented on HBASE-4524:
--

+1 on removing ROOT and putting the META location into ZK.

-1 on thinking about a splittable META right now.

+1 on thinking about mirroring META in ZK, adding new locations/redirects in 
NSREs, and other optimizations and availability improvements.

> support for more than one region in .META. table
> 
>
> Key: HBASE-4524
> URL: https://issues.apache.org/jira/browse/HBASE-4524
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
>
> It seems there are some assumptions in the code that the .META. table has 
> only one region, FIRST_META_REGIONINFO, in the following areas:
> 1) .META. table update with user region info.
> 2) .META. regions assignment.
> 3) .META. table split handling.
> Perhaps we don't need this until we scale to a really large number 
> of regions, like 1M.
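With more than one .META. region, a client has to pick the region whose key range covers the row it is looking up. One common way to do that is a floor lookup over sorted region start keys, sketched below with java.util.TreeMap. Plain strings stand in for byte[] row keys (the ordering matches for ASCII keys). The class and server names are hypothetical. This is not HBase's client code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Maps region start keys to the server hosting that region (illustrative names). */
final class RegionLocator {
    // Start keys sorted lexicographically; "" is the first region's start key.
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    /** Returns the server of the region whose [startKey, nextStartKey) range contains row. */
    String locate(String row) {
        Map.Entry<String, String> e = startKeyToServer.floorEntry(row);
        if (e == null) {
            throw new IllegalStateException("no region covers row " + row);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLocator meta = new RegionLocator();
        meta.addRegion("", "meta-server-1");
        meta.addRegion("m", "meta-server-2");
        System.out.println("row 'apple' -> " + meta.locate("apple"));
        System.out.println("row 'zebra' -> " + meta.locate("zebra"));
    }
}
```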





[jira] [Commented] (HBASE-4487) The increment operation can release the rowlock before sync-ing the Hlog

2011-09-30 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118311#comment-13118311
 ] 

Jonathan Gray commented on HBASE-4487:
--

+1 as well.  And I agree with your assessment above, Stack: potentially fatter 
grouping of increments and a significant improvement in per-row throughput.

Looking forward to getting this working for Put/MultiPut!  Nice work, Dhruba.

> The increment operation can release the rowlock before sync-ing the Hlog
> 
>
> Key: HBASE-4487
> URL: https://issues.apache.org/jira/browse/HBASE-4487
> Project: HBase
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4487-v7.txt, appendNoSync4.txt, appendNoSync5.txt, 
> appendNoSync6.txt
>
>
> This allows for better throughput when there are hot rows. I have seen this 
> change improve single-row updates from 400 increments/sec/server to 
> 4000 increments/sec/server.
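The ordering the issue describes can be sketched with a toy write-ahead log that supports group commit. The row lock covers only the in-memory update and the log append. The slow sync runs after the lock is released. A sync whose transaction id an earlier sync already covered costs nothing. The class and method names are hypothetical, and this is not HBase's HLog code. It also ignores the fact that other readers can see a value before it is durable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Toy write-ahead log with group commit: one physical sync can cover many appends. */
final class GroupCommitLog {
    private final List<String> unsyncedBuffer = new ArrayList<>();
    private long lastAppendedTxid = 0;
    private long lastSyncedTxid = 0;
    private int physicalSyncs = 0;

    /** Buffers an edit and returns its transaction id; cheap, no I/O. */
    synchronized long append(String edit) {
        unsyncedBuffer.add(edit);
        return ++lastAppendedTxid;
    }

    /** Makes everything up to txid durable; a no-op if an earlier sync already covered it. */
    synchronized void sync(long txid) {
        if (txid <= lastSyncedTxid) {
            return; // another caller's sync already flushed this edit
        }
        physicalSyncs++;          // stands in for the expensive filesystem sync
        unsyncedBuffer.clear();   // everything appended so far is now durable
        lastSyncedTxid = lastAppendedTxid;
    }

    synchronized int physicalSyncs() {
        return physicalSyncs;
    }

    /** Increment with the row lock held only for the update and the append, not the sync. */
    static long increment(ReentrantLock rowLock, long[] counter, GroupCommitLog log) {
        long txid;
        long newValue;
        rowLock.lock();
        try {
            newValue = ++counter[0];
            txid = log.append("increment -> " + newValue);
        } finally {
            rowLock.unlock();
        }
        // Row lock already released; only the log's own monitor is held during the sync.
        log.sync(txid);
        return newValue;
    }

    public static void main(String[] args) {
        GroupCommitLog wal = new GroupCommitLog();
        long t1 = wal.append("a");
        long t2 = wal.append("b");
        wal.sync(t2);   // one physical sync covers both appends
        wal.sync(t1);   // already durable: no extra sync
        ReentrantLock rowLock = new ReentrantLock();
        long[] counter = {0};
        increment(rowLock, counter, wal);
        System.out.println("counter=" + counter[0] + ", physical syncs=" + wal.physicalSyncs());
    }
}
```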





[jira] [Commented] (HBASE-4522) Make hbase-site-custom.xml override the hbase-site.xml

2011-09-30 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118275#comment-13118275
 ] 

Jonathan Gray commented on HBASE-4522:
--

Can't hbase-site import hbase-site-custom?

> Make hbase-site-custom.xml override the hbase-site.xml
> --
>
> Key: HBASE-4522
> URL: https://issues.apache.org/jira/browse/HBASE-4522
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Liyin Tang
>Priority: Minor
> Fix For: 0.94.0
>
>
> The motivation for this diff is that we want to easily override some config 
> settings for any specific cluster by just adding the config entries to the 
> hbase-site-custom.xml for that cluster. This change adds the 
> hbase-site-custom.xml configuration file to HBaseConfiguration.
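Assuming the custom file is loaded after hbase-site.xml, the override behaviour follows from load order. Later resources replace earlier values, except for properties an earlier resource declared final (the rule Hadoop's Configuration documents for final parameters). The sketch below models just that merge rule with plain maps. The class and record names are hypothetical. It does not parse XML or reproduce HBaseConfiguration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Merges configuration layers in load order: later layers win unless a key was declared final. */
final class LayeredConfig {
    /** One resource, e.g. the parsed contents of hbase-site.xml or hbase-site-custom.xml. */
    record Layer(String name, Map<String, String> values, Set<String> finalKeys) {}

    static Map<String, String> merge(List<Layer> layersInLoadOrder) {
        Map<String, String> merged = new HashMap<>();
        Set<String> frozen = new HashSet<>();
        for (Layer layer : layersInLoadOrder) {
            for (Map.Entry<String, String> e : layer.values().entrySet()) {
                if (frozen.contains(e.getKey())) {
                    continue; // an earlier layer declared this key final
                }
                merged.put(e.getKey(), e.getValue());
                if (layer.finalKeys().contains(e.getKey())) {
                    frozen.add(e.getKey());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Layer site = new Layer("hbase-site.xml",
                Map.of("hbase.regionserver.handler.count", "10", "hbase.rootdir", "hdfs://nn1/hbase"),
                Set.of("hbase.rootdir"));
        Layer custom = new Layer("hbase-site-custom.xml",
                Map.of("hbase.regionserver.handler.count", "50", "hbase.rootdir", "hdfs://nn2/hbase"),
                Set.of());
        System.out.println(merge(List.of(site, custom)));
    }
}
```

In the example, handler.count comes out as 50 from the custom file. hbase.rootdir keeps the hbase-site.xml value because that file marked it final.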





[jira] [Commented] (HBASE-4521) Get the hadoop patch-submission build working for hbase

2011-09-30 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118245#comment-13118245
 ] 

Jonathan Gray commented on HBASE-4521:
--

Big +1

> Get the hadoop patch-submission build working for hbase
> ---
>
> Key: HBASE-4521
> URL: https://issues.apache.org/jira/browse/HBASE-4521
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>
> We need the facility they have over in Hadoop where, on 'patch submission', 
> Jenkins tries the patch against the current state of trunk.  We need this 
> facility because it's a productivity killer to expect each dev to vet the 
> patch; let Jenkins do it for us.  I'm trying to get Giri, the Hadoop build 
> fellow, to help us set this up.





[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117804#comment-13117804
 ] 

Jonathan Gray commented on HBASE-4496:
--

So this exact issue is actually why I was having a hard time getting 
TestCacheOnWrite to pass.  The test was previously relying on some 
broken/inconsistent behavior in which it passed a single reader instance 
with a null block cache, but that behavior was removed with the latest 
CacheConfig changes.

My latest patch for HBASE-4422 actually just changes that unconditional 
cacheBlocks=true to always false :)  I'm going to talk to Mikhail tomorrow 
(Friday) about the issue here and see if he has any thoughts.

> HFile V2 does not honor setCacheBlocks when scanning.
> -
>
> Key: HBASE-4496
> URL: https://issues.apache.org/jira/browse/HBASE-4496
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4496.txt
>
>
> While testing the LRU cache during scanning, I noticed significant churn in 
> the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
> found that HFile V2 always caches blocks in the LRU cache regardless of the 
> cacheBlocks setting.
> Here's a trace (from Eclipse) showing the problem:
> HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279   
> HFileReaderV2.readBlockData(long, long, int, boolean) line: 219   
> HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, 
> HFileBlock) line: 191  
> HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502   
> HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539  
> StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
> StoreFileScanner.reseek(KeyValue) line: 110   
> KeyValueHeap.reseek(KeyValue) line: 255   
> StoreScanner.reseek(KeyValue) line: 409   
> StoreScanner.next(List, int) line: 304  
> KeyValueHeap.next(List, int) line: 114  
> KeyValueHeap.next(List) line: 143   
> HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774  
> HRegion$RegionScannerImpl.nextInternal(int) line: 2722
> HRegion$RegionScannerImpl.next(List, int) line: 2682
> HRegion$RegionScannerImpl.next(List) line: 2699 
> HRegionServer.next(long, int) line: 2092  
> Every scanner.next causes a reseek, which eventually causes a call to 
> HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
> cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
> HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
> The fix is not immediately clear, unless we want to pass cacheBlocks to 
> HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
> HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly, 
> as readBlockData should not care about caching.
> Avoiding caching during scans is somewhat important for us.
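The option of threading cacheBlocks down the read path can be sketched as a reader whose readBlock takes the flag explicitly. A cache hit is always served, but a miss only populates the LRU when the caller asked for caching. CachingBlockReader and its methods are hypothetical names, not the HFileReaderV2 API. The LRU is a java.util.LinkedHashMap in access order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Toy block reader with an LRU block cache; the caller decides whether a miss fills the cache. */
final class CachingBlockReader {
    private final LongFunction<byte[]> disk;   // loads a block by file offset
    private final Map<Long, byte[]> lru;
    private int diskReads = 0;

    CachingBlockReader(LongFunction<byte[]> disk, int capacity) {
        this.disk = disk;
        // accessOrder=true makes LinkedHashMap an LRU; evict the eldest entry once over capacity.
        this.lru = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    /** cacheBlock=false (e.g. a scan with cacheBlocks off) reads through the cache but never fills it. */
    byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] block = lru.get(offset);
        if (block != null) {
            return block;
        }
        diskReads++;
        block = disk.apply(offset);
        if (cacheBlock) {
            lru.put(offset, block);
        }
        return block;
    }

    int cachedBlocks() {
        return lru.size();
    }

    int diskReads() {
        return diskReads;
    }

    public static void main(String[] args) {
        CachingBlockReader reader = new CachingBlockReader(off -> new byte[64], 2);
        for (long offset = 0; offset < 5; offset++) {
            reader.readBlock(offset, false); // scan with cacheBlocks=false: no churn
        }
        reader.readBlock(10, true);          // ordinary read: block is cached
        System.out.println("cached=" + reader.cachedBlocks() + ", diskReads=" + reader.diskReads());
    }
}
```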





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117497#comment-13117497
 ] 

Jonathan Gray commented on HBASE-4477:
--

And it looks like the patch from this morning does exactly that.

I'm +1 on coprocessorPut1.txt.  Someone else want to review?

> Ability for an application to store metadata into the transaction log
> -
>
> Key: HBASE-4477
> URL: https://issues.apache.org/jira/browse/HBASE-4477
> Project: HBase
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: coprocessorPut1.txt, coprocessorPut2.txt, 
> hlogMetadata1.txt
>
>
> MySQL allows an application to store an arbitrary blob along with each 
> transaction in its transaction logs. This JIRA requests a similar feature 
> for HBase.
> The use case is as follows: an application in data center A stores a blob 
> of data along with each transaction. Replication software picks up these 
> blobs from the transaction logs in A and hands them to another instance of 
> the same application running in a remote data center B. The application in B 
> is responsible for applying them to the remote HBase cluster (and also for 
> handling conflict resolution, if any).
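Here is a toy sketch of the requested flow. Each log entry can carry an opaque application blob, and the replication side forwards every blob it reads to the remote application instance, which is responsible for applying it and resolving conflicts. MetadataLog, Entry, and shipFrom are hypothetical names. This is not HBase's HLog or replication API, nor the approach taken in the attached patches.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

/** Toy transaction log whose entries may carry an opaque application blob (hypothetical design). */
final class MetadataLog {
    record Entry(long seq, String mutation, Optional<byte[]> appMetadata) {}

    private final List<Entry> entries = new ArrayList<>();

    /** Data center A: the application attaches a blob (or null) to the write it performs. */
    synchronized long append(String mutation, byte[] appMetadataOrNull) {
        long seq = entries.size() + 1L;
        entries.add(new Entry(seq, mutation, Optional.ofNullable(appMetadataOrNull)));
        return seq;
    }

    /**
     * Replication side: reads entries after fromSeq, hands every blob to the remote
     * application instance (data center B), and returns the last sequence shipped.
     */
    synchronized long shipFrom(long fromSeq, Consumer<byte[]> remoteApplication) {
        long last = fromSeq;
        for (Entry e : entries) {
            if (e.seq() > fromSeq) {
                e.appMetadata().ifPresent(remoteApplication);
                last = e.seq();
            }
        }
        return last;
    }

    public static void main(String[] args) {
        MetadataLog log = new MetadataLog();
        log.append("put row1", "txn-42".getBytes(StandardCharsets.UTF_8));
        log.append("put row2", null);
        long checkpoint = log.shipFrom(0, blob ->
                System.out.println("DC B received: " + new String(blob, StandardCharsets.UTF_8)));
        System.out.println("replicated through seq " + checkpoint);
    }
}
```

Returning the last shipped sequence number lets the replication side checkpoint its position and resume from there.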





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117496#comment-13117496
 ] 

Jonathan Gray commented on HBASE-4477:
--

PutInfo seems overly generic, but I agree that CPPutInfo is straight-up ugly.  
And I keep reading it as CPUInfo.

So Dhruba should just extend the API for now, and we can introduce these new 
classes in a follow-up JIRA.

> Ability for an application to store metadata into the transaction log
> -
>
> Key: HBASE-4477
> URL: https://issues.apache.org/jira/browse/HBASE-4477
> Project: HBase
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: coprocessorPut1.txt, hlogMetadata1.txt
>
>
> MySQL allows an application to store an arbitrary blob along with each 
> transaction in its transaction logs. This JIRA requests a similar feature 
> for HBase.
> The use case is as follows: an application in data center A stores a blob 
> of data along with each transaction. Replication software picks up these 
> blobs from the transaction logs in A and hands them to another instance of 
> the same application running in a remote data center B. The application in B 
> is responsible for applying them to the remote HBase cluster (and also for 
> handling conflict resolution, if any).





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117449#comment-13117449
 ] 

Jonathan Gray commented on HBASE-4477:
--

+1 on CPPutInfo, CPGetInfo, etc...





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117450#comment-13117450
 ] 

Jonathan Gray commented on HBASE-4477:
--

And yeah, maybe introduce CPPutInfo in this JIRA and open a follow-up to change 
the others.




