[jira] [Commented] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2012-03-21 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235031#comment-13235031
 ] 

Jonathan Gray commented on HBASE-4410:
--

Not working on this right now, punt it!  Thanks Lars

On Mar 21, 2012, at 3:31 PM, Lars Hofhansl (Updated) (JIRA)



 FilterList.filterKeyValue can return suboptimal ReturnCodes
 ---

 Key: HBASE-4410
 URL: https://issues.apache.org/jira/browse/HBASE-4410
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-4410-v1.patch


 FilterList.filterKeyValue does not always return the optimal ReturnCode 
 in either the AND or the OR case.
 For example, with F1 AND F2, if F1 returns SKIP, the FilterList immediately 
 returns SKIP.  However, if F2 would have returned NEXT_COL, NEXT_ROW, or 
 SEEK_NEXT_USING_HINT, we could return that more optimal ReturnCode from F2 
 instead.
 For AND conditions, we can always pick the *most restrictive* return code.
 For OR conditions, we must always pick the *least restrictive* return code.
 This JIRA is to review the FilterList.filterKeyValue() method to make it 
 more optimal and to add a new unit test that verifies the correct behavior.
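
The AND/OR rule in the description can be sketched roughly as follows.  This 
is only an illustration, not the FilterList code: the Code enum and its 
ordering (INCLUDE, SKIP, NEXT_COL, NEXT_ROW, from least to most restrictive) 
are assumptions made for the example, and SEEK_NEXT_USING_HINT is left out 
because a seek hint does not fit a simple linear ordering.

```java
import java.util.List;

final class ReturnCodeMerge {
    /** Hypothetical stand-in for Filter.ReturnCode, least to most restrictive (assumed order). */
    enum Code { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

    /** AND: every filter must accept the cell, so the most restrictive answer wins. */
    static Code mergeAnd(List<Code> codes) {
        Code result = Code.INCLUDE;
        for (Code c : codes) {
            if (c.compareTo(result) > 0) {
                result = c;
            }
        }
        return result;
    }

    /** OR: one accepting filter is enough, so the least restrictive answer wins.
     *  Callers are assumed to pass at least one code. */
    static Code mergeOr(List<Code> codes) {
        Code result = Code.NEXT_ROW;
        for (Code c : codes) {
            if (c.compareTo(result) < 0) {
                result = c;
            }
        }
        return result;
    }
}
```

With F1 returning SKIP and F2 returning NEXT_ROW, mergeAnd yields NEXT_ROW 
(the scanner can jump further) while mergeOr yields SKIP.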

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3171) Drop ROOT and instead store META location(s) directly in ZooKeeper

2012-02-09 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204346#comment-13204346
 ] 

Jonathan Gray commented on HBASE-3171:
--

Thanks for taking a look.  Need to think about backward compatibility more.  
Might need to hold off until some future big client/server change?

Were you thinking that meta locations would be in both the META region(s) and 
up in ZK?  Or just in ZK?  If they were in both, it should be easier to 
provide backwards compatibility.

Which would be source of truth and which would be relied upon for persistence?  
I suppose all the data in meta is recoverable from the regions themselves (or 
should be) between restarts so we wouldn't have a hard requirement on zk 
persistence between restarts.  Doing the meta edits in zk might help suss out 
some of those trickier race conditions around region movement, splitting, meta 
updating, and crashing.

Was also thinking we should revisit the idea of more intelligent redirecting of 
clients along with NSREs while looking at this stuff.

 Drop ROOT and instead store META location(s) directly in ZooKeeper
 --

 Key: HBASE-3171
 URL: https://issues.apache.org/jira/browse/HBASE-3171
 Project: HBase
  Issue Type: Improvement
  Components: client, master, regionserver, zookeeper
Reporter: Jonathan Gray

 Rather than storing the ROOT region location in ZooKeeper, going to ROOT, and 
 reading the META location, we should just store the META location directly in 
 ZooKeeper.
 The purpose of the root region from the bigtable paper was to support 
 multiple meta regions.  Currently, we explicitly only support a single meta 
 region, so the translation from our current code of a single root location to 
 a single meta location will be very simple.  Long-term, it seems reasonable 
 that we could store several meta region locations in ZK.  There's been some 
 discussion in HBASE-1755 about actually moving META into ZK, but I think this 
 jira is a good step towards taking some of the complexity out of how we have 
 to deal with catalog tables everywhere.
 As-is, a new client already requires ZK to get the root location, so this 
 would not change those requirements in any way.
 The primary motivation for this is to simplify things like CatalogTracker.  
 The way we handle root in that class is really simple, but the tracking of 
 meta is difficult and a bit hacky.  This hack in the tracking of the meta 
 location is what caused one of the bugs over in HBASE-3159.





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2012-02-02 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199312#comment-13199312
 ] 

Jonathan Gray commented on HBASE-4528:
--

@Mubarek, since it's a performance optimization and new feature, it's not going 
to be committed into the 90/92 branches.  That being said, this patch could be 
backported if someone wanted to use it on a 92 branch (90 might be 
significantly more difficult, not sure).

 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: 4528-trunk-v9.txt, 4528-trunk.txt, 
 HBASE-4528-Trunk-FINAL.patch, appendNoSync5.txt, appendNoSyncPut1.txt, 
 appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, 
 appendNoSyncPut5.txt, appendNoSyncPut6.txt, appendNoSyncPut7.txt, 
 appendNoSyncPut8.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.
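
As a rough illustration of the idea in the issue title, the sketch below holds 
the row lock only while the edit is appended to an in-memory log and performs 
the slow sync after the lock has been released.  Everything here is 
hypothetical (the class, the two lists standing in for the HLog, the single 
lock); the actual patch also has to deal with the visibility of unsynced 
edits, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class HotRowPutSketch {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<String> pending = new ArrayList<>(); // appended, not yet synced
    private final List<String> synced = new ArrayList<>();  // stands in for durable edits

    /** Cheap buffered append; no sync happens here. */
    private synchronized void append(String edit) {
        pending.add(edit);
    }

    /** Stands in for the expensive sync: everything pending becomes durable. */
    private synchronized void sync() {
        synced.addAll(pending);
        pending.clear();
    }

    synchronized int syncedCount() {
        return synced.size();
    }

    boolean rowLockHeldByCaller() {
        return rowLock.isHeldByCurrentThread();
    }

    /** Appends under the row lock, then syncs after the lock is released. */
    void put(String edit) {
        rowLock.lock();
        try {
            append(edit);
        } finally {
            rowLock.unlock();
        }
        sync(); // other writers to the same row are not blocked while this runs
    }
}
```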





[jira] [Commented] (HBASE-2947) MultiIncrement (MultiGet functionality for increments)

2012-01-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179220#comment-13179220
 ] 

Jonathan Gray commented on HBASE-2947:
--

Not working on it but no reason not to commit that I recall.




 MultiIncrement (MultiGet functionality for increments)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Attachments: HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.





[jira] [Commented] (HBASE-4752) Don't create an unnecessary LinkedList when evicting from the BlockCache

2011-11-07 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13145693#comment-13145693
 ] 

Jonathan Gray commented on HBASE-4752:
--

Sorry I didn't chime in earlier, been traveling.

I'm actually -1 on this change at the moment because of the introduction of a 
Google class (now the block cache has this external dependency).  This class is 
actually used by other projects outside of HBase, so I'd hate to put in an 
unnecessary dependency.  Is there additional value we get out of using the 
MinMaxPQ?  We save a LinkedList allocation?

As for the change in behavior, I'm not sure I follow.  Seems like nothing 
actually changes?  (whether the PQ is cleared or not doesn't really matter, 
behavior-wise?)

The way I'm reading the code, it seems like we could actually just remove the 
LL completely and leave in place the regular PQ?  CachedBlock takes care of the 
sort order, no?

 Don't create an unnecessary LinkedList when evicting from the BlockCache
 

 Key: HBASE-4752
 URL: https://issues.apache.org/jira/browse/HBASE-4752
 Project: HBase
  Issue Type: Improvement
  Components: performance, regionserver
Affects Versions: 0.90.4
Reporter: Benoit Sigoure
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4752-Don-t-create-an-unnecessary-LinkedList-wh.patch, 
 4752-trunk-v2.txt, 4752-trunk.txt


 When evicting from the BlockCache, the code creates a LinkedList containing 
 every single block sorted by access time.  This list is created from a 
 PriorityQueue.  I don't believe it is necessary, as the PriorityQueue can be 
 used directly.
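
A minimal sketch of using the PriorityQueue directly, with no intermediate 
LinkedList, might look like this.  The Block class and the evict method are 
hypothetical and only illustrate the idea; this is not the block cache code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class EvictionSketch {
    /** Hypothetical cached block: a name, a last-access time, and a size in bytes. */
    static final class Block {
        final String name;
        final long accessTime;
        final long size;

        Block(String name, long accessTime, long size) {
            this.name = name;
            this.accessTime = accessTime;
            this.size = size;
        }
    }

    /** Polls the least recently accessed blocks off the queue until bytesToFree is reached. */
    static List<String> evict(List<Block> blocks, long bytesToFree) {
        PriorityQueue<Block> queue =
            new PriorityQueue<>(Comparator.comparingLong((Block b) -> b.accessTime));
        queue.addAll(blocks);

        List<String> evicted = new ArrayList<>();
        long freed = 0;
        Block oldest;
        while (freed < bytesToFree && (oldest = queue.poll()) != null) {
            freed += oldest.size;
            evicted.add(oldest.name);
        }
        return evicted;
    }
}
```

Only the blocks that are actually evicted get pulled out in order; the rest 
of the queue is never fully sorted or copied.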





[jira] [Commented] (HBASE-4745) LRU Statistics thread should be daemon

2011-11-04 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13144495#comment-13144495
 ] 

Jonathan Gray commented on HBASE-4745:
--

+1

 LRU Statistics thread should be daemon
 --

 Key: HBASE-4745
 URL: https://issues.apache.org/jira/browse/HBASE-4745
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Andrew Purtell
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4745.patch


 This is from the 'HBase 0.92/Hadoop 0.22 test results' discussion on dev@hbase:
 {code}
 LRU Statistics #0 prio=10 tid=0x7f4edc7dd800 nid=0x211a waiting on condition [0x7f4e631e2000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  0x7f4e88acc968 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:619)
 {code}
 We should make this thread a daemon thread.
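
One common way to get a daemon worker out of a scheduled executor is to pass 
a ThreadFactory that calls setDaemon(true).  The sketch below shows that 
approach; the class and method names are hypothetical, and it is not the 
attached patch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

final class DaemonStatsThread {
    /** Thread factory that marks every thread it creates as a daemon. */
    static ThreadFactory daemonFactory(final String name) {
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, name);
                t.setDaemon(true); // a daemon thread does not keep the JVM alive
                return t;
            }
        };
    }

    /** Single-threaded scheduler whose worker is a daemon thread. */
    static ScheduledExecutorService newStatsExecutor() {
        return Executors.newScheduledThreadPool(1, daemonFactory("LRU Statistics #0"));
    }
}
```

With a daemon worker, a parked statistics thread like the one in the dump 
above no longer prevents the process from exiting.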





[jira] [Commented] (HBASE-4717) More efficient age-off of old data during major compaction

2011-11-01 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141445#comment-13141445
 ] 

Jonathan Gray commented on HBASE-4717:
--

+1 on this general direction.

We've long talked of special compaction heuristics that would bucketize by time 
in some way (and you could really take advantage of the TimeRangeTracker file 
selection stuff for read perf).  We did as you describe and set a small 
max.size, so once a file reached a certain size, it would never be compacted 
again.  This allowed us to age out the data by keeping old stuff separate 
from new stuff in files.

We were not trying to actually wipe out the data, only separate it, because 
this was mostly a read-modify-write workload that needed access to recent data 
but the old data still needed to be available for user read queries.  It would 
probably be simple to add a check at compaction time on the time range of 
each file and, if the max timestamp has expired, just wipe out that file.

 More efficient age-off of old data during major compaction
 --

 Key: HBASE-4717
 URL: https://issues.apache.org/jira/browse/HBASE-4717
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon

 Many applications need to implement efficient age-off of old data. We 
 currently only perform age-off during major compaction by scanning through 
 all of the KVs. Instead, we could implement the following:
 - Set hbase.hstore.compaction.max.size reasonably small. Thus, older store 
 files contain only smaller finite ranges of time.
 - Periodically run an age-off compaction. This compaction would scan the 
 current list of storefiles. Any store file that falls entirely out of the TTL 
 time range would be dropped. Store files completely within the time range 
 would be un-altered. Those crossing the time-range boundary could either be 
 left alone or compacted using the existing compaction code.
 I don't have a design in mind for how exactly this would be implemented, but 
 hope to generate some discussion.
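
The per-file decision described above could be sketched as follows.  The 
issue explicitly has no design yet, so this is only one possible reading: the 
class, the Action enum, and the choice to compact (rather than leave alone) 
files that cross the boundary are all assumptions.

```java
final class AgeOffSketch {
    enum Action { DROP, KEEP, COMPACT }

    /**
     * Decides what to do with a store file whose cells span [minTs, maxTs],
     * given the current time and the TTL (all in milliseconds).
     */
    static Action classify(long minTs, long maxTs, long now, long ttlMs) {
        long cutoff = now - ttlMs; // cells older than this have expired
        if (maxTs < cutoff) {
            return Action.DROP;    // the whole file is outside the TTL range
        }
        if (minTs >= cutoff) {
            return Action.KEEP;    // the whole file is still within the TTL range
        }
        return Action.COMPACT;     // the file crosses the boundary
    }
}
```

Because only the file's time range is consulted, whole files can be dropped 
without scanning their KVs.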





[jira] [Commented] (HBASE-4298) Support to drain RS nodes through ZK

2011-11-01 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141478#comment-13141478
 ] 

Jonathan Gray commented on HBASE-4298:
--

I think this should be for 0.94 since it's a new feature.  I also think a 
pre-requisite to commit is a unit test.

 Support to drain RS nodes through ZK
 

 Key: HBASE-4298
 URL: https://issues.apache.org/jira/browse/HBASE-4298
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
 Environment: all
Reporter: Aravind Gottipati
Priority: Critical
  Labels: patch
 Fix For: 0.92.0, 0.90.5

 Attachments: 4298-trunk-v2.txt, 90_hbase.patch, trunk_hbase.patch


 HDFS currently has a way to exclude certain datanodes and prevent them from 
 getting new blocks.  HDFS goes one step further and even drains these nodes 
 for you.  This enhancement is a step in that direction.
 The idea is that we mark nodes in zookeeper as draining nodes.  This means 
 that they don't get any more new regions.  These draining nodes look exactly 
 the same as the corresponding nodes in /rs, except they live under /draining.
 Eventually, support for draining them can be added.  I am submitting two 
 patches for review - one for the 0.90 branch and one for trunk (in git).
 Here are the two patches
 0.90 - 
 https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
 trunk - 
 https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
 I have tested both these patches and they work as advertised.
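
To illustrate the effect of the /draining znodes, the sketch below filters 
the servers eligible for new regions.  It is hypothetical and does not talk 
to ZooKeeper: plain collections stand in for the children of /rs and 
/draining, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class DrainingSketch {
    /** Returns the servers that may receive new regions: online servers minus draining ones. */
    static List<String> assignable(List<String> onlineServers, Set<String> drainingServers) {
        List<String> result = new ArrayList<>();
        for (String server : onlineServers) {
            if (!drainingServers.contains(server)) {
                result.add(server);
            }
        }
        return result;
    }
}
```

Regions already hosted on a draining server are not touched here, which 
matches the description: draining nodes stop getting new regions, and 
actually moving regions off them is left for later.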





[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138740#comment-13138740
 ] 

Jonathan Gray commented on HBASE-1744:
--

One more requested change.  Over in HBASE-4658 the map of attributes was added 
to the available APIs in thrift.  Could we add this to the new TScan, TGet, 
etc. structs?

 Thrift server to match the new java api.
 

 Key: HBASE-1744
 URL: https://issues.apache.org/jira/browse/HBASE-1744
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
 Fix For: 0.94.0

 Attachments: 
 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 
 HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, 
 HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, 
 HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, 
 thriftexperiment.patch


 This mutateRows, etc. is a little confusing compared to the new, cleaner Java 
 client.
 Thinking of ways to make a thrift client that is just as elegant.  Something 
 like:
 void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
 with:
 struct TColumn {
   1:Bytes family,
   2:Bytes qualifier,
   3:i64 timestamp
 }
 struct TPut {
   1:Bytes row,
   2:map<TColumn, Bytes> values
 }
 This creates more verbose RPC than if the columns in TPut were just 
 map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into while 
 still being intuitive from, say, Python.
 Presumably the goal of a thrift gateway is to be easy first.





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138792#comment-13138792
 ] 

Jonathan Gray commented on HBASE-4641:
--

I like v4 much less than the other changes.  My v1 patch makes it so we could 
potentially break something because it's expecting to be able to manipulate the 
conf after construction (an easy assumption to document / test for).  The v4 
patch now takes the conf passed in by reference and modifies it.  It then 
modifies the same conf reference later in Store.  Seems like this could have 
some bad side-effects in the opposite direction.

At this point, I vote for the v1 hack until we make the cache non-static.  As 
long as unit tests still pass.

 Block cache can be mistakenly instantiated on Master
 

 Key: HBASE-4641
 URL: https://issues.apache.org/jira/browse/HBASE-4641
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4641-suggestion-v3.txt, 4641-v4.txt, 
 HBASE-4641-v1.patch, HBASE-4641-v2.patch


 After changes in the block cache instantiation over in HBASE-4422, it looks 
 like the HMaster can now end up with a block cache instantiated.  Not a huge 
 deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138809#comment-13138809
 ] 

Jonathan Gray commented on HBASE-4641:
--

Opened HBASE-4697 to deal with real solution.

 Block cache can be mistakenly instantiated on Master
 

 Key: HBASE-4641
 URL: https://issues.apache.org/jira/browse/HBASE-4641
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4641-suggestion-v3.txt, 4641-v4.txt, 
 HBASE-4641-v1.patch, HBASE-4641-v2.patch


 After changes in the block cache instantiation over in HBASE-4422, it looks 
 like the HMaster can now end up with a block cache instantiated.  Not a huge 
 deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138822#comment-13138822
 ] 

Jonathan Gray commented on HBASE-4532:
--

Please stop doing multiple commits on the same JIRA! :)  I thought we agreed on 
this, or no?

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid the top row seek in all cases by creating a 
 dedicated bloom filter only for delete family.
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we get about 10 percentage points more seek savings ONLY if the ROWCOL 
 bloom filter is enabled (HBASE-4469).
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we get about 10 percentage points more seek savings for ALL kinds of 
 bloom filter.
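
The savings figures above follow directly from the seek counts.  As a quick 
check, here is a small helper (hypothetical, not part of TestSeekOptimization) 
that recomputes them.

```java
final class SeekSavings {
    /** Percentage of seeks avoided by the optimization, rounded to two decimal places. */
    static double savingsPercent(int seeksWithout, int seeksWith) {
        double pct = 100.0 * (seeksWithout - seeksWith) / seeksWithout;
        return Math.round(pct * 100.0) / 100.0;
    }
}
```

savingsPercent(2506, 1714) gives 31.6 and savingsPercent(2506, 1458) gives 
41.82, so the difference is a little over 10 percentage points.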





[jira] [Commented] (HBASE-4687) regionserver may miss zk-heartbeats to master when replaying edits at region open

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138828#comment-13138828
 ] 

Jonathan Gray commented on HBASE-4687:
--

Thanks Prakash!

 regionserver may miss zk-heartbeats to master when replaying edits at region 
 open
 -

 Key: HBASE-4687
 URL: https://issues.apache.org/jira/browse/HBASE-4687
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-4687-regionserver-may-miss-zk-heartbeats-to-ma.patch


 replayRecoveredEdits() should do another reporter.progress() before returning.





[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138831#comment-13138831
 ] 

Jonathan Gray commented on HBASE-4532:
--

I don't think JIRA being open/closed is the issue, it's more multiple commits.

But yeah, as a separate note, looks like there was no final comment and 
resolution after the commit.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch







[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13137291#comment-13137291
 ] 

Jonathan Gray commented on HBASE-4658:
--

Dhruba, it is thrift 0.7.0 (or at least it was last time I generated).  If you 
don't have time today I can regenerate Hbase.java and commit this.

Re: HBASE-1744, will this change apply after that goes in?  It seems like this 
change could be added on top of that change but that your current patch is 
based on the current thrift API?

 Put attributes are not exposed via the ThriftServer
 ---

 Key: HBASE-4658
 URL: https://issues.apache.org/jira/browse/HBASE-4658
 Project: HBase
  Issue Type: Bug
  Components: thrift
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: ThriftPutAttributes1.txt


 The Put api also takes in a bunch of arbitrary attributes that an application 
 can use to associate metadata with each put operation. This is not exposed 
 via Thrift.





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13137570#comment-13137570
 ] 

Jonathan Gray commented on HBASE-4528:
--

+1 on adding the log line Ted.  Will do.

 I will try to spend time looking at the unit test tonight.




 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: HBASE-4528-Trunk-FINAL.patch, appendNoSync5.txt, 
 appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, 
 appendNoSyncPut4.txt, appendNoSyncPut5.txt, appendNoSyncPut6.txt, 
 appendNoSyncPut7.txt, appendNoSyncPut8.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13137719#comment-13137719
 ] 

Jonathan Gray commented on HBASE-4528:
--

Is it safe to ignore this close?  Should it be a WARN not DEBUG?  I'm a little 
confused why this is happening in the test.  Is the FS being closed before this 
finishes or what?

 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: 4528-trunk.txt, HBASE-4528-Trunk-FINAL.patch, 
 appendNoSync5.txt, appendNoSyncPut1.txt, appendNoSyncPut2.txt, 
 appendNoSyncPut3.txt, appendNoSyncPut4.txt, appendNoSyncPut5.txt, 
 appendNoSyncPut6.txt, appendNoSyncPut7.txt, appendNoSyncPut8.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138053#comment-13138053
 ] 

Jonathan Gray commented on HBASE-4641:
--

Stack, that's what I had in v1.  I felt like it was an ugly hack and might have 
an impact on unit tests that modify a conf after hmaster is instantiated.

I can just try that again and run the unit tests to see if they do all pass.




 Block cache can be mistakenly instantiated on Master
 

 Key: HBASE-4641
 URL: https://issues.apache.org/jira/browse/HBASE-4641
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4641-v1.patch, HBASE-4641-v2.patch


 After changes in the block cache instantiation over in HBASE-4422, it looks 
 like the HMaster can now end up with a block cache instantiated.  Not a huge 
 deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138061#comment-13138061
 ] 

Jonathan Gray commented on HBASE-4641:
--

The real fix is for the block cache to be instantiated in HRS and not be static.

This slightly complicates things but is possible.




 Block cache can be mistakenly instantiated on Master
 

 Key: HBASE-4641
 URL: https://issues.apache.org/jira/browse/HBASE-4641
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4641-suggestion-v3.txt, HBASE-4641-v1.patch, 
 HBASE-4641-v2.patch


 After changes in the block cache instantiation over in HBASE-4422, it looks 
 like the HMaster can now end up with a block cache instantiated.  Not a huge 
 deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer

2011-10-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13136383#comment-13136383
 ] 

Jonathan Gray commented on HBASE-4658:
--

How does this relate to HBASE-1744?  That's slated for 0.94, should we just put 
this in 0.92?  And I guess we should ensure that attributes are supported over 
there.

I'm +1 on putting this in 0.92 since it makes it possible to add whatever we 
want without changing the API in 92 minor releases.

 Put attributes are not exposed via the ThriftServer
 ---

 Key: HBASE-4658
 URL: https://issues.apache.org/jira/browse/HBASE-4658
 Project: HBase
  Issue Type: Bug
  Components: thrift
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: ThriftPutAttributes1.txt


 The Put api also takes in a bunch of arbitrary attributes that an application 
 can use to associate metadata with each put operation. This is not exposed 
 via Thrift.





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13136412#comment-13136412
 ] 

Jonathan Gray commented on HBASE-4528:
--

Sorry Ted, I'm not clear on what exactly you're pointing out.  Is something 
broken there?

 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: HBASE-4528-Trunk-FINAL.patch, appendNoSync5.txt, 
 appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, 
 appendNoSyncPut4.txt, appendNoSyncPut5.txt, appendNoSyncPut6.txt, 
 appendNoSyncPut7.txt, appendNoSyncPut8.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.





[jira] [Commented] (HBASE-4447) Allow hbase.version to be passed in as command-line argument

2011-10-23 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133780#comment-13133780
 ] 

Jonathan Gray commented on HBASE-4447:
--

Shouldn't be fixed?  Should be Invalid or some other?

Thanks for all the cleanup stack!

 Allow hbase.version to be passed in as command-line argument
 

 Key: HBASE-4447
 URL: https://issues.apache.org/jira/browse/HBASE-4447
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.92.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
 Fix For: 0.92.0

 Attachments: HBASE-4447-0.92.patch


 Currently the build always produces the jars and tarball according to the 
 version baked into the POM.
 When we modify this to allow the version to be passed in as a command-line 
 argument, it can still default to the same behavior, yet give the flexibility 
 for an internal build to tag on its own version.





[jira] [Commented] (HBASE-4641) Block cache can be mistakenly instantiated on Master

2011-10-21 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132854#comment-13132854
 ] 

Jonathan Gray commented on HBASE-4641:
--

Thanks Ted.  You see any others?

 Block cache can be mistakenly instantiated on Master
 

 Key: HBASE-4641
 URL: https://issues.apache.org/jira/browse/HBASE-4641
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4641-v1.patch, HBASE-4641-v2.patch


 After changes in the block cache instantiation over in HBASE-4422, it looks 
 like the HMaster can now end up with a block cache instantiated.  Not a huge 
 deal but prevents the process from shutting down properly.





[jira] [Commented] (HBASE-4643) Consider reverting HBASE-451 (change HRI to remove HTD) in 0.92

2011-10-21 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132865#comment-13132865
 ] 

Jonathan Gray commented on HBASE-4643:
--

I've had a few pretty horrible experiences moving an 0.90 cluster to 0.92 so 
far, so I agree that this is definitely the most unbaked part of 0.92.

Now that I've got 92 clusters, I'm going to have to figure out a reverting plan 
for them if we back this out now.  It will also become a barrier between 0.92 
and 0.94 which will make my life difficult as well (since we have been pulling 
94 changes into a local 92 branch).

I'd like to see if Stack's next changes do the trick before abandoning this.

 Consider reverting HBASE-451 (change HRI to remove HTD) in 0.92
 ---

 Key: HBASE-4643
 URL: https://issues.apache.org/jira/browse/HBASE-4643
 Project: HBase
  Issue Type: Brainstorming
Affects Versions: 0.92.0
Reporter: Todd Lipcon
 Attachments: revert.txt


 I've been chatting with some folks recently about this thought: it seems 
 like, if you enumerate the larger changes in 0.92, this is probably the one 
 that is the most destabilizing that hasn't been through a lot of baking 
 yet. You can see this evidenced by the very high number of followup commits 
 it generated: looks like somewhere around 15 of them, plus some bugs still 
 open.
 I've done a patch to revert this and the related followup changes on the 0.92 
 branch. Do we want to consider doing this?





[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-20 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131403#comment-13131403
 ] 

Jonathan Gray commented on HBASE-4536:
--

+1 to v16 for commit to trunk.  You are a good man, Lars.  Well done.  And 
thanks for being patient.

 Allow CF to retain deleted rows
 ---

 Key: HBASE-4536
 URL: https://issues.apache.org/jira/browse/HBASE-4536
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 4536-v15.txt, 4536-v16.txt


 Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
 of versions.
 However, if a client deletes a row, all versions older than the delete 
 tombstone will be removed at the next major compaction (and even at memstore 
 flush - see HBASE-4241).
 There should be a way to retain those versions to guard against software error.
 I see two options here:
 1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED.
 2. Fold this into the parent change. I.e. keep minimum-number-of-versions 
 versions even past the delete marker.
 #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
 (from a user viewpoint)
 Comments? Any other options?





[jira] [Commented] (HBASE-4608) HLog Compression

2011-10-20 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131898#comment-13131898
 ] 

Jonathan Gray commented on HBASE-4608:
--

I think the idea is a custom compression where we can do stuff like start the 
HLog with a dictionary of some known repetitive stuff.  It's very similar to 
the delta encoding work.




 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi

 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.
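
The dictionary idea described above can be illustrated with a small sketch. This is not the HBASE-4608 patch: the class name WalDictionarySketch, the use of String values, and the int ids are all assumptions made for illustration. It only shows the principle that a repeated value (table name, region id, cf name) is written in full once and referred to by a short id afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the HLog format or any HBase class.
final class WalDictionarySketch {
    private final Map<String, Integer> idByValue = new HashMap<>();
    private final List<String> valueById = new ArrayList<>();

    // Returns the id for a value, assigning the next free id on first sight.
    // A writer would emit the literal value the first time and only the id later.
    int encode(String value) {
        Integer id = idByValue.get(value);
        if (id == null) {
            id = valueById.size();
            idByValue.put(value, id);
            valueById.add(value);
        }
        return id;
    }

    // Reverses encode(); a reader rebuilds the same table while scanning the log.
    String decode(int id) {
        return valueById.get(id);
    }
}
```

Because ids are assigned in order of first appearance, a reader that replays the log from the start can rebuild the same table without it being stored separately, as long as the first occurrence of each value carries the literal.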





[jira] [Commented] (HBASE-4636) Refactor catalog MetaReader and MetaEditor so one class only

2011-10-20 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131952#comment-13131952
 ] 

Jonathan Gray commented on HBASE-4636:
--

is this really a noob task?  i'm +1 on revisiting the structure here, but 
shouldn't it be part of the larger CatalogTracker / retry facilities / etc?

 Refactor catalog MetaReader and MetaEditor so one class only
 

 Key: HBASE-4636
 URL: https://issues.apache.org/jira/browse/HBASE-4636
 Project: HBase
  Issue Type: Improvement
Reporter: stack
  Labels: noob

 I suggest we collapse MetaReader and MetaEditor.  Setters are in one class 
 while Getters are in another which is a little disorientating.  The Editor 
 class uses facility from the Reader class to do edits which seems a little 
 off.





[jira] [Commented] (HBASE-3581) hbase rpc should send size of response

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130416#comment-13130416
 ] 

Jonathan Gray commented on HBASE-3581:
--

should rename method to getErrorOrLengthSet()?

 hbase rpc should send size of response
 --

 Key: HBASE-3581
 URL: https://issues.apache.org/jira/browse/HBASE-3581
 Project: HBase
  Issue Type: Improvement
Reporter: ryan rawson
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 3581-v2.txt, 3581-v3.txt, 3581-v4.txt, 
 HBASE-rpc-response.txt


 The RPC reply from Server->Client does not include the size of the payload, 
 it is framed like so:
 i32 callId
 byte errorFlag
 byte[] data
 The data segment would contain enough info about how big the response is so 
 that it could be decoded by a writable reader.
 This makes it difficult to write buffering clients, who might read the entire 
 'data' then pass it to a decoder. While less memory efficient, if you want to 
 easily write block read clients (eg: nio) it would be necessary to send the 
 size along so that the client could snarf into a local buf.
 The new proposal is:
 i32 callId
 i32 size
 byte errorFlag
 byte[] data
 the size being sizeof(data) + sizeof(errorFlag).
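
The proposed layout can be sketched in a few lines of Java. This is an illustration under assumptions, not the HBase RPC code: the class RpcFrameSketch and its method names are hypothetical, big-endian integers are assumed, and the error flag is assumed to be 0 or 1.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not HBase code. Frame layout:
//   i32 callId, i32 size, byte errorFlag, byte[] data
// where size = sizeof(data) + sizeof(errorFlag).
final class RpcFrameSketch {
    static byte[] encode(int callId, boolean error, byte[] data) {
        int size = 1 + data.length;                 // errorFlag byte plus payload
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + size);
        buf.putInt(callId);
        buf.putInt(size);
        buf.put((byte) (error ? 1 : 0));
        buf.put(data);
        return buf.array();
    }

    // Extracts the payload. Thanks to the size field the payload length is
    // known before any of the data is decoded.
    static byte[] decodeData(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        buf.getInt();                               // callId, ignored here
        int size = buf.getInt();
        buf.get();                                  // errorFlag, ignored here
        byte[] data = new byte[size - 1];
        buf.get(data);
        return data;
    }
}
```

With the size in the header, a buffering client can read the 8 header bytes, then read exactly size more bytes into a local buffer before handing them to a decoder.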





[jira] [Commented] (HBASE-4620) I broke the build when I submitted HBASE-3581 (Send length of the rpc response)

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130832#comment-13130832
 ] 

Jonathan Gray commented on HBASE-4620:
--

stack, doesn't the method name imply the existing behavior?  should change the 
method name?

 I broke the build when I submitted HBASE-3581 (Send length of the rpc 
 response)
 ---

 Key: HBASE-4620
 URL: https://issues.apache.org/jira/browse/HBASE-4620
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 4620.txt


 Thanks to Ted, Ram and Gao for figuring my messup.





[jira] [Commented] (HBASE-4620) I broke the build when I submitted HBASE-3581 (Send length of the rpc response)

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130833#comment-13130833
 ] 

Jonathan Gray commented on HBASE-4620:
--

or this is meant to combine the two so the | is actually the right behavior for 
'and'?  hmm

 I broke the build when I submitted HBASE-3581 (Send length of the rpc 
 response)
 ---

 Key: HBASE-4620
 URL: https://issues.apache.org/jira/browse/HBASE-4620
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 4620.txt


 Thanks to Ted, Ram and Gao for figuring my messup.





[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131188#comment-13131188
 ] 

Jonathan Gray commented on HBASE-4536:
--

I'm at +0.5

Add just a bit more high-level, config-level doc somewhere and I'm a strong 
+1...

:)

 Allow CF to retain deleted rows
 ---

 Key: HBASE-4536
 URL: https://issues.apache.org/jira/browse/HBASE-4536
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 4536-v15.txt


 Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
 of versions.
 However, if a client deletes a row, all versions older than the delete 
 tombstone will be removed at the next major compaction (and even at memstore 
 flush - see HBASE-4241).
 There should be a way to retain those versions to guard against software error.
 I see two options here:
 1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED.
 2. Fold this into the parent change. I.e. keep minimum-number-of-versions 
 versions even past the delete marker.
 #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
 (from a user viewpoint)
 Comments? Any other options?





[jira] [Commented] (HBASE-4630) If you shutdown all RS an active master is never able to recover when RS come back online

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131210#comment-13131210
 ] 

Jonathan Gray commented on HBASE-4630:
--

The stuff I'm seeing in the logs is different but it's probably the same or a 
related issue.  I'm going to try and dig on this and will figure out whether to 
close this as a dupe or not.  Thanks for the pointer, Ted.

 If you shutdown all RS an active master is never able to recover when RS come 
 back online
 -

 Key: HBASE-4630
 URL: https://issues.apache.org/jira/browse/HBASE-4630
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
 Fix For: 0.92.1


 I've been doing some isolated benchmarking of a single RS and can repeatedly 
 trigger some craziness in the master if I shutdown the RS.  It is never able 
 to recover after bringing RSs back online.  I seem to see different behavior 
 across different branches / revisions of the 92 branch, but there does seem 
 to be an issue in several of them.
 Putting against 0.92.1 so we don't hold up the release of 0.92.  Should not 
 be a blocker.
 Working on a unit test now.





[jira] [Commented] (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131261#comment-13131261
 ] 

Jonathan Gray commented on HBASE-1183:
--

Wow, really got me thinking back.  I honestly don't remember exactly why.

We convert them to BigInteger so we can do:  (stop - start) / numsplits = 
interval

Something related to signed/unsigned?  Reading the code it does seem okay.  
Good thing I didn't write a unit test.

Are you seeing that it's broken in some way?  I can spend a little more time 
looking at it if necessary.

 New MR splitting algorithm and other new features need a way to split a key 
 range in N chunks
 -

 Key: HBASE-1183
 URL: https://issues.apache.org/jira/browse/HBASE-1183
 Project: HBase
  Issue Type: Improvement
  Components: util
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.20.0

 Attachments: hbase-1183-v1.patch, hbase-1183-v2.patch, 
 hbase-1183-v3.patch, hbase-1183-v4.patch


 For HBASE-1172 and other functionality coming soon, we need to be able to 
 take a [start,stop) range and divide it into chunks.
 For example, we have 10 regions but want to run 30 maps.  We need to divide 
 each region into three key ranges for the start/stop of each scanner.
 Implementing using java.math.BigInteger
 Will also include a couple additional helpers in Bytes to make life easy.
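
A minimal sketch of splitting a [start,stop) key range with java.math.BigInteger follows. It is not the code from the attached patches: the class KeyRangeSplitSketch and its methods are hypothetical, both keys are assumed to have the same length with start < stop, and the bytes are read as an unsigned magnitude via the BigInteger(int signum, byte[] magnitude) constructor. That constructor is one way to sidestep the signed/unsigned question raised in the comments; whether it matches the reason for the {1,0} prefix in the original code is a guess.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; assumes equal-length keys and start < stop.
final class KeyRangeSplitSketch {
    // Returns n + 1 boundaries: start, n - 1 interior split points, stop.
    static List<byte[]> split(byte[] start, byte[] stop, int n) {
        BigInteger lo = new BigInteger(1, start);   // signum 1: bytes are unsigned
        BigInteger hi = new BigInteger(1, stop);
        BigInteger range = hi.subtract(lo);
        List<byte[]> boundaries = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            // lo + range * i / n, computed per boundary so rounding does not accumulate
            BigInteger offset = range.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
            boundaries.add(toFixedWidth(lo.add(offset), start.length));
        }
        return boundaries;
    }

    // toByteArray() may add a leading zero sign byte or return fewer bytes than
    // the key width; copy the low-order bytes into a fixed-width array.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int count = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - count, out, width - count, count);
        return out;
    }
}
```

For the example in the description, each of the 10 regions would be passed through split(regionStart, regionStop, 3) to obtain the start/stop keys for three scanners.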





[jira] [Commented] (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131264#comment-13131264
 ] 

Jonathan Gray commented on HBASE-1183:
--

To clarify, I meant that the code seems like you don't need to prepend the 
{1,0} but I have some vague memory of needing it.

 New MR splitting algorithm and other new features need a way to split a key 
 range in N chunks
 -

 Key: HBASE-1183
 URL: https://issues.apache.org/jira/browse/HBASE-1183
 Project: HBase
  Issue Type: Improvement
  Components: util
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.20.0

 Attachments: hbase-1183-v1.patch, hbase-1183-v2.patch, 
 hbase-1183-v3.patch, hbase-1183-v4.patch


 For HBASE-1172 and other functionality coming soon, we need to be able to 
 take a [start,stop) range and divide it into chunks.
 For example, we have 10 regions but want to run 30 maps.  We need to divide 
 each region into three key ranges for the start/stop of each scanner.
 Implementing using java.math.BigInteger
 Will also include a couple additional helpers in Bytes to make life easy.





[jira] [Commented] (HBASE-4626) Filters unnecessarily copy byte arrays...

2011-10-19 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131378#comment-13131378
 ] 

Jonathan Gray commented on HBASE-4626:
--

I'm okay with this in 92 but would prefer it goes to 94.  Put the perf in the 
next release so we release it soon.

 Filters unnecessarily copy byte arrays...
 -

 Key: HBASE-4626
 URL: https://issues.apache.org/jira/browse/HBASE-4626
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4626-v2.txt, 4626.txt


 Just looked at SingleCol and ValueFilter... And on every column compared they 
 create a copy of the column and/or value portion of the KV.





[jira] [Commented] (HBASE-4612) Allow ColumnPrefixFilter to support multiple prefixes

2011-10-18 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129955#comment-13129955
 ] 

Jonathan Gray commented on HBASE-4612:
--

Hey Eran.  Thanks for the contribution!  A few comments..

- There's no explanation of the behavior anywhere.  In the constructors and 
addPrefix() methods, you should document that this creates an OR condition 
across all of the prefixes, correct?
- No need to instantiate a new comparator all the time (use 
Bytes.BYTES_COMPARATOR)
- Something seems odd when you keep adding to the end of a List and then sort.  
How about a TreeSet?  You can easily ignore dupes that way.
- There's no input verification so, for example, you could pass a null to the 
constructor or an empty byte[][] and have some strange behavior.  Like it will 
instantiate okay but then you'll get server-side NPEs or IOOB.
- this.prefixes.size() == 0 -> this.prefixes.isEmpty()
- your comment at the top of filterColumn, i wouldn't exactly call it a 
workaround, but it's a good comment.  looking at the logic, it seems like 
correct behavior would be that it can be called with current == size() but it 
would be a bug if current > size(), right?  should you add an assert or throw 
an exception?
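
The review points above (OR semantics, a sorted set instead of a List that is re-sorted, input verification) can be sketched as follows. This is not the patch under review and does not use the HBase Filter API: PrefixSetSketch and its methods are hypothetical, and a hand-written unsigned lexicographic comparator stands in for Bytes.BYTES_COMPARATOR so the example has no dependencies.

```java
import java.util.TreeSet;

// Hypothetical illustration; not the ColumnPrefixFilter patch.
final class PrefixSetSketch {
    // A sorted set keeps prefixes ordered and silently ignores duplicates.
    private final TreeSet<byte[]> prefixes = new TreeSet<byte[]>(PrefixSetSketch::compare);

    PrefixSetSketch(byte[][] initial) {
        // Fail fast on the client instead of hitting an NPE or IOOB on the server.
        if (initial == null || initial.length == 0) {
            throw new IllegalArgumentException("at least one prefix is required");
        }
        for (byte[] prefix : initial) {
            addPrefix(prefix);
        }
    }

    void addPrefix(byte[] prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        prefixes.add(prefix.clone());
    }

    // OR condition: a qualifier matches if it starts with any of the prefixes.
    boolean matches(byte[] qualifier) {
        for (byte[] prefix : prefixes) {
            if (startsWith(qualifier, prefix)) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return prefixes.size();
    }

    private static boolean startsWith(byte[] value, byte[] prefix) {
        if (value.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (value[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Unsigned lexicographic byte[] comparison; shorter array sorts first on a tie.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

The linear scan in matches() keeps the sketch short; a real filter could use the sorted order to seek to the next candidate prefix instead.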

 Allow ColumnPrefixFilter to support multiple prefixes
 -

 Key: HBASE-4612
 URL: https://issues.apache.org/jira/browse/HBASE-4612
 Project: HBase
  Issue Type: Improvement
  Components: filters
Affects Versions: 0.90.4
Reporter: Eran Kutner
Priority: Minor
 Attachments: HBASE-4612-0.90.patch


 When having a lot of columns grouped by name I've found that it would be very 
 useful to be able to scan them using multiple prefixes, allowing to fetch 
 specific groups in one scan, without fetching the entire row. This is 
 impossible to achieve using a FilterList, so I've added such support to the 
 existing ColumnPrefixFilter while keeping backward compatibility.
 The attached patch is based on 0.90.4, I noticed that the 0.92 branch has a 
 new method to support instantiating filters using Thrift. I'm not sure how 
 the serialization works there so I didn't implement that, but the rest of my 
 code should work in 0.92 as well.





[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-10-18 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130121#comment-13130121
 ] 

Jonathan Gray commented on HBASE-4611:
--

Should we change the display name of the reviews.facebook.net account from John 
Sichi? :)

 Add support for Phabricator/Differential as an alternative code review tool
 ---

 Key: HBASE-4611
 URL: https://issues.apache.org/jira/browse/HBASE-4611
 Project: HBase
  Issue Type: Task
Reporter: Jonathan Gray
 Attachments: D21.1.patch, D21.1.patch


 From http://phabricator.org/ : Phabricator is an open source collection of 
 web applications which make it easier to write, review, and share source 
 code. It is currently available as an early release. Phabricator was 
 developed at Facebook.
 It's open source so pretty much anyone could host an instance of this 
 software.
 To begin with, there will be a public-facing instance located at 
 http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
 http://osuosl.org).
 We will use this JIRA to deal with adding (and ensuring) Apache-friendly 
 support that will allow us to do code reviews with Phabricator for HBase.





[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-10-18 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130140#comment-13130140
 ] 

Jonathan Gray commented on HBASE-4611:
--

Looks like you're one step ahead on the a tags, thanks!

 Add support for Phabricator/Differential as an alternative code review tool
 ---

 Key: HBASE-4611
 URL: https://issues.apache.org/jira/browse/HBASE-4611
 Project: HBase
  Issue Type: Task
Reporter: Jonathan Gray
 Attachments: D21.1.patch, D21.1.patch






[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-10-18 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130139#comment-13130139
 ] 

Jonathan Gray commented on HBASE-4611:
--

@Marek, we could file an INFRA task.  Or we could create a new account?  Also, 
there seems to be something off with URL translation (JIRA is treating the a 
tag as escaped, so it is actually showing it, and then converting straight 
text URLs to hyperlinks).

 Add support for Phabricator/Differential as an alternative code review tool
 ---

 Key: HBASE-4611
 URL: https://issues.apache.org/jira/browse/HBASE-4611
 Project: HBase
  Issue Type: Task
Reporter: Jonathan Gray
 Attachments: D21.1.patch, D21.1.patch






[jira] [Commented] (HBASE-4603) Uneeded sleep time for tests in hbase.master.ServerManager#waitForRegionServers

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129295#comment-13129295
 ] 

Jonathan Gray commented on HBASE-4603:
--

There was a nice param in HBASE-3380 that is in 90 but not 92/trunk.  I'm going 
to see if we can get that brought into the active branches, then we can just 
set the maxServers config to the # of RS set to start, and then it will just 
work instantly w/o having to wait for this interval/sleep loop.

 Uneeded sleep time for tests in 
 hbase.master.ServerManager#waitForRegionServers
 ---

 Key: HBASE-4603
 URL: https://issues.apache.org/jira/browse/HBASE-4603
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.92.0
 Environment: all.
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 20111017_4603_MiniHBaseCluster.patch


 This function waits for at least 2 times 
 hbase.master.wait.on.regionservers.interval, which defaults to 3 seconds, i.e. 
 6 seconds for every mini HBase cluster start.
 In the context of a mini cluster, this is not useful, as the region servers 
 are created locally.
 Changing this to a lower value such as 100 ms saves 5.8 seconds per HBase 
 cluster start. It should lower the build time on the Apache server by more 
 than 8%.
 Being more aggressive (removing all the wait time) could be possible as 
 well. To be studied later.
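
The numbers above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the class and method names are made up, and it assumes the wait is exactly two polling intervals, as the description states.

```java
// Back-of-the-envelope check of the numbers quoted above; names are illustrative.
final class StartupWaitSketch {

    // Minimum wait per mini-cluster start: two polling intervals.
    static long minWaitMillis(long intervalMillis) {
        return 2 * intervalMillis;
    }

    public static void main(String[] args) {
        long defaultWait = minWaitMillis(3000); // 6000 ms with the 3 s default
        long tunedWait = minWaitMillis(100);    // 200 ms with a 100 ms interval
        System.out.println("saved per start (ms): " + (defaultWait - tunedWait)); // 5800
    }
}
```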





[jira] [Commented] (HBASE-3380) Master failover can split logs of live servers

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129293#comment-13129293
 ] 

Jonathan Gray commented on HBASE-3380:
--

So it looks like we thought we'd do a proper fix for 0.92, but do we have one?  
There's some good config params that were committed as part of this JIRA into 
0.90 that are now not available in 0.92.

Should this be committed to 0.92 and trunk?  I'd like to at least bring these 
config params over since they are pretty nice (and will make a more elegant 
solution to stuff like HBASE-4603).

 Master failover can split logs of live servers
 --

 Key: HBASE-3380
 URL: https://issues.apache.org/jira/browse/HBASE-3380
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jonathan Gray
Priority: Blocker
 Fix For: 0.90.0

 Attachments: HBASE-3380-v1.patch, HBASE-3380-v2.patch


 The reason why TestMasterFailover fails is that when it does the master 
 failover, the new master doesn't wait long enough for all region servers to 
 check in, so it goes ahead and splits logs... which doesn't work because of 
 the way lease timeouts work:
 {noformat}
 2010-12-21 07:30:36,977 DEBUG [Master:0;vesta.apache.org:33170] 
 wal.HLogSplitter(256): Splitting hlog 1 of 1:
  
 hdfs://localhost:49187/user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204,
  length=0
 2010-12-21 07:30:36,977 DEBUG [WriterThread-1] 
 wal.HLogSplitter$WriterThread(619): Writer thread 
 Thread[WriterThread-1,5,main]: starting
 2010-12-21 07:30:36,977 DEBUG [WriterThread-2] 
 wal.HLogSplitter$WriterThread(619): Writer thread 
 Thread[WriterThread-2,5,main]: starting
 2010-12-21 07:30:36,977 INFO  [Master:0;vesta.apache.org:33170] 
 util.FSUtils(625): Recovering file
  
 hdfs://localhost:49187/user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204
 2010-12-21 07:30:36,979 WARN  [IPC Server handler 8 on 49187] 
 namenode.FSNamesystem(1122): DIR* NameSystem.startFile:
  failed to create file 
 /user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204
  for
  DFSClient_hb_m_vesta.apache.org:33170_1292916630791 on client 127.0.0.1, 
 because this file is already being created by
  DFSClient_hb_rs_vesta.apache.org,38743,1292916616340_1292916617166 on 
 127.0.0.1
 ...
 2010-12-21 07:33:44,332 WARN  [Master:0;vesta.apache.org:33170] 
 util.FSUtils(644): Waited 187354ms for lease recovery on
  
 hdfs://localhost:49187/user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204:
  org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
 create file
  
 /user/hudson/.logs/vesta.apache.org,38743,1292916616340/vesta.apache.org%3A38743.1292916617204
  for DFSClient_hb_m_vesta.apache.org:33170_1292916630791 on client 127.0.0.1, 
 because this file is already
  being created by 
 DFSClient_hb_rs_vesta.apache.org,38743,1292916616340_1292916617166 on 
 127.0.0.1
 {noformat}
 I think that we should always check in ZK the number of live region servers 
 before waiting for them to check in; this way we know how many we should 
 expect during failover. There's also a case where we still want to time out, 
 since RS can die during that time, but we should wait a bit longer.
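
One way to read the proposal above, as a hedged sketch: take the number of live region servers seen in ZooKeeper as the expected count, then poll until that many have checked in or a timeout expires. Every name below is hypothetical, and the ZooKeeper lookup and the master's check-in list are replaced by plain values and suppliers; this is not the code from the attached patches.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch; the supplier stands in for the master's check-in list.
final class WaitForCheckinsSketch {

    // Polls until 'checkedIn' reaches 'expected' or 'timeoutMillis' elapses.
    // Returns the number of servers that had checked in when the wait ended.
    static int waitForCheckins(int expected, IntSupplier checkedIn,
                               long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int count = checkedIn.getAsInt();
        while (count < expected && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
            count = checkedIn.getAsInt();
        }
        return count;
    }

    public static void main(String[] args) {
        int liveInZk = 3;   // pretend ZooKeeper lists three live servers
        int[] seen = {0};
        // One more server appears to check in on every poll.
        IntSupplier checkedIn = () -> Math.min(++seen[0], liveInZk);
        int result = waitForCheckins(liveInZk, checkedIn, 5_000, 10);
        System.out.println("checked in: " + result);
    }
}
```

The timeout keeps the master from waiting forever if a region server dies after being counted, which is the case the description calls out.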





[jira] [Commented] (HBASE-3380) Master failover can split logs of live servers

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129339#comment-13129339
 ] 

Jonathan Gray commented on HBASE-3380:
--

What's the best practice here?  Should I just commit this to 92 and trunk and 
make a note here?  Should I open a new jira since this is so old?

(Thanks for input guys)

 Master failover can split logs of live servers
 --

 Key: HBASE-3380
 URL: https://issues.apache.org/jira/browse/HBASE-3380
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jonathan Gray
Priority: Blocker
 Fix For: 0.90.0

 Attachments: HBASE-3380-v1.patch, HBASE-3380-v2.patch






[jira] [Commented] (HBASE-3380) Master failover can split logs of live servers

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129344#comment-13129344
 ] 

Jonathan Gray commented on HBASE-3380:
--

Heartbeats still exist so I'm not sure much is different in 92 since we tackled 
this, right?

I will open a new JIRA though.

 Master failover can split logs of live servers
 --

 Key: HBASE-3380
 URL: https://issues.apache.org/jira/browse/HBASE-3380
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jonathan Gray
Priority: Blocker
 Fix For: 0.90.0

 Attachments: HBASE-3380-v1.patch, HBASE-3380-v2.patch






[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-10-17 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129397#comment-13129397
 ] 

Jonathan Gray commented on HBASE-4611:
--

In addition to being a (better) code review tool, the Phabricator suite also 
includes stuff like repo/revision browsing, nice command-line tools, pastebin, 
etc. which should be available for the HBase repos.

 Add support for Phabricator/Differential as an alternative code review tool
 ---

 Key: HBASE-4611
 URL: https://issues.apache.org/jira/browse/HBASE-4611
 Project: HBase
  Issue Type: Task
Reporter: Jonathan Gray





[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-15 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128317#comment-13128317
 ] 

Jonathan Gray commented on HBASE-4536:
--

bq. I think this new feature should not be the default behavior.

+1

 Allow CF to retain deleted rows
 ---

 Key: HBASE-4536
 URL: https://issues.apache.org/jira/browse/HBASE-4536
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0


 Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
 of versions.
 However, if a client deletes a row, all versions older than the delete 
 tombstone will be removed at the next major compaction (and even at memstore 
 flush - see HBASE-4241).
 There should be a way to retain those versions to guard against software error.
 I see two options here:
 1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED.
 2. Fold this into the parent change, i.e. keep the minimum number of 
 versions even past the delete marker.
 #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
 (from a user viewpoint).
 Comments? Any other options?





[jira] [Commented] (HBASE-4591) TTL for old HLogs should be calculated from last modification time.

2011-10-14 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127859#comment-13127859
 ] 

Jonathan Gray commented on HBASE-4591:
--

+1

 TTL for old HLogs should be calculated from last modification time.
 ---

 Key: HBASE-4591
 URL: https://issues.apache.org/jira/browse/HBASE-4591
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.89.20100621
Reporter: Madhuwanti Vaidya
Assignee: Madhuwanti Vaidya
Priority: Minor







[jira] [Commented] (HBASE-4593) Design and document the official procedure for posting patches, commits, commit messages, etc. to smooth process and make integration with tools easier

2011-10-14 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128012#comment-13128012
 ] 

Jonathan Gray commented on HBASE-4593:
--

BTW, once we nail down the formatting and everything, I will toss reposync up 
on a github repo or something.

 Design and document the official procedure for posting patches, commits, 
 commit messages, etc. to smooth process and make integration with tools easier
 ---

 Key: HBASE-4593
 URL: https://issues.apache.org/jira/browse/HBASE-4593
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Jonathan Gray

 I have been building a tool (currently called reposync) to help me keep the 
 internal FB hbase-92-based branch up-to-date with the public branches.
 Various inconsistencies in our process have made it difficult to automate a 
 lot of this stuff.
 I'd like to work with everyone to come up with the official best practices 
 and stick to them.
 I welcome all suggestions.  Among some of the things I'd like to nail down:
 - Commit message format
 - Best practice and commit message format for multiple commits
 - Multiple commits per jira vs. jira per commit, what are the exceptions and 
 when
 - Affects vs. Fix versions
 - Potential usage of [tags] in commit messages for things like book, scripts, 
 shell... maybe even whatever is in the components field?
 - Increased usage of JIRA tags or labels to mark exactly which repos a JIRA 
 has been committed to (potentially even internal repos?  ways for a tool to 
 keep track in JIRA?)
 We also need to be more strict about some things if we want to follow Apache 
 guidelines.  For example, all final versions of a patch must be attached to 
 JIRA so that the author properly assigns it to Apache.





[jira] [Commented] (HBASE-4558) Refactor TestOpenedRegionHandler and TestOpenRegionHandler.

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126372#comment-13126372
 ] 

Jonathan Gray commented on HBASE-4558:
--

Did this break the build?  TestMasterFailover is not compiling for me.

 Refactor TestOpenedRegionHandler and TestOpenRegionHandler.
 ---

 Key: HBASE-4558
 URL: https://issues.apache.org/jira/browse/HBASE-4558
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4558_1.patch, HBASE-4558_2.patch, 
 HBASE-4558_3.patch


 This is an improvement task taken up to refactor TestOpenedRegionHandler and 
 TestOpenRegionHandler so that MockServer and MockRegionServerServices can be 
 accessed from a common utility package.
 If we do this then one of the testcases in TestOpenedRegionHandler need not 
 start up a cluster and also moving it into a common package will help in 
 mocking the server for future testcases.





[jira] [Commented] (HBASE-4558) Refactor TestOpenedRegionHandler and TestOpenRegionHandler.

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126373#comment-13126373
 ] 

Jonathan Gray commented on HBASE-4558:
--

-  metaRegion, regionServer);
+  metaRegion, regionServer.getServerName());

?

 Refactor TestOpenedRegionHandler and TestOpenRegionHandler.
 ---

 Key: HBASE-4558
 URL: https://issues.apache.org/jira/browse/HBASE-4558
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4558_1.patch, HBASE-4558_2.patch, 
 HBASE-4558_3.patch






[jira] [Commented] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126854#comment-13126854
 ] 

Jonathan Gray commented on HBASE-4459:
--

- Why is Queue added within the scope of this JIRA?  Seems unrelated.

- Can you remove the unnecessary import re-org at the top?

- Can we have a unit test which shows the backwards compatibility of this?

Thanks for working on this Ram.

 HbaseObjectWritable code is a byte, we will eventually run out of codes
 ---

 Key: HBASE-4459
 URL: https://issues.apache.org/jira/browse/HBASE-4459
 Project: HBase
  Issue Type: Bug
  Components: io
Reporter: Jonathan Gray
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4459.txt


 There are about 90 classes/codes in HbaseObjectWritable currently and 
 Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes but 
 not break compatibility might want to leave a gap before using codes and 
 that's difficult in such limited space.
 Eventually we should get rid of this pattern that makes compatibility 
 difficult (better client/server protocol handshake) but we should probably at 
 least bump this to a short for 0.94.
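
To put rough numbers on the space problem described above, the sketch below compares how many codes are left with a byte versus a short. It is illustrative arithmetic only: the class and method names are invented, the "about 90" figure is taken from the description, and it assumes only positive codes are used.

```java
// Illustrative arithmetic only; the "about 90" figure comes from the description above.
final class CodeSpaceSketch {

    // Positive codes still free once 'used' codes are taken out of 1..maxCode.
    static int remaining(int used, int maxCode) {
        return maxCode - used;
    }

    public static void main(String[] args) {
        int used = 90;
        System.out.println("free with a byte:  " + remaining(used, Byte.MAX_VALUE));  // 37
        System.out.println("free with a short: " + remaining(used, Short.MAX_VALUE)); // 32677
    }
}
```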





[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127018#comment-13127018
 ] 

Jonathan Gray commented on HBASE-3417:
--

Stack, I was going to open a new JIRA, but it is the exact same issue and a 
nearly identical patch (primary difference is pre/post hfile v2).  It was just 
incorrect to close this following commit of hfile v2 which was unrelated to 
this bug.  Nothing was ever committed under this JIRA so just reopened with an 
updated patch.

I think things get confusing when there is more than one commit per branch per 
jira.  We should probably ban that practice.  Or at least institute some kind 
of standardized commit message (HBASE-3417, HBASE-3417-B, HBASE-3417-C, etc) or 
some such thing.

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, 
 HBASE-3417-v2.patch, HBASE-3417-v5.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).





[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127051#comment-13127051
 ] 

Jonathan Gray commented on HBASE-3417:
--

I didn't mark as incompatible but it is only one-way compatible.

There is actually a very trivial change that can be made in the 0.90 branch (or 
any other branches) to make this change compatible in all directions.  Just 
need to update the REF_NAME_PARSER regex to be what it is in this change 
(tolerant of [a-f] in addition to digits).  That's it.

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, 
 HBASE-3417-v2.patch, HBASE-3417-v5.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).





[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127054#comment-13127054
 ] 

Jonathan Gray commented on HBASE-3417:
--

In StoreFile.java:
{code}
   private static final Pattern REF_NAME_PARSER =
-    Pattern.compile("^(\\d+)(?:\\.(.+))?$");
+    Pattern.compile("^([0-9a-f]+)(?:\\.(.+))?$");
{code}

If you ever need to go backwards from 92 to a previous version.
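
To see why the old pattern trips on the new names, here is a minimal standalone sketch. The two patterns are the ones from the diff above; the class name, the helper, and the sample file names are made up for illustration and are not from StoreFile.java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class, method, and sample names are hypothetical.
class RefNameParserSketch {
    // Pattern before the change: the leading id must be all decimal digits.
    static final Pattern OLD_PARSER = Pattern.compile("^(\\d+)(?:\\.(.+))?$");
    // Pattern from the change: the leading id may also contain a-f.
    static final Pattern NEW_PARSER = Pattern.compile("^([0-9a-f]+)(?:\\.(.+))?$");

    /** Returns the leading id (group 1), or null if the name does not parse. */
    static String fileId(Pattern parser, String name) {
        Matcher m = parser.matcher(name);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String numeric = "1234567890.parentregion"; // made-up old-style name
        String hex = "3f2a9c01.parentregion";       // made-up new-style name
        System.out.println(fileId(OLD_PARSER, numeric)); // parses
        System.out.println(fileId(OLD_PARSER, hex));     // does not parse
        System.out.println(fileId(NEW_PARSER, hex));     // parses
        System.out.println(fileId(NEW_PARSER, numeric)); // still parses
    }
}
```

The new pattern accepts everything the old one did, which is why patching only the regex in an older branch should be enough for it to read files written by the newer naming scheme.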

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, 
 HBASE-3417-v2.patch, HBASE-3417-v5.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127059#comment-13127059
 ] 

Jonathan Gray commented on HBASE-4469:
--

Liyin, can you post the final patch to this JIRA?  I will commit.  Thanks!

 Avoid top row seek by looking up bloomfilter
 

 Key: HBASE-4469
 URL: https://issues.apache.org/jira/browse/HBASE-4469
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 The problem is that when seeking for the row/col in the hfile, we will go to 
 top of the row in order to check for row delete marker (delete family). 
 However, if the bloomfilter is enabled for the column family, then if a 
 delete family operation is done on a row, the row is already being added to 
 bloomfilter. We can take advantage of this factor to avoid seeking to the top 
 of row.





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127071#comment-13127071
 ] 

Jonathan Gray commented on HBASE-4469:
--

Thanks Liyin.  Unfortunately because the RB integration isn't very tight, to 
follow Apache protocol, you need to attach the patch to the JIRA and select the 
radio button that assigns it to apache.

This also helps to ensure that there's no confusion about which version was 
committed and that we don't have a hard dependency on RB in any way.

It'll all be second nature before you know it :)

 Avoid top row seek by looking up bloomfilter
 

 Key: HBASE-4469
 URL: https://issues.apache.org/jira/browse/HBASE-4469
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: HBASE-4469_1.patch


 The problem is that when seeking for the row/col in the hfile, we will go to 
 top of the row in order to check for row delete marker (delete family). 
 However, if the bloomfilter is enabled for the column family, then if a 
 delete family operation is done on a row, the row is already being added to 
 bloomfilter. We can take advantage of this factor to avoid seeking to the top 
 of row.





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127089#comment-13127089
 ] 

Jonathan Gray commented on HBASE-4469:
--

What is the protocol now?  This needs to go into the fb-89 branch, so do I keep 
this JIRA open until that happens, or should we just add some fb-89-pending tag 
or something?

 Avoid top row seek by looking up bloomfilter
 

 Key: HBASE-4469
 URL: https://issues.apache.org/jira/browse/HBASE-4469
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: HBASE-4469_1.patch


 The problem is that when seeking for the row/col in the hfile, we will go to 
 top of the row in order to check for row delete marker (delete family). 
 However, if the bloomfilter is enabled for the column family, then if a 
 delete family operation is done on a row, the row is already being added to 
 bloomfilter. We can take advantage of this factor to avoid seeking to the top 
 of row.





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127090#comment-13127090
 ] 

Jonathan Gray commented on HBASE-4469:
--

(I'm not putting this in the 92 branch because it is a feature.)

 Avoid top row seek by looking up bloomfilter
 

 Key: HBASE-4469
 URL: https://issues.apache.org/jira/browse/HBASE-4469
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: HBASE-4469_1.patch


 The problem is that when seeking for the row/col in the hfile, we will go to 
 top of the row in order to check for row delete marker (delete family). 
 However, if the bloomfilter is enabled for the column family, then if a 
 delete family operation is done on a row, the row is already being added to 
 bloomfilter. We can take advantage of this factor to avoid seeking to the top 
 of row.





[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-10-13 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127122#comment-13127122
 ] 

Jonathan Gray commented on HBASE-4335:
--

@LarsH, in the future, please have your svn commit message be in the same 
format as the CHANGES.txt update (i.e. HBASE-  The title description (author 
[via committer])).

 Splits can create temporary holes in .META. that confuse clients and 
 regionservers
 --

 Key: HBASE-4335
 URL: https://issues.apache.org/jira/browse/HBASE-4335
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: Joe Pallas
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4335-v2.txt, 4335-v3.txt, 4335-v4.txt, 4335-v5.txt, 
 4335.txt


 When a SplitTransaction is performed, three updates are done to .META.:
 1. The parent region is marked as splitting (and hence offline)
 2. The first daughter region is added (same start key as parent)
 3. The second daughter region is added (split key is start key)
 (later, the original parent region is deleted, but that's not important to 
 this discussion)
 Steps 2 and 3 are actually done concurrently by 
 SplitTransaction.DaughterOpener threads.  While the master is notified when a 
 split is complete, the only visibility that clients have is whether the 
 daughter regions have appeared in .META.
 If the second daughter is added to .META. first, then .META. will contain the 
 (offline) parent region followed by the second daughter region.  If the 
 client looks up a key that is greater than (or equal to) the split, the 
 client will find the second daughter region and use it.  If the key is less 
 than the split key, the client will find the parent region and see that it is 
 offline, triggering a retry.
 If the first daughter is added to .META. before the second daughter, there is 
 a window during which .META. has a hole: the first daughter effectively hides 
 the parent region (same start key), but there is no entry for the second 
 daughter.  A region lookup will find the first daughter for all keys in the 
 parent's range, but the first daughter does not include keys at or beyond the 
 split key.
 See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
 suggestions for mitigating this in the client and regionserver.
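
 A toy model of the window described above, as a hedged sketch: .META. is reduced to a sorted map from start key to region, and a lookup takes the entry with the greatest start key at or below the requested key. All class, field, and region names are hypothetical, and the real catalog row keys carry more than the start key; the point is only to show how the first daughter can answer for keys it does not hold.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: .META. is reduced to a sorted map of start key -> region.
// Class, field, and region names are hypothetical.
class MetaHoleSketch {
    static final class Region {
        final String name;
        final String startKey; // inclusive
        final String endKey;   // exclusive
        final boolean offline;

        Region(String name, String startKey, String endKey, boolean offline) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
        }

        boolean contains(String key) {
            return key.compareTo(startKey) >= 0 && key.compareTo(endKey) < 0;
        }
    }

    /** Returns the entry with the greatest start key <= key, as a row lookup would. */
    static Region locate(TreeMap<String, Region> meta, String key) {
        Map.Entry<String, Region> entry = meta.floorEntry(key);
        return entry == null ? null : entry.getValue();
    }

    /** State after steps 1 and 2 only: parent offline, first daughter added. */
    static TreeMap<String, Region> metaWithFirstDaughterOnly() {
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("a", new Region("parent", "a", "z", true));
        // Same start key as the parent, so in this toy map it replaces (hides) the parent.
        meta.put("a", new Region("daughterA", "a", "m", false));
        return meta;
    }

    public static void main(String[] args) {
        TreeMap<String, Region> meta = metaWithFirstDaughterOnly();
        Region hit = locate(meta, "q");
        System.out.println(hit.name + " contains q? " + hit.contains("q"));

        meta.put("m", new Region("daughterB", "m", "z", false)); // step 3 lands
        hit = locate(meta, "q");
        System.out.println(hit.name + " contains q? " + hit.contains("q"));
    }
}
```

 In the first lookup the client is handed an online region that does not cover the key, which is the hole; once the second daughter is present the same lookup resolves correctly.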





[jira] [Commented] (HBASE-4102) atomicAppend: A put that appends to the latest version of a cell; i.e. reads current value then adds the bytes offered by the client to the tail and writes out a new en

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126141#comment-13126141
 ] 

Jonathan Gray commented on HBASE-4102:
--

I think unifying Put and Append is not super important.  It would be good to 
unify Increment and Append, maybe even CheckAndPut/Delete?  A generic atomic op 
thing.

For the attributes, I think we just need a convention for system attributes, 
for example, they are preceded by an _ underscore.  And then we can put all the 
used attributes into HConstants for easy tracking.

Let's open another JIRA to integrate RWCC w/ Append and possibly Increment as 
well.  We can discuss there.

 atomicAppend: A put that appends to the latest version of a cell; i.e. reads 
 current value then adds the bytes offered by the client to the tail and 
 writes out a new entry
 ---

 Key: HBASE-4102
 URL: https://issues.apache.org/jira/browse/HBASE-4102
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 4102-v1.txt, 4102.txt


 It's come up a few times that clients want to add to an existing cell rather 
 than make a new cell each time.  At our place, the frontend keeps a list of 
 urls a user has visited -- their md5s -- and updates it as the user progresses.  
 Rather than read, modify client-side, then write the new value back to hbase, it 
 would be sweet if we could do it all in one operation in the hbase server.  TSDB 
 aims to be space efficient.  Rather than pay the cost of the KV wrapper per 
 metric, it would rather have one KV per interval, and in this KV have a value 
 that is all the metrics for the period.
 It could be done as a coprocessor but this feels more like a fundamental 
 feature.
 Benoît suggests that atomicAppend take a flag to indicate whether or not the 
 client wants to see the resulting cell; often a client won't want to see the 
 result and in this case, why pay the price formulating and delivering a 
 response that client just drops.
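
 The requested semantics can be sketched with an in-memory stand-in. This is not HBase code: the class, the method signature, and the returnResult flag name are assumptions chosen to mirror the description, with ConcurrentHashMap.compute standing in for whatever row-level atomicity the server would provide.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a server-side append; all names are hypothetical.
class AppendSketch {
    private final ConcurrentHashMap<String, byte[]> cells = new ConcurrentHashMap<>();

    /**
     * Atomically appends suffix to the current value of the cell.
     * Returns the new value only when returnResult is true, otherwise null,
     * so a caller that does not care pays nothing for the response.
     */
    byte[] append(String cell, byte[] suffix, boolean returnResult) {
        byte[] updated = cells.compute(cell, (key, current) -> {
            if (current == null) {
                return suffix.clone(); // first write to this cell
            }
            byte[] joined = Arrays.copyOf(current, current.length + suffix.length);
            System.arraycopy(suffix, 0, joined, current.length, suffix.length);
            return joined;
        });
        return returnResult ? updated.clone() : null;
    }

    /** Reads the current value of the cell, or null if it was never written. */
    byte[] get(String cell) {
        byte[] value = cells.get(cell);
        return value == null ? null : value.clone();
    }
}
```

 The read, the concatenation, and the write happen inside one compute call, so two concurrent appends to the same cell cannot interleave and lose bytes, which is the property the client-side read-modify-write lacks.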





[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126175#comment-13126175
 ] 

Jonathan Gray commented on HBASE-4469:
--

@stack, yeah, this version only works if you have rowcol blooms enabled.  The 
generic version is going to be implemented over in HBASE-4532.

 Avoid top row seek by looking up bloomfilter
 

 Key: HBASE-4469
 URL: https://issues.apache.org/jira/browse/HBASE-4469
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 The problem is that when seeking for the row/col in the hfile, we will go to 
 top of the row in order to check for row delete marker (delete family). 
 However, if the bloomfilter is enabled for the column family, then if a 
 delete family operation is done on a row, the row is already being added to 
 bloomfilter. We can take advantage of this factor to avoid seeking to the top 
 of row.





[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126224#comment-13126224
 ] 

Jonathan Gray commented on HBASE-4583:
--

We likely won't be able to do in-place modifications or direct KV removal from 
MemStore.  A simple way would be to also introduce a delete marker that removes 
the previous value, but the marker will have the rwcc of the new edit, so 
you'll have the right consistency.

This will lead to a build up of unnecessary KVs in the MemStore.  Periodically 
cleaning that up would be possible but unnecessarily complex I think.

Another option would be to remove the previous KVs after you roll rwcc forward 
and release the row lock, before dropping the region-level lock.  Should 
definitely be possible.  Will obviously require a remangling of upsert but it's 
kinda dirty anyways.

 Integrate RWCC with Append and Increment operations
 ---

 Key: HBASE-4583
 URL: https://issues.apache.org/jira/browse/HBASE-4583
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
 Fix For: 0.94.0


 Currently Increment and Append operations do not work with RWCC and hence a 
 client could see the results of multiple such operations mixed in the same 
 Get/Scan.
 The semantics might be a bit more interesting here as upsert adds and removes 
 to and from the memstore.





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126282#comment-13126282
 ] 

Jonathan Gray commented on HBASE-4489:
--

Historically ASCII has proven a bad choice in key design.  If it's always fixed 
length, it's less of a big deal and really does come down to space savings vs. 
readability.  In many applications, row keys are composite keys made up of many 
different things.  Often times, the key may be preceded by some fixed-length 
random hash of some sort.

I almost always want to be building these composite keys from fixed-length 
binary ints/longs and such, rather than fixed-length ascii characters.

If we are talking a straightforward key-val situation with a string-like key, 
then the usability of ASCII would make sense.

 Better key splitting in RegionSplitter
 --

 Key: HBASE-4489
 URL: https://issues.apache.org/jira/browse/HBASE-4489
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
 Attachments: HBASE-4489-branch0.90-v1.patch, 
 HBASE-4489-branch0.90-v2.patch, HBASE-4489-branch0.90-v3.patch, 
 HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, 
 HBASE-4489-trunk-v3.patch


 The RegionSplitter utility allows users to create a pre-split table from the 
 command line or do a rolling split on an existing table. It supports 
 pluggable split algorithms that implement the SplitAlgorithm interface. The 
 only/default SplitAlgorithm is one that assumes keys fall in the range from 
 ASCII string  to ASCII string 7FFF. This is not a sane 
 default, and seems useless to most users. Users are likely to be surprised by 
 the fact that all the region splits occur in the byte range of ASCII 
 characters.
 A better default split algorithm would be one that evenly divides the space 
 of all bytes, which is what this patch does. Making a table with five regions 
 would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
 \xFF\xFF.
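
 The arithmetic behind those boundaries can be sketched as follows. This is not the patch itself; the class and method names are hypothetical, and the key width is fixed only to keep the output short. It treats a key as an unsigned integer of keyBytes bytes and places boundary i at (max * i) / regions.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the RegionSplitter code from the patch.
class UniformSplitSketch {
    /** Returns the regions-1 interior boundaries, as hex, for keys of keyBytes bytes. */
    static List<String> splitPoints(int regions, int keyBytes) {
        // Largest key of this width: 0xFF repeated keyBytes times.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyBytes).subtract(BigInteger.ONE);
        List<String> points = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(regions));
            // Left-pad with zeros so every boundary has the full key width.
            StringBuilder hex = new StringBuilder(point.toString(16));
            while (hex.length() < 2 * keyBytes) {
                hex.insert(0, '0');
            }
            points.add(hex.toString());
        }
        return points;
    }

    public static void main(String[] args) {
        // Five regions over two-byte keys: 3333, 6666, 9999, cccc
        System.out.println(splitPoints(5, 2));
    }
}
```

 For five regions this yields the four interior boundaries; \xFF\xFF... in the list above is the top of the key space rather than a split between two regions.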





[jira] [Commented] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126286#comment-13126286
 ] 

Jonathan Gray commented on HBASE-4410:
--

My comeback is that Lars is right and I f-ed it up.  I was supposed to make a 
new patch but forgot about this.  I was a bit angry I came up with such a nice 
elegant solution that was fundamentally broken.  ;)

Will try to get to this next week.

 FilterList.filterKeyValue can return suboptimal ReturnCodes
 ---

 Key: HBASE-4410
 URL: https://issues.apache.org/jira/browse/HBASE-4410
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4410-v1.patch


 FilterList.filterKeyValue does not always return the most optimal ReturnCode 
 in both the AND and OR conditions.
 For example, if you have F1 AND F2, F1 returns SKIP.  It immediately returns 
 the SKIP.  However, if F2 would have returned NEXT_COL or NEXT_ROW or 
 SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal 
 ReturnCode from F2.
 For AND conditions, we can always pick the *most restrictive* return code.
 For OR conditions, we must always pick the *least restrictive* return code.
 This JIRA is to review the FilterList.filterKeyValue() method to try and make 
 it more optimal and to add a new unit test which verifies the correct 
 behavior.
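
 The most/least restrictive rule stated above can be illustrated with a deliberately simplified sketch. It is not the attached patch, and the comment in this thread already concedes that the first attempt was flawed, so read it only as a picture of the rule. The enum below is a hypothetical stand-in that orders four of the return codes by restrictiveness and leaves SEEK_NEXT_USING_HINT out, since a seek hint does not fit a simple linear order.

```java
import java.util.List;

// Simplified illustration of the stated rule; not FilterList code.
class ReturnCodeMergeSketch {
    // Declared from least to most restrictive. SEEK_NEXT_USING_HINT is omitted on purpose.
    enum Code { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

    /** AND: every filter must agree, so the most restrictive answer wins. Expects a non-empty list. */
    static Code mergeAnd(List<Code> codes) {
        Code result = Code.INCLUDE;
        for (Code code : codes) {
            if (code.compareTo(result) > 0) {
                result = code;
            }
        }
        return result;
    }

    /** OR: any filter may still want data, so the least restrictive answer wins. Expects a non-empty list. */
    static Code mergeOr(List<Code> codes) {
        Code result = Code.NEXT_ROW;
        for (Code code : codes) {
            if (code.compareTo(result) < 0) {
                result = code;
            }
        }
        return result;
    }
}
```

 With F1 AND F2 where F1 returns SKIP and F2 returns NEXT_ROW, mergeAnd gives NEXT_ROW, which is the case from the description. Note that this requires asking every filter instead of returning on the first SKIP.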





[jira] [Commented] (HBASE-1621) merge tool should work on online cluster, but disabled table

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126288#comment-13126288
 ] 

Jonathan Gray commented on HBASE-1621:
--

Punt to 0.92.1 or 0.94.0?

 merge tool should work on online cluster, but disabled table
 

 Key: HBASE-1621
 URL: https://issues.apache.org/jira/browse/HBASE-1621
 Project: HBase
  Issue Type: Bug
Reporter: ryan rawson
Assignee: stack
 Fix For: 0.92.0

 Attachments: 1621-trunk.txt, HBASE-1621-v2.patch, HBASE-1621.patch, 
 hbase-onlinemerge.patch, online_merge.rb


 taking down the entire cluster to merge 2 regions is a pain, i dont see why 
 the table or regions specifically couldnt be taken offline, then merged then 
 brought back up.
 this might need a new API to the regionservers so they can take direction 
 from not just the master.





[jira] [Commented] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-10-12 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126332#comment-13126332
 ] 

Jonathan Gray commented on HBASE-4459:
--

I'm fine with pulling into 0.92 since it doesn't break any compatibility.

 HbaseObjectWritable code is a byte, we will eventually run out of codes
 ---

 Key: HBASE-4459
 URL: https://issues.apache.org/jira/browse/HBASE-4459
 Project: HBase
  Issue Type: Bug
  Components: io
Reporter: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0


 There are about 90 classes/codes in HbaseObjectWritable currently and 
 Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes but 
 not break compatibility might want to leave a gap before using codes and 
 that's difficult in such limited space.
 Eventually we should get rid of this pattern that makes compatibility 
 difficult (better client/server protocol handshake) but we should probably at 
 least bump this to a short for 0.94.





[jira] [Commented] (HBASE-4102) atomicAppend: A put that appends to the latest version of a cell; i.e. reads current value then adds the bytes offered by the client to the tail and writes out a new en

2011-10-11 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125225#comment-13125225
 ] 

Jonathan Gray commented on HBASE-4102:
--

This is really nice Lars.  I'd love to see integration with RWCC and to somehow 
unify the code with Increment.  But I'm okay with committing this and filing a 
follow-up JIRA.

I'm also going to backport this into my local 92 branch but I think it should 
only be committed to trunk.  Let's put all the polish on before putting it in 
an official release.

Nice work!

 atomicAppend: A put that appends to the latest version of a cell; i.e. reads 
 current value then adds the bytes offered by the client to the tail and 
 writes out a new entry
 ---

 Key: HBASE-4102
 URL: https://issues.apache.org/jira/browse/HBASE-4102
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl
 Attachments: 4102-v1.txt, 4102.txt


 It's come up a few times that clients want to add to an existing cell rather 
 than make a new cell each time.  At our place, the frontend keeps a list of 
 urls a user has visited -- their md5s -- and updates it as the user progresses.  
 Rather than read, modify client-side, then write the new value back to hbase, it 
 would be sweet if we could do it all in one operation in the hbase server.  TSDB 
 aims to be space efficient.  Rather than pay the cost of the KV wrapper per 
 metric, it would rather have one KV per interval, and in this KV have a value 
 that is all the metrics for the period.
 It could be done as a coprocessor but this feels more like a fundamental 
 feature.
 Benoît suggests that atomicAppend take a flag to indicate whether or not the 
 client wants to see the resulting cell; often a client won't want to see the 
 result and in this case, why pay the price formulating and delivering a 
 response that client just drops.





[jira] [Commented] (HBASE-4556) Fix all incorrect uses of InternalScanner.next(...)

2011-10-10 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124349#comment-13124349
 ] 

Jonathan Gray commented on HBASE-4556:
--

Why do we not see bugs because of this?  Should the contract be how we actually 
use it since it seems to work?

 Fix all incorrect uses of InternalScanner.next(...)
 ---

 Key: HBASE-4556
 URL: https://issues.apache.org/jira/browse/HBASE-4556
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl

 There are cases all over the code where InternalScanner.next(...) is not used 
 correctly.
 I see this a lot:
 {code}
 while(scanner.next(...)) {
 }
 {code}
 The correct pattern is:
 {code}
 boolean more = false;
 do {
   more = scanner.next(...);
 } while (more);
 {code}
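
 A toy reproduction of the difference between the two loops, assuming (this is my reading, not something stated above) that next(...) fills the result list and returns whether more rows follow, so the final call can return false while still delivering a row. ToyScanner and the helper names are hypothetical and do not touch any HBase class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for the scanner contract; all names are hypothetical.
class ScannerLoopSketch {
    static final class ToyScanner {
        private final Iterator<String> rows;

        ToyScanner(List<String> rows) {
            this.rows = rows.iterator();
        }

        /** Adds the next row (if any) to out and returns true if MORE rows follow it. */
        boolean next(List<String> out) {
            if (rows.hasNext()) {
                out.add(rows.next());
            }
            return rows.hasNext();
        }
    }

    /** The pattern called incorrect above: the batch from the last call is never processed. */
    static List<String> whileLoop(List<String> rows) {
        ToyScanner scanner = new ToyScanner(rows);
        List<String> seen = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        while (scanner.next(batch)) {
            seen.addAll(batch);
            batch.clear();
        }
        return seen;
    }

    /** The pattern called correct above: process the batch first, then test the flag. */
    static List<String> doWhileLoop(List<String> rows) {
        ToyScanner scanner = new ToyScanner(rows);
        List<String> seen = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        boolean more;
        do {
            more = scanner.next(batch);
            seen.addAll(batch);
            batch.clear();
        } while (more);
        return seen;
    }
}
```

 With three rows the while loop processes two and the do/while loop processes all three. A caller whose loop body is empty and who reads the list afterwards would not notice the difference.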





[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122034#comment-13122034
 ] 

Jonathan Gray commented on HBASE-4482:
--

+1 on keeping this in 0.92 regardless of stability and marking as experimental.

 Race Condition Concerning Eviction in SlabCache
 ---

 Key: HBASE-4482
 URL: https://issues.apache.org/jira/browse/HBASE-4482
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.92.0

 Attachments: hbase-4482v1.txt, hbase-4482v2.txt, hbase-4482v4.2.txt








[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122311#comment-13122311
 ] 

Jonathan Gray commented on HBASE-4536:
--

Lars, I agree that this is an important feature.  Also agree that we should 
take time and do it right and not push for 0.92.

Could we just support some kind of raw scanner along with a TTKAKV config 
(Time To Keep All Key Values)?

 Allow CF to retain deleted rows
 ---

 Key: HBASE-4536
 URL: https://issues.apache.org/jira/browse/HBASE-4536
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0


 Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
 of versions.
 However, if a client deletes a row, all versions older than the delete 
 tombstone will be removed at the next major compaction (and even at memstore 
 flush - see HBASE-4241).
 There should be a way to retain those versions to guard against software error.
 I see two options here:
 1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED.
 2. Fold this into the parent change, i.e. keep the minimum number of 
 versions even past the delete marker.
 #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
 (from a user viewpoint).
 Comments? Any other options?





[jira] [Commented] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122370#comment-13122370
 ] 

Jonathan Gray commented on HBASE-4547:
--

Post-commit +1.

Stack, should we open another JIRA to deal with your TODO?

 TestAdmin failing in 0.92 because .tableinfo not found
 --

 Key: HBASE-4547
 URL: https://issues.apache.org/jira/browse/HBASE-4547
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4547.txt


 I've been running tests before commit and found the following happens with 
 some regularity, sporadic of course, but they fail fairly frequently:
 {code}
 Failed tests:   
 testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
   testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but 
 was:<1>
   testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): 
 expected:<2> but was:<1>
 {code}
 Looking, it seems like we fail to find .tableinfo in the tests that modify 
 table schema while table is online.
 The update of a table schema just does an overwrite.  In the tests we 
 sometimes fail to find the newly written file or we get EOFE reading it.





[jira] [Commented] (HBASE-4549) Add thrift API to read version and build date of HBase

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122373#comment-13122373
 ] 

Jonathan Gray commented on HBASE-4549:
--

+1

 Add thrift API to read version and build date of HBase 
 ---

 Key: HBASE-4549
 URL: https://issues.apache.org/jira/browse/HBASE-4549
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Song Liu
Priority: Minor
   Original Estimate: 2h
  Remaining Estimate: 2h

 Adding an API to get the HBase server version and build date would help 
 clients adapt to different versions of the server. 
 The VersionInfo class can be reused to provide the required information. 





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-06 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122436#comment-13122436
 ] 

Jonathan Gray commented on HBASE-4528:
--

Dhruba and I just talked about this.  I also like the MemStore rollback.  It 
should not be that difficult: we just remove the List<KV> that we added.

 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, 
 appendNoSyncPut3.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.
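
As a rough illustration of the ordering discussed in this issue (release the 
row lock first, sync the log afterwards, and roll the MemStore back if the 
sync fails), here is a minimal Java sketch. Every name in it (SketchLog, 
HotRowStore, appendNoSync) is hypothetical and is not taken from the attached 
patches or from the HBase code base.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the write-ahead log; not the HBase HLog API. */
interface SketchLog {
    void appendNoSync(String edit);    // buffer the edit, no durability yet
    void sync() throws IOException;    // make buffered edits durable
}

/** Hypothetical single-row store showing "unlock first, sync after". */
class HotRowStore {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<String> memstore = new ArrayList<>();
    private final SketchLog log;

    HotRowStore(SketchLog log) {
        this.log = log;
    }

    void put(List<String> edits) throws IOException {
        rowLock.lock();
        try {
            for (String edit : edits) {
                log.appendNoSync(edit);
            }
            synchronized (memstore) {
                memstore.addAll(edits);
            }
        } finally {
            rowLock.unlock();          // released before the slow sync
        }
        try {
            log.sync();                // durability point, outside the row lock
        } catch (IOException ioe) {
            rollback(edits);           // undo the edits we added above
            throw ioe;
        }
    }

    /** Removes one matching entry per edit; assumes edits are distinguishable. */
    private void rollback(List<String> edits) {
        synchronized (memstore) {
            for (String edit : edits) {
                memstore.remove(edit);
            }
        }
    }

    List<String> snapshot() {
        synchronized (memstore) {
            return new ArrayList<>(memstore);
        }
    }
}
```

The sketch ignores read visibility: a real implementation would also have to 
keep un-synced edits hidden from readers until the sync succeeds.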





[jira] [Commented] (HBASE-4422) Move block cache parameters and references into single CacheConf class

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120693#comment-13120693
 ] 

Jonathan Gray commented on HBASE-4422:
--

I have looked at 3446 more.  I'm happy with it and confident it makes things 
better.  Will give the +1.

Re: getting the cache instance from CacheConf, I'm open to other designs, but 
this seems best in that we only need one argument for all the caching stuff vs. 
a separate reference for the cache itself.  What did you have in mind?

Maybe CacheConfig + BlockCache should be somehow combined?  Dunno.

 Move block cache parameters and references into single CacheConf class
 --

 Key: HBASE-4422
 URL: https://issues.apache.org/jira/browse/HBASE-4422
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.92.0

 Attachments: CacheConfig92-v8.patch


 From StoreFile down to HFile, we currently use a boolean argument for each of 
 the various block cache configuration parameters that exist.  The number of 
 parameters is going to continue to increase as we look at compressed cache, 
 delta encoding, and more specific L1/L2 configuration.  Every new config 
 currently requires changing many constructors because it introduces a new 
 boolean.
 We should move everything into a single class so that modifications are much 
 less disruptive.
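
To make the motivation concrete, here is a small Java sketch of the 
before/after constructor shapes. The class and field names (CacheConfigSketch, 
ReaderBefore, ReaderAfter, cacheDataOnRead and so on) are assumptions made for 
illustration; they are not the names used in CacheConfig92-v8.patch.

```java
/** Hypothetical bundle of block cache settings; field names are assumptions. */
final class CacheConfigSketch {
    private final boolean cacheDataOnRead;
    private final boolean inMemory;
    private final boolean cacheDataOnWrite;
    private final boolean evictOnClose;

    CacheConfigSketch(boolean cacheDataOnRead, boolean inMemory,
                      boolean cacheDataOnWrite, boolean evictOnClose) {
        this.cacheDataOnRead = cacheDataOnRead;
        this.inMemory = inMemory;
        this.cacheDataOnWrite = cacheDataOnWrite;
        this.evictOnClose = evictOnClose;
    }

    boolean shouldCacheDataOnRead() { return cacheDataOnRead; }
    boolean isInMemory() { return inMemory; }
    boolean shouldCacheDataOnWrite() { return cacheDataOnWrite; }
    boolean shouldEvictOnClose() { return evictOnClose; }
}

/** Before: each new cache option adds another boolean to every constructor. */
class ReaderBefore {
    final String path;
    final boolean cacheDataOnRead;
    final boolean inMemory;
    final boolean evictOnClose;

    ReaderBefore(String path, boolean cacheDataOnRead, boolean inMemory,
                 boolean evictOnClose) {
        this.path = path;
        this.cacheDataOnRead = cacheDataOnRead;
        this.inMemory = inMemory;
        this.evictOnClose = evictOnClose;
    }
}

/** After: a new option changes only CacheConfigSketch, not this signature. */
class ReaderAfter {
    final String path;
    final CacheConfigSketch cacheConf;

    ReaderAfter(String path, CacheConfigSketch cacheConf) {
        this.path = path;
        this.cacheConf = cacheConf;
    }

    boolean shouldCacheDataOnRead() {
        return cacheConf.shouldCacheDataOnRead();
    }
}
```

With this shape, adding a setting for, say, a compressed cache would touch 
the config class and the code that reads it, while the constructors between 
StoreFile and HFile stay unchanged.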





[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120711#comment-13120711
 ] 

Jonathan Gray commented on HBASE-3446:
--

I've grasped most of the change and this is clearly a significant improvement.  
Let's get it in!

+1 on latest patch up on RB if tests are passing.  TestMergeTool also fails on 
occasion for me.

Nice work stack!

You're thinking CatalogTracker follow-up in 0.94 w/ ROOT removal perhaps?

 ProcessServerShutdown fails if META moves, orphaning lots of regions
 

 Key: HBASE-3446
 URL: https://issues.apache.org/jira/browse/HBASE-3446
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 
 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 
 3446v15.txt


 I ran a rolling restart on a 5 node cluster with lots of regions, and 
 afterwards had LOTS of regions left orphaned. The issue appears to be that 
 ProcessServerShutdown failed because the server hosting META was restarted 
 around the same time as another server was being processed.





[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121139#comment-13121139
 ] 

Jonathan Gray commented on HBASE-4536:
--

This changes default behavior now?  I disagree that expected behavior is to 
ever uncover previously deleted data.  I'm okay with this as an option.


 Allow CF to retain deleted rows
 ---

 Key: HBASE-4536
 URL: https://issues.apache.org/jira/browse/HBASE-4536
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


 Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
 of versions.
 However, if a client deletes a row, all versions older than the delete 
 tombstone will be removed at the next major compaction (and even at memstore 
 flush - see HBASE-4241).
 There should be a way to retain those versions to guard against software 
 error.
 I see two options here:
 1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED.
 2. Fold this into the parent change, i.e. keep minimum-number-of-versions 
 versions even past the delete marker.
 #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
 (from a user viewpoint).
 Comments? Any other options?
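
The two options can be compared with a toy model of what a major compaction 
keeps for one cell. This is only a sketch under simplifying assumptions: a 
single delete marker, no TTL, no maximum version count, and hypothetical names 
(RetainDeletedSketch, survivors) that do not come from HBase.

```java
import java.util.ArrayList;
import java.util.List;

class RetainDeletedSketch {
    /**
     * Returns the version timestamps that survive a major compaction.
     *
     * @param versionTimestamps versions of one cell, newest first
     * @param deleteTs          delete marker covering versions with ts <= deleteTs
     * @param retainDeleted     option 1: a family-level flag that keeps everything
     * @param minVersions       option 2: keep at least this many versions,
     *                          even past the delete marker
     */
    static List<Long> survivors(List<Long> versionTimestamps, long deleteTs,
                                boolean retainDeleted, int minVersions) {
        List<Long> kept = new ArrayList<>();
        for (long ts : versionTimestamps) {
            boolean deleted = ts <= deleteTs;
            if (!deleted || retainDeleted || kept.size() < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

For versions at timestamps 30, 20 and 10 with a delete marker at 25, the 
model keeps only 30 when both options are off, keeps all three with the flag 
from option 1, and keeps 30 and 20 with option 2 and a minimum of two versions.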





[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121384#comment-13121384
 ] 

Jonathan Gray commented on HBASE-4540:
--

Looks pretty good.  Once you get the unit tests passing, want to put it up on 
RB?

Also, it'd be really good if you could start thinking about how to mock these 
scenarios better in our unit tests.  You are finding lots of great bugs but 
without tests it will be hard to prevent regressions.

 OpenedRegionHandler is not enforcing atomicity of the operation it is 
 performing
 

 Key: HBASE-4540
 URL: https://issues.apache.org/jira/browse/HBASE-4540
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4540_1.patch


 - OpenedRegionHandler has not yet deleted the znode of the region R1 opened 
 by RS1.
 - RS1 goes down.
 - ServerShutdownHandler assigns the region R1 to RS2.
 - The znode of R1 is moved to OFFLINE state by the master, or to OPENING 
 state by RS2 if RS2 has started opening the region.
 - Now the first OpenedRegionHandler tries to delete the znode, thinking it's 
 in OPENED state, but fails.
 - Though it fails, it removes the node from RIT and adds RS1 as the owner of 
 R1 in the master's memory.
 - Now when RS2 completes opening the region, the master is not able to open 
 the region because the region has already been removed from RIT.
 {code}
 Master
 ==
 2011-10-05 20:49:45,301 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished 
 processing of shutdown of linux146,60020,1317827727647
 2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because 1 region(s) in transition: 
 {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9.
  state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847}
 2011-10-05 20:49:57,720 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, 
 region=3e69d628a8bd8e9b7c5e7a2a6e03aad9
 2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x132d3dc13090023 Deleting existing unassigned node for 
 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED
 2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x132d3dc13090023 Attempting to delete unassigned node 
 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in 
 RS_ZK_REGION_OPENING state
 After the region is opened in RS2
 =
 2011-10-05 20:50:48,066 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, 
 region=3e69d628a8bd8e9b7c5e7a2a6e03aad9, which is more than 15 seconds late
 2011-10-05 20:50:48,290 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 
 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but 
 region was in  the state null and not in expected PENDING_OPEN or OPENING 
 states
 2011-10-05 20:50:53,743 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, 
 region=3e69d628a8bd8e9b7c5e7a2a6e03aad9
 2011-10-05 20:50:54,182 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: 
 Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s)
 2011-10-05 20:50:54,397 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 
 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but 
 region was in  the state null and not in expected PENDING_OPEN or OPENING 
 states
 {code}
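
One way to read the race above: the handler should change the master's 
in-memory state only when the conditional znode delete actually succeeded. 
The following Java sketch models that rule with plain in-memory maps. It is 
not the attached patch, and none of the names (OpenedRegionHandlerSketch, 
ZnodeState, deleteIfInState) are HBase or ZooKeeper APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical znode states for one region in transition. */
enum ZnodeState { OFFLINE, OPENING, OPENED, DELETED }

class OpenedRegionHandlerSketch {
    final Map<String, AtomicReference<ZnodeState>> znodes = new ConcurrentHashMap<>();
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
    final Map<String, String> regionOwner = new ConcurrentHashMap<>();

    /** Deletes the znode only if it is still in the expected state. */
    boolean deleteIfInState(String region, ZnodeState expected) {
        AtomicReference<ZnodeState> node = znodes.get(region);
        return node != null && node.compareAndSet(expected, ZnodeState.DELETED);
    }

    /** Returns true only if the znode was deleted and master state was updated. */
    boolean handleOpened(String region, String server) {
        if (!deleteIfInState(region, ZnodeState.OPENED)) {
            // The znode moved on (for example to OFFLINE or OPENING after a
            // reassignment), so leave RIT and the owner map untouched.
            return false;
        }
        regionsInTransition.remove(region);
        regionOwner.put(region, server);
        return true;
    }
}
```

In the scenario from the description, the stale handler for RS1 would see the 
znode in OPENING state, return false, and leave R1 in RIT for RS2 to finish.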





[jira] [Commented] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121461#comment-13121461
 ] 

Jonathan Gray commented on HBASE-4465:
--

Please attach the final patch to JIRA.

 Lazy-seek optimization for StoreFile scanners
 -

 Key: HBASE-4465
 URL: https://issues.apache.org/jira/browse/HBASE-4465
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
  Labels: optimization, seek
 Fix For: 0.89.20100924, 0.94.0


 Previously, if we had several StoreFiles for a column family in a region, we 
 would seek in each of them and only then merge the results, even though the 
 row/column we are looking for might only be in the most recent (and the 
 smallest) file. Now we prioritize our reads from those files so that we check 
 the most recent file first. This is done by doing a lazy seek which 
 pretends that the next value in the StoreFile is (seekRow, seekColumn, 
 lastTimestampInStoreFile), which is earlier in the KV order than anything 
 that might actually occur in the file. So if we don't find the result in 
 earlier files, that fake KV will bubble up to the top of the KV heap and a 
 real seek will be done. This is expected to significantly reduce the amount 
 of disk IO (as of 09/22/2011 we are doing dark launch testing and 
 measurement).
 This is joint work with Liyin Tang -- huge thanks to him for many helpful 
 discussions on this and the idea of putting fake KVs with the highest 
 timestamp of the StoreFile in the scanner priority queue.
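
The idea can be shown with a toy Java model, assuming each file holds at most 
one version per row/column key and that only the newest version is wanted. 
The heap first holds one fake entry per file, stamped with that file's 
highest timestamp, and a file is really seeked only when its fake entry 
reaches the top. All names here (LazySeekSketch, FileSketch, newestTimestamp) 
are hypothetical and not taken from the patch.

```java
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class LazySeekSketch {
    /** Toy store file: key -> timestamp of the single version it holds. */
    static final class FileSketch {
        final Map<String, Long> cells;
        final long maxTimestamp;
        int realSeeks = 0;             // counts simulated disk seeks

        FileSketch(Map<String, Long> cells) {
            this.cells = cells;
            long max = Long.MIN_VALUE;
            for (long ts : cells.values()) {
                max = Math.max(max, ts);
            }
            this.maxTimestamp = max;
        }

        Long seek(String key) {
            realSeeks++;
            return cells.get(key);
        }
    }

    private static final class Entry {
        final long ts;
        final boolean fake;
        final FileSketch file;

        Entry(long ts, boolean fake, FileSketch file) {
            this.ts = ts;
            this.fake = fake;
            this.file = file;
        }
    }

    /** Returns the newest timestamp stored for key, or null if absent. */
    static Long newestTimestamp(List<FileSketch> files, String key) {
        PriorityQueue<Entry> heap = new PriorityQueue<>((a, b) -> {
            if (a.ts != b.ts) {
                return Long.compare(b.ts, a.ts);    // newer timestamps first
            }
            return Boolean.compare(b.fake, a.fake); // on ties, fake entries first
        });
        for (FileSketch f : files) {
            heap.add(new Entry(f.maxTimestamp, true, f)); // lazy: no seek yet
        }
        while (!heap.isEmpty()) {
            Entry top = heap.poll();
            if (!top.fake) {
                return top.ts;          // nothing newer can exist in other files
            }
            Long real = top.file.seek(key);         // the real seek happens here
            if (real != null) {
                heap.add(new Entry(real, false, top.file));
            }
        }
        return null;
    }
}
```

When the newest file contains the key, its real entry is returned while the 
older files still hold only fake entries, so they are never seeked.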





[jira] [Commented] (HBASE-4544) Rename RWCC to MVCC

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121467#comment-13121467
 ] 

Jonathan Gray commented on HBASE-4544:
--

Nice!

Do you want to get this in before/after HBASE-2856?

 Rename RWCC to MVCC
 ---

 Key: HBASE-4544
 URL: https://issues.apache.org/jira/browse/HBASE-4544
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4544-v1.txt


 ReadWriteConcurrencyControl should be called MultiVersionConcurrencyControl.





[jira] [Commented] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121482#comment-13121482
 ] 

Jonathan Gray commented on HBASE-4465:
--

Committed to trunk.  What's the status on the 89 branch?  Should we keep this 
open?

 Lazy-seek optimization for StoreFile scanners
 -

 Key: HBASE-4465
 URL: https://issues.apache.org/jira/browse/HBASE-4465
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
  Labels: optimization, seek
 Fix For: 0.89.20100924, 0.94.0

 Attachments: 
 HBASE-4465_Lazy-seek_optimization_for_St-20111005121052-b2ea8753.patch


 Previously, if we had several StoreFiles for a column family in a region, we 
 would seek in each of them and only then merge the results, even though the 
 row/column we are looking for might only be in the most recent (and the 
 smallest) file. Now we prioritize our reads from those files so that we check 
 the most recent file first. This is done by doing a lazy seek which 
 pretends that the next value in the StoreFile is (seekRow, seekColumn, 
 lastTimestampInStoreFile), which is earlier in the KV order than anything 
 that might actually occur in the file. So if we don't find the result in 
 earlier files, that fake KV will bubble up to the top of the KV heap and a 
 real seek will be done. This is expected to significantly reduce the amount 
 of disk IO (as of 09/22/2011 we are doing dark launch testing and 
 measurement).
 This is joint work with Liyin Tang -- huge thanks to him for many helpful 
 discussions on this and the idea of putting fake KVs with the highest 
 timestamp of the StoreFile in the scanner priority queue.





[jira] [Commented] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

2011-10-05 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121508#comment-13121508
 ] 

Jonathan Gray commented on HBASE-4465:
--

Nice work Liyin and Mikhail!

 Lazy-seek optimization for StoreFile scanners
 -

 Key: HBASE-4465
 URL: https://issues.apache.org/jira/browse/HBASE-4465
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
  Labels: optimization, seek
 Fix For: 0.89.20100924, 0.94.0

 Attachments: 
 HBASE-4465_Lazy-seek_optimization_for_St-20111005121052-b2ea8753.patch


 Previously, if we had several StoreFiles for a column family in a region, we 
 would seek in each of them and only then merge the results, even though the 
 row/column we are looking for might only be in the most recent (and the 
 smallest) file. Now we prioritize our reads from those files so that we check 
 the most recent file first. This is done by doing a lazy seek which 
 pretends that the next value in the StoreFile is (seekRow, seekColumn, 
 lastTimestampInStoreFile), which is earlier in the KV order than anything 
 that might actually occur in the file. So if we don't find the result in 
 earlier files, that fake KV will bubble up to the top of the KV heap and a 
 real seek will be done. This is expected to significantly reduce the amount 
 of disk IO (as of 09/22/2011 we are doing dark launch testing and 
 measurement).
 This is joint work with Liyin Tang -- huge thanks to him for many helpful 
 discussions on this and the idea of putting fake KVs with the highest 
 timestamp of the StoreFile in the scanner priority queue.





[jira] [Commented] (HBASE-4534) A new unit test for lazy seek and StoreScanner in general

2011-10-04 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120400#comment-13120400
 ] 

Jonathan Gray commented on HBASE-4534:
--

Are we thinking these read optimizations are going to only go into 0.94?  
(Seems reasonable to me, but I will be pulling them into our internal 92 branch)

 A new unit test for lazy seek and StoreScanner in general
 -

 Key: HBASE-4534
 URL: https://issues.apache.org/jira/browse/HBASE-4534
 Project: HBase
  Issue Type: Test
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin

 A randomized unit test for Gets/Scans (all-row, single-row, multi-row, 
 all-column, single-column, and multi-column). Also all combinations of Bloom 
 filters and compression (NONE vs GZIP) are tested. The unit test flushes 
 multiple StoreFiles with disjoint timestamp ranges and runs various types of 
 queries against them. Currently we are not testing overlapping timestamp 
 ranges.





[jira] [Commented] (HBASE-4422) Move block cache parameters and references into single CacheConf class

2011-10-04 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120637#comment-13120637
 ] 

Jonathan Gray commented on HBASE-4422:
--

Yeah, if this goes to trunk but not 92 then begins the fun of rebasing patches 
for each because it changes so many constructors in/around HFile.

 Move block cache parameters and references into single CacheConf class
 --

 Key: HBASE-4422
 URL: https://issues.apache.org/jira/browse/HBASE-4422
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.92.0

 Attachments: CacheConfig92-v8.patch


 From StoreFile down to HFile, we currently use a boolean argument for each of 
 the various block cache configuration parameters that exist.  The number of 
 parameters is going to continue to increase as we look at compressed cache, 
 delta encoding, and more specific L1/L2 configuration.  Every new config 
 currently requires changing many constructors because it introduces a new 
 boolean.
 We should move everything into a single class so that modifications are much 
 less disruptive.





[jira] [Commented] (HBASE-4527) Fix versioning such that every update is unique

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119747#comment-13119747
 ] 

Jonathan Gray commented on HBASE-4527:
--

Yeah, seems like we could utilize the memstoreTS once we solve those issues and 
then we don't care about the timestamp being the same.  But we'd then have to 
expose RWCC to the API because checkAndPut would need to specify it?  ugh

 Fix versioning such that every update is unique
 ---

 Key: HBASE-4527
 URL: https://issues.apache.org/jira/browse/HBASE-4527
 Project: HBase
  Issue Type: Wish
Reporter: stack

 I wanted to use checkAndPut but there is a case where the check will not fail 
 though the cell has been updated: if a cell is updated with exactly the value 
 it had before, we'll not know it's been changed.  hbase-4507 did a checkAndPut 
 where you could pass a timestamp as part of the check so we'd check the cell 
 value AND that the timestamp was the same.
 This would work in most regards but one: an update done in the same 
 millisecond.  This is generally impossible, but in a distributed system where 
 clocks drift and a region can be moved to a server whose clock is running 
 behind, it is within the realm of possibilities that it could happen.  So we 
 should deal with it.
 One thought is that the version is made for sure unique.  We could make the 
 timestamp wider still so probability of the edits arriving within the same 
 microsecond -- or whatever it is that a double gives you -- would require us 
 to run through a couple of billion universe expand/contract cycles, or we 
 could have a monotonically increasing sequence id per millisecond.
 There could be some overlap between this issue and the persisting of rwcc to 
 the filesystem (though not currently as rwcc is implemented).
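
The "monotonically increasing sequence id per millisecond" idea could look 
roughly like the Java sketch below. It is an illustration only: the class 
name, the (millis, sequence) pair representation, and the choice to hold the 
last timestamp when the clock steps backwards are all assumptions, not an 
agreed design.

```java
/** Hypothetical generator of unique (millis, sequence) versions. */
final class UniqueVersionSketch {
    private long lastMillis = Long.MIN_VALUE;
    private long seq = 0;

    /**
     * Returns {millis, sequence}. Successive results are strictly increasing
     * in lexicographic order, even if nowMillis repeats or goes backwards.
     */
    synchronized long[] next(long nowMillis) {
        if (nowMillis > lastMillis) {
            lastMillis = nowMillis;   // clock advanced: restart the sequence
            seq = 0;
        } else {
            seq++;                    // same or earlier millisecond: bump sequence
        }
        return new long[] { lastMillis, seq };
    }
}
```

A generator like this is only unique within one server. After a region move, 
the new server would still need to learn the last version handed out, which 
is where the overlap with persisting rwcc comes in.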





[jira] [Commented] (HBASE-4533) ops_mgt.xml - tweaks to backup section

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119750#comment-13119750
 ] 

Jonathan Gray commented on HBASE-4533:
--

Doug, do you think you could add [book] to the front of your commits or 
something?  I'm doing a lot of repository management and that'd be super 
helpful :)

 ops_mgt.xml - tweaks to backup section
 --

 Key: HBASE-4533
 URL: https://issues.apache.org/jira/browse/HBASE-4533
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: ops_mgt_HBASE_4533.xml.patch, 
 ops_mgt_HBASE_4533_v2.xml.patch


 Minor tweaks to backup section.





[jira] [Commented] (HBASE-4527) Fix versioning such that every update is unique

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119756#comment-13119756
 ] 

Jonathan Gray commented on HBASE-4527:
--

Agreed.

 Fix versioning such that every update is unique
 ---

 Key: HBASE-4527
 URL: https://issues.apache.org/jira/browse/HBASE-4527
 Project: HBase
  Issue Type: Wish
Reporter: stack

 I wanted to use checkAndPut but there is a case where the check will not fail 
 though the cell has been updated: if a cell is updated with exactly the value 
 it had before, we'll not know it's been changed.  hbase-4507 did a checkAndPut 
 where you could pass a timestamp as part of the check so we'd check the cell 
 value AND that the timestamp was the same.
 This would work in most regards but one: an update done in the same 
 millisecond.  This is generally impossible, but in a distributed system where 
 clocks drift and a region can be moved to a server whose clock is running 
 behind, it is within the realm of possibilities that it could happen.  So we 
 should deal with it.
 One thought is that the version is made for sure unique.  We could make the 
 timestamp wider still so probability of the edits arriving within the same 
 microsecond -- or whatever it is that a double gives you -- would require us 
 to run through a couple of billion universe expand/contract cycles, or we 
 could have a monotonically increasing sequence id per millisecond.
 There could be some overlap between this issue and the persisting of rwcc to 
 the filesystem (though not currently as rwcc is implemented).





[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119748#comment-13119748
 ] 

Jonathan Gray commented on HBASE-4532:
--

Whoo!  +1

 Avoid top row seek by dedicated bloom filter for delete family
 --

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family.





[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119754#comment-13119754
 ] 

Jonathan Gray commented on HBASE-4536:
--

We were also discussing today that there are cases (especially in 
multi-master setups) where you want to retain the delete markers for some 
period of time as well.

I would think this would require both a family-level setting (a new one or the 
existing one) and also a read-time option, correct?  As of now, deletes are 
never returned to the client.  You'd have to return them in this case otherwise 
the user would have no idea what is actually there?  I'm not sure it's fair to 
ask a user to understand how our delete tombstones work :)

 Allow CF to retain deleted rows
 ---

 Key: HBASE-4536
 URL: https://issues.apache.org/jira/browse/HBASE-4536
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


 Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
 of versions.
 However, if a client deletes a row, all versions older than the delete 
 tombstone will be removed at the next major compaction (and even at memstore 
 flush - see HBASE-4241).
 There should be a way to retain those versions to guard against software 
 error.
 I see two options here:
 1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED.
 2. Fold this into the parent change, i.e. keep minimum-number-of-versions 
 versions even past the delete marker.
 #1 would allow for more flexibility. #2 comes somewhat naturally with parent 
 (from a user viewpoint).
 Comments? Any other options?





[jira] [Commented] (HBASE-4534) A new unit test for lazy seek and StoreScanner in general

2011-10-03 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119869#comment-13119869
 ] 

Jonathan Gray commented on HBASE-4534:
--

LUCENE-3408 is just a wrapper around two counter implementations.  One is 
thread-safe and uses an AtomicLong, the other is not and uses a long.  It looks 
like they were just trying to improve performance when the counter was being 
used in a single thread.
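As a rough illustration of the pattern described above (one counter API, two backing implementations), here is a minimal Java sketch. The class and method names (SketchCounter, newCounter, SerialCounter, AtomicCounter) are hypothetical and are not taken from the LUCENE-3408 patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; a sketch of the pattern, not the Lucene source.
abstract class SketchCounter {
    /** Adds delta and returns the updated value. */
    abstract long addAndGet(long delta);

    /** Returns the current value. */
    abstract long get();

    /** Picks an implementation depending on whether the counter is shared across threads. */
    static SketchCounter newCounter(boolean threadSafe) {
        return threadSafe ? new AtomicCounter() : new SerialCounter();
    }

    // Single-threaded variant: a plain long, so no atomic-operation overhead.
    private static final class SerialCounter extends SketchCounter {
        private long count;
        @Override long addAndGet(long delta) { return count += delta; }
        @Override long get() { return count; }
    }

    // Thread-safe variant backed by an AtomicLong.
    private static final class AtomicCounter extends SketchCounter {
        private final AtomicLong count = new AtomicLong();
        @Override long addAndGet(long delta) { return count.addAndGet(delta); }
        @Override long get() { return count.get(); }
    }
}
```

The caller chooses once, at construction time, and the rest of the code stays the same either way.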

+1 that we should deal with changing AtomicLong to something else in another 
jira.

 A new unit test for lazy seek and StoreScanner in general
 -

 Key: HBASE-4534
 URL: https://issues.apache.org/jira/browse/HBASE-4534
 Project: HBase
  Issue Type: Test
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin

 A randomized unit test for Gets/Scans (all-row, single-row, multi-row, 
 all-column, single-column, and multi-column). Also all combinations of Bloom 
 filters and compression (NONE vs GZIP) are tested. The unit test flushes 
 multiple StoreFiles with disjoint timestamp ranges and runs various types of 
 queries against them. Currently we are not testing overlapping timestamp 
 ranges.





[jira] [Commented] (HBASE-4521) Get the hadoop patch-submission build working for hbase

2011-09-30 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118245#comment-13118245
 ] 

Jonathan Gray commented on HBASE-4521:
--

Big +1

 Get the hadoop patch-submission build working for hbase
 ---

 Key: HBASE-4521
 URL: https://issues.apache.org/jira/browse/HBASE-4521
 Project: HBase
  Issue Type: Task
Reporter: stack

 We need the facility over in hadoop where on 'patch submission', jenkins 
 tries the patch against current state of trunk.  We need this facility 
 because it's a productivity killer expecting each dev to vet the patch -- let 
 jenkins do it for us.  I'm trying to get Giri, the hadoop build fellow, to 
 help us set this up.





[jira] [Commented] (HBASE-4522) Make hbase-site-custom.xml override the hbase-site.xml

2011-09-30 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118275#comment-13118275
 ] 

Jonathan Gray commented on HBASE-4522:
--

Can't hbase-site import hbase-site-custom?

 Make hbase-site-custom.xml override the hbase-site.xml
 --

 Key: HBASE-4522
 URL: https://issues.apache.org/jira/browse/HBASE-4522
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Liyin Tang
Priority: Minor
 Fix For: 0.94.0


 The motivation for this diff is that we want to easily override some config 
 settings for a specific cluster by just adding the config entries to the 
 hbase-site-custom.xml for that cluster. This change adds the 
 hbase-site-custom.xml configuration file to HBaseConfiguration.





[jira] [Commented] (HBASE-4487) The increment operation can release the rowlock before sync-ing the Hlog

2011-09-30 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118311#comment-13118311
 ] 

Jonathan Gray commented on HBASE-4487:
--

+1 as well.  And agree with your assessment above, Stack.  Potential fatter 
grouping of increments and significant improvement of per-row throughput.

Looking forward to getting this working for Put/MultiPut!  Nice work, Dhruba.

 The increment operation can release the rowlock before sync-ing the Hlog
 

 Key: HBASE-4487
 URL: https://issues.apache.org/jira/browse/HBASE-4487
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: 4487-v7.txt, appendNoSync4.txt, appendNoSync5.txt, 
 appendNoSync6.txt


 This allows for better throughput when there are hot rows. I have seen this 
 change improve single-row updates from 400 increments/sec/server to 
 4000 increments/sec/server.





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117449#comment-13117449
 ] 

Jonathan Gray commented on HBASE-4477:
--

+1 on CPPutInfo, CPGetInfo, etc...

 Ability for an application to store metadata into the transaction log
 -

 Key: HBASE-4477
 URL: https://issues.apache.org/jira/browse/HBASE-4477
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: coprocessorPut1.txt, hlogMetadata1.txt


 MySQL allows an application to store an arbitrary blob along with each 
 transaction in its transaction logs. This JIRA is to request a similar 
 feature for HBase.
 The use case is as follows: An application in data center A stores a blob 
 of data along with each transaction. Replication software picks up these 
 blobs from the transaction logs in A and hands them to another instance of the 
 same application running in a remote data center B. The application in B is 
 responsible for applying them to the remote HBase cluster (and also for handling 
 conflict resolution, if any).





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117450#comment-13117450
 ] 

Jonathan Gray commented on HBASE-4477:
--

And yeah, maybe introduce CPPutInfo in this JIRA and open a follow-up to change 
the others

 Ability for an application to store metadata into the transaction log
 -

 Key: HBASE-4477
 URL: https://issues.apache.org/jira/browse/HBASE-4477
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: coprocessorPut1.txt, hlogMetadata1.txt


 MySQL allows an application to store an arbitrary blob along with each 
 transaction in its transaction logs. This JIRA is to request a similar 
 feature for HBase.
 The use case is as follows: An application in data center A stores a blob 
 of data along with each transaction. Replication software picks up these 
 blobs from the transaction logs in A and hands them to another instance of the 
 same application running in a remote data center B. The application in B is 
 responsible for applying them to the remote HBase cluster (and also for handling 
 conflict resolution, if any).





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117496#comment-13117496
 ] 

Jonathan Gray commented on HBASE-4477:
--

PutInfo seems overly generic but I agree that CPPutInfo is straight ugly.  And 
I keep thinking it says CPUInfo.

So Dhruba should just extend the API for now and we can introduce these new 
classes in a follow-up jira.

 Ability for an application to store metadata into the transaction log
 -

 Key: HBASE-4477
 URL: https://issues.apache.org/jira/browse/HBASE-4477
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: coprocessorPut1.txt, hlogMetadata1.txt


 MySQL allows an application to store an arbitrary blob along with each 
 transaction in its transaction logs. This JIRA is to request a similar 
 feature for HBase.
 The use case is as follows: An application in data center A stores a blob 
 of data along with each transaction. Replication software picks up these 
 blobs from the transaction logs in A and hands them to another instance of the 
 same application running in a remote data center B. The application in B is 
 responsible for applying them to the remote HBase cluster (and also for handling 
 conflict resolution, if any).





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117497#comment-13117497
 ] 

Jonathan Gray commented on HBASE-4477:
--

And it looks like the patch from this morning does exactly that.

I'm +1 on coprocessorPut1.txt.  Someone else want to review?

 Ability for an application to store metadata into the transaction log
 -

 Key: HBASE-4477
 URL: https://issues.apache.org/jira/browse/HBASE-4477
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: coprocessorPut1.txt, coprocessorPut2.txt, 
 hlogMetadata1.txt


 MySQL allows an application to store an arbitrary blob along with each 
 transaction in its transaction logs. This JIRA is to request a similar 
 feature for HBase.
 The use case is as follows: An application in data center A stores a blob 
 of data along with each transaction. Replication software picks up these 
 blobs from the transaction logs in A and hands them to another instance of the 
 same application running in a remote data center B. The application in B is 
 responsible for applying them to the remote HBase cluster (and also for handling 
 conflict resolution, if any).





[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-29 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117804#comment-13117804
 ] 

Jonathan Gray commented on HBASE-4496:
--

So this exact issue is actually why I was having a hard time getting 
TestCacheOnWrite to pass.  The test was previously relying on some 
broken/inconsistent behavior in which it passes a single instance of a reader 
with a null block cache, but that was removed with the latest CacheConfig stuff.

My latest patch for HBASE-4422 actually just changes the always true to always 
false :)  I'm going to talk to Mikhail tomorrow (Friday) about the issue here 
and see if he has any thoughts.

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4496.txt


 While testing the LRU cache during the scanning I noticed quite some churn in 
 the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
 found that HFile V2 always caches blocks in the LRU cache regardless of the 
 cacheBlocks setting.
 Here's a trace (from Eclipse) showing the problem:
 HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279   
 HFileReaderV2.readBlockData(long, long, int, boolean) line: 219   
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, 
 HFileBlock) line: 191  
 HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502   
 HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539  
 StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
 StoreFileScanner.reseek(KeyValue) line: 110   
 KeyValueHeap.reseek(KeyValue) line: 255   
 StoreScanner.reseek(KeyValue) line: 409   
 StoreScanner.next(List<KeyValue>, int) line: 304
 KeyValueHeap.next(List<KeyValue>, int) line: 114
 KeyValueHeap.next(List<KeyValue>) line: 143
 HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
 HRegion$RegionScannerImpl.nextInternal(int) line: 2722
 HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
 HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
 HRegionServer.next(long, int) line: 2092  
 Every scanner.next causes a reseek, which eventually causes a call to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
 cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
 HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
 The fix is not immediately clear, unless we want to pass cacheBlocks to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
 HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly 
 as readBlockData should not care about caching.
 Avoiding caching during scans is somewhat important for us.
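The way the flag gets lost in the call chain described above can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not the HFileReaderV2 code: the class CacheFlagSketch and its methods are hypothetical stand-ins, and the "cache" is just a list of block offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the reader call chain; names are illustrative only.
class CacheFlagSketch {
    // Offsets of blocks that ended up in the (simulated) block cache.
    final List<Long> cache = new ArrayList<>();

    // Lowest layer: honors whatever flag it is handed.
    byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] block = new byte[0]; // placeholder for the real read
        if (cacheBlock) {
            cache.add(offset);
        }
        return block;
    }

    // Intermediate layer without a caching parameter: it can only hardcode true.
    byte[] readBlockDataLossy(long offset) {
        return readBlock(offset, true);
    }

    // One possible fix (the one the description calls ugly): thread the flag through.
    byte[] readBlockData(long offset, boolean cacheBlock) {
        return readBlock(offset, cacheBlock);
    }

    // The scan-level caller knows the setting, but the lossy path drops it.
    byte[] seekLossy(long offset, boolean cacheBlocks) {
        return readBlockDataLossy(offset);
    }

    // Same caller once the flag is passed all the way down.
    byte[] seekFixed(long offset, boolean cacheBlocks) {
        return readBlockData(offset, cacheBlocks);
    }
}
```

With seekLossy the block is cached even when the caller passes false; with seekFixed it is cached only on request.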





[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116525#comment-13116525
 ] 

Jonathan Gray commented on HBASE-4497:
--

I don't think we can use the same ID as the ZK node.  But we could just use 
some incrementing number.

An alternative would be to instead allow the roll-back of the META edit using a 
checkAndDelete which might be simpler but less optimal.
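To make the checkAndDelete idea concrete, here is a small Java sketch of a conditional delete. It models META as an in-memory map, which is purely an assumption for illustration; MetaRollbackSketch and its methods are hypothetical and do not correspond to the HBase client API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model: region name -> server recorded as hosting it.
class MetaRollbackSketch {
    private final ConcurrentMap<String, String> meta = new ConcurrentHashMap<>();

    /** Records (or overwrites) the server for a region. */
    void updateLocation(String region, String server) {
        meta.put(region, server);
    }

    /** Returns the recorded server, or null if there is no entry. */
    String locationOf(String region) {
        return meta.get(region);
    }

    /**
     * Rolls back an edit: removes the entry only if it still names the given
     * server. Returns true if the entry was removed.
     */
    boolean checkAndDelete(String region, String server) {
        return meta.remove(region, server);
    }
}
```

In this model a server whose open failed would call checkAndDelete with its own name, so an entry written by another server in the meantime is left alone.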

 If region opening fails after updating META HBCK reports it as inconsistent 
 and scanning the region throws NSRE
 ---

 Key: HBASE-4497
 URL: https://issues.apache.org/jira/browse/HBASE-4497
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

 This JIRA is created as per the discussion in the mail chain "HBCK reporting 
 of possible mismatch in RS assignment".
 Consider two RSs, RS1 and RS2.
 A region tries to open in RS1, but it takes a while.  RS1 has still not 
 updated META and transitioned the node from OPENING to OPENED.
 So the timeout assigns the region to RS2.  RS2 successfully updates META and 
 opens the region.
 Now RS1 tries to act on the region by first updating META and then 
 transitioning the node from OPENING to OPENED.
 RS1 transitioning the node from OPENING to OPENED will fail, but the META entry 
 will have RS1 as the latest.
 Now HBCK reports this as an inconsistency, and if we try to scan the region we 
 get NotServingRegionException.





[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116552#comment-13116552
 ] 

Jonathan Gray commented on HBASE-4488:
--

Can you explain what you mean Lars?  Something is wrong with HBASE-4433 or 
there's nothing to worry about once I commit this :)

 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4488.txt


 While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
 critical mistake:
 The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
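A minimal Java sketch of how a loop of this shape can drop the final batch. It assumes a scanner whose next(...) fills the list and returns true only if more rows remain afterwards; that contract, along with FlushLoopSketch and RowScanner, is an assumption made for illustration and not the actual Store code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner and drain loops; names are illustrative only.
class FlushLoopSketch {
    /** Fills 'out' with the next row, if any; returns true if more rows remain. */
    interface RowScanner {
        boolean next(List<String> out);
    }

    /** Builds a scanner over an in-memory list of rows. */
    static RowScanner scannerOver(List<String> rows) {
        return new RowScanner() {
            private int pos = 0;
            @Override public boolean next(List<String> out) {
                if (pos < rows.size()) {
                    out.add(rows.get(pos++));
                }
                return pos < rows.size();
            }
        };
    }

    // Buggy shape: the last call returns false but has still filled 'batch',
    // so the loop body never sees the final row.
    static List<String> drainBuggy(RowScanner scanner) {
        List<String> flushed = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        while (scanner.next(batch)) {
            flushed.addAll(batch);
            batch.clear();
        }
        return flushed;
    }

    // Fixed shape: process whatever was returned, then test the flag.
    static List<String> drainFixed(RowScanner scanner) {
        List<String> flushed = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        boolean more;
        do {
            more = scanner.next(batch);
            flushed.addAll(batch);
            batch.clear();
        } while (more);
        return flushed;
    }
}
```

The do-while form handles the results of every call, including the one that reports no further rows.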





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116585#comment-13116585
 ] 

Jonathan Gray commented on HBASE-4492:
--

+1 for commit

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: ramkrishna.s.vasudevan
 Attachments: 4492-v2.txt, 4492.txt, HBASE-4492.patch


 I got the following when running test suite on TRUNK:
 {code}
 testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
 Time elapsed: 300.28 sec   ERROR!
 java.lang.Exception: test timed out after 30 milliseconds
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
 {code}
 I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
 wiped out test output file for the failed test.
 Similar failure can be found on Jenkins:
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/




