[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting
[ https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211060#comment-13211060 ]

stack commented on HBASE-5423:
--

@Chunhui Sounds good. Want me to change the name on commit, or do you want to put up a new patch? I'd add a log that we were exiting even though there are still online regions. Good stuff.

Regionserver may block forever on waitOnAllRegionsToClose when aborting
---

Key: HBASE-5423 URL: https://issues.apache.org/jira/browse/HBASE-5423 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-5423.patch

If closeRegion throws any exception (it could be caused by the FS) when the RS is aborting, the RS will block forever on waitOnAllRegionsToClose().

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
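The fix under discussion can be sketched in plain Java. This is a hedged illustration, not the actual patch: the class and method names below are hypothetical stand-ins for the regionserver's bookkeeping, showing only the key idea that a failed close during abort must still remove the region from the online set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: if closeRegion throws while the RS is aborting,
// the region must still leave the online map, otherwise
// waitOnAllRegionsToClose() spins forever waiting for it.
public class CloseOnAbortSketch {
  private final Map<String, Object> onlineRegions = new ConcurrentHashMap<>();

  public void markOnline(String encodedName) {
    onlineRegions.put(encodedName, new Object());
  }

  public void closeRegionOnAbort(String encodedName) {
    try {
      closeRegion(encodedName);
    } catch (RuntimeException e) {
      // Log and fall through: on abort, a failed close must not leave
      // the region marked online forever.
    } finally {
      onlineRegions.remove(encodedName);
    }
  }

  protected void closeRegion(String encodedName) {
    // Real close elided; simulate the FS failure described in the issue.
    throw new RuntimeException("simulated FS failure during close");
  }

  public boolean allRegionsClosed() {
    return onlineRegions.isEmpty();
  }
}
```

With the finally block, the abort path converges even when every close throws; without it, allRegionsClosed() would never become true.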
[jira] [Commented] (HBASE-5431) Improve delete marker handling in Import M/R jobs
[ https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211061#comment-13211061 ]

stack commented on HBASE-5431:
--

So, we output Deletes and then we output Deletes? We'll be changing the order of kvs that came in in the Result? That's OK?

Improve delete marker handling in Import M/R jobs
-

Key: HBASE-5431 URL: https://issues.apache.org/jira/browse/HBASE-5431 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5431.txt

Import currently creates a new Delete object for each delete KV found in a result object. This can be improved with the new Delete API that allows adding a delete KV to an existing Delete object.
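The per-row batching the issue describes can be sketched with plain collections standing in for HBase's KeyValue and Delete types; all names here are illustrative, not the real client API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the improvement: rather than one Delete object per delete
// marker, accumulate all the delete markers of a row into one per-row
// container (the new Delete API lets a KV be added to an existing Delete).
public class DeleteBatchSketch {
  // Each marker is a {row, qualifier} pair; the result maps each row to
  // all of its markers, so only rows.size() "Delete" objects are needed.
  public static Map<String, List<String>> groupByRow(List<String[]> deleteMarkers) {
    Map<String, List<String>> perRow = new LinkedHashMap<>();
    for (String[] kv : deleteMarkers) {
      perRow.computeIfAbsent(kv[0], r -> new ArrayList<>()).add(kv[1]);
    }
    return perRow;
  }
}
```

As the later comment on this issue notes, grouping per row changes the relative ordering of KVs within the Result; preserving strict order would require one mutation per KV.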
[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211067#comment-13211067 ]

stack commented on HBASE-5209:
--

My fault. I was reading HBASE-5209-v1.diff. Pardon me. Thanks for testing. Old client against new cluster is what needs to work (new client against old server is YMMV). Patch looks good to me. Mind attaching it here so we can run it through hadoopqa to make sure it doesn't have side effects? Then I'll commit. Thanks David.

HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

Key: HBASE-5209 URL: https://issues.apache.org/jira/browse/HBASE-5209 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.94.0, 0.90.5, 0.92.0 Reporter: Aditya Acharya Assignee: David S. Wang Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff

I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect.
[jira] [Commented] (HBASE-5255) Use singletons for OperationStatus to save memory
[ https://issues.apache.org/jira/browse/HBASE-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211074#comment-13211074 ]

stack commented on HBASE-5255:
--

Thanks Benoit. Fixed it w/ HBASE-5432.

Use singletons for OperationStatus to save memory
-

Key: HBASE-5255 URL: https://issues.apache.org/jira/browse/HBASE-5255 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.90.5, 0.92.0 Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Minor Labels: performance Fix For: 0.94.0, 0.92.1 Attachments: 5255-92.txt, 5255-v2.txt, HBASE-5255-0.92-Use-singletons-to-remove-unnecessary-memory-allocati.patch, HBASE-5255-trunk-Use-singletons-to-remove-unnecessary-memory-allocati.patch

Every single {{Put}} causes the allocation of at least one {{OperationStatus}}, yet {{OperationStatus}} is almost always stateless, so these allocations are unnecessary and could be avoided. Attached patch adds a few singletons and uses them, with no public API change. I didn't test the patches, but you get the idea.
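The singleton pattern the patch applies can be shown in a few lines. This is a minimal sketch of the idea, not the exact HBase OperationStatus class: a stateless status value is immutable, so one shared instance per code can serve every Put instead of a fresh allocation each time.

```java
// Minimal sketch of the HBASE-5255 idea: stateless status objects are
// shared singletons rather than allocated once per operation.
public final class OpStatusSketch {
  public enum Code { SUCCESS, FAILURE }

  // Shared immutable instances; callers reuse these instead of
  // constructing a new status for every Put.
  public static final OpStatusSketch SUCCESS = new OpStatusSketch(Code.SUCCESS, "");
  public static final OpStatusSketch FAILURE = new OpStatusSketch(Code.FAILURE, "");

  private final Code code;
  private final String message;

  private OpStatusSketch(Code code, String message) {
    this.code = code;
    this.message = message;
  }

  public Code getCode() { return code; }

  public String getMessage() { return message; }
}
```

Statuses that do carry state (e.g. a per-row error message) would still be allocated individually; the singletons cover only the common stateless cases.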
[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1323#comment-1323 ]

stack commented on HBASE-5422:
--

@Chunhui Want to make a new patch that does this -- 'I agree with make an addPlan method that takes a Map of plans.'?

StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
--

Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, hbase-5422.patch

In our production environment we see a lot of RIT timeouts when the cluster comes up; there are about 70,000 regions in the cluster (25 regionservers). First, we see the following log (see the region 33cf229845b1009aa8a3f7b0f85c9bd0):

master's log
2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state
2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907
2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0
2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node
2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED
2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED
2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
state=PENDING_OPEN, ts=1329132528086
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.

Regionserver's log
2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.

Through the RS's log, we can see it took more than 3 minutes from receiving the openRegion request to starting to process it, causing
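The "addPlan method that takes a Map of plans" suggested in the comment above can be sketched as follows. All names here are illustrative stand-ins for the AssignmentManager internals, not the real HBase API; the point is that a bulk overload takes the lock once per batch instead of once per region.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: installing ~70,000 region plans one synchronized call at
// a time is slow; a Map-taking overload installs the whole batch under a
// single lock acquisition.
public class PlanBatchSketch {
  // encoded region name -> destination server (simplified plan)
  private final Map<String, String> regionPlans = new HashMap<>();

  // One lock acquisition per region plan.
  public synchronized void addPlan(String encodedName, String destServer) {
    regionPlans.put(encodedName, destServer);
  }

  // One lock acquisition for the entire batch of plans.
  public synchronized void addPlans(Map<String, String> plans) {
    regionPlans.putAll(plans);
  }

  public synchronized int planCount() {
    return regionPlans.size();
  }
}
```

A bulk assigner would build the full Map first and make a single addPlans() call, rather than looping over addPlan().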
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1328#comment-1328 ]

stack commented on HBASE-5317:
--

Gregory it failed with this:

{code}
[INFO] --- maven-compiler-plugin:2.0.2:testCompile (default-testCompile) @ hbase ---
[INFO] Compiling 331 source files to /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-classes
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 23.287s
[INFO] Finished at: Sat Feb 18 01:46:56 UTC 2012
[INFO] Final Memory: 41M/424M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java:[79,33] cannot find symbol
[ERROR] symbol : method getDefaultRootDirPath()
[ERROR] location: class org.apache.hadoop.hbase.HBaseTestingUtility
[ERROR] - [Help 1]
{code}

What do you reckon that's about? Is it the 0.23 profile leaking?
Fix TestHFileOutputFormat to work against hadoop 0.23
-

Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.92.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch

Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92:

Failed tests:
testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found

Tests in error:
test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)
testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

It looks like on trunk, this also results in an error:
testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet.
[jira] [Commented] (HBASE-5431) Improve delete marker handling in Import M/R jobs
[ https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211202#comment-13211202 ]

stack commented on HBASE-5431:
--

bq. In fact the only correct ordering would be to create a Put or Delete for each KV.

Yeah. I was wondering about this. OK. +1. For Amit's patch, if we switched on his facility, then we'd export with memstorets? Though I suppose that'd be no good at import time?

Improve delete marker handling in Import M/R jobs
-

Key: HBASE-5431 URL: https://issues.apache.org/jira/browse/HBASE-5431 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5431.txt

Import currently creates a new Delete object for each delete KV found in a result object. This can be improved with the new Delete API that allows adding a delete KV to an existing Delete object.
[jira] [Commented] (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211204#comment-13211204 ]

stack commented on HBASE-3149:
--

Thanks @Nicolas (and thanks @Mubarak -- sounds like something to indeed get into 0.92). At the same time, I'd think this issue is still worth some time; if there are lots of CFs and only one is filling, it's silly to flush the others as we do now just because one is over the threshold.

Make flush decisions per column family
--

Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Nicolas Spiegelberg

Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212107#comment-13212107 ]

stack commented on HBASE-5396:
--

@Jieshan That's interesting. Thanks for checking it out. Why do we not have the problem in 0.92/trunk? Is the code different?

Handle the regions in regionPlans while processing ServerShutdownHandler

Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch

Regions planned to open on this server while ServerShutdownHandler is handling it are simply removed from AM.regionPlans and left for the TimeoutMonitor to handle. This needs optimizing.
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212109#comment-13212109 ]

stack commented on HBASE-5416:
--

@Max What do you think about the failed TestFilter in the above? Is it your patch? Thanks.

Improve performance of scans with some kind of filters.
---

Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch

When a scan is performed, the whole row is loaded into the result list, and afterwards the filter (if any) is applied to decide whether the row is needed. But when a scan covers several CFs and the filter checks only a subset of them, data from the unchecked CFs is not needed at the filter stage; it is needed only once we have decided to include the current row. In such cases we can significantly reduce the amount of IO performed by a scan by loading only the values the filter actually checks. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (tens of GB) and quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to only a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that allows a filter to specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times.
Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
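The two-phase scan the patch describes can be sketched in miniature. This is a hedged illustration with plain maps standing in for per-family stores; the real patch wires the split into HRegion's scanner machinery, and the names below are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "essential family first" scan: phase 1 reads only the
// small family the filter needs; phase 2 reads the expensive family
// solely for rows the filter has already accepted.
public class FilteredScanSketch {
  public final Map<String, String> flagsStore = new HashMap<>(); // row -> flag (small CF)
  public final Map<String, String> snapStore = new HashMap<>();  // row -> big value (large CF)
  public int expensiveReads = 0;                                 // counts snap loads

  public List<String> scan(List<String> rowKeys, String requiredFlag) {
    List<String> out = new ArrayList<>();
    for (String row : rowKeys) {
      // Phase 1: consult only the essential family (cheap).
      if (!requiredFlag.equals(flagsStore.get(row))) continue;
      // Phase 2: load the joined family only for accepted rows (expensive).
      expensiveReads++;
      out.add(snapStore.get(row));
    }
    return out;
  }
}
```

When the filter rejects most rows, the expensive store is touched only for the few survivors, which is where the reported 30-50x speedup comes from.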
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212127#comment-13212127 ]

stack commented on HBASE-5424:
--

@Zhiyuan Your patch failed to apply to trunk. See the console output:

{code}
patching file src/main/java/org/apache/hadoop/hbase/client/HTable.java
Hunk #1 FAILED at 423.
patch unexpectedly ends in middle of line
Hunk #2 succeeded at 428 with fuzz 2 (offset -14 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/client/HTable.java.rej
PATCH APPLICATION FAILED
{code}

Mind fixing? Your patch seems to have some odd formatting too. Thanks.

HTable meet NPE when call getRegionInfo()
-

Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: HBASE-5424.patch, HBase-5424_1.patch Original Estimate: 48h Remaining Estimate: 48h

We hit an NPE when calling getRegionInfo() in our testing environment:

Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)

This NPE also prevents table.jsp from showing the region information for this table.
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212142#comment-13212142 ]

stack commented on HBASE-4365:
--

bq. Wouldn't we potentially do a lot of splitting when there are many regionservers?

Each regionserver would split with the same growing reluctance. Don't we want a bunch of splitting when there are lots of regionservers, so they all promptly get some amount of the incoming load? This issue is about getting us to split fast at the start of a bulk load but then having the splitting fall off as more data makes it in. I'm thinking our default regionsize should be 10G. I should add this to this patch. I don't get what you are saying at the end, Lars. Is it good or bad that there are 5 regions on a regionserver before we get to the max size? The balancer will cut in and move regions to other servers, and they'll then split eagerly at first with rising reluctance.

Add a decent heuristic for region size
--

Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0, 0.92.1 Reporter: Todd Lipcon Priority: Critical Labels: usability Attachments: 4365.txt

A few of us were brainstorming this morning about what the default region size should be. There were a few general points made:
- in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently
- with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+)
- for small tables you may want a small region size just so you can distribute load better across a cluster
- for big tables, multi-GB is probably best
[jira] [Commented] (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212166#comment-13212166 ]

stack commented on HBASE-3149:
--

@Nicolas I wonder about this... hbase.hstore.compaction.min.size. When we compact, don't we have to take adjacent files as part of our ACID guarantees? Would this frustrate that? (I'll take a look... tomorrow.) I'm wondering because I want to figure out how to make it so we favor reference files... so they are always included in a compaction.

Make flush decisions per column family
--

Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Nicolas Spiegelberg Fix For: 0.92.1

Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212370#comment-13212370 ]

stack commented on HBASE-5396:
--

If you found it in 0.92, that's good enough -- it's in TRUNK I'd say. Do you have more of the regionserver log? Why does it say it's aborting? You don't have it in your log above (the logs above look 'normal'... we need the bits that show it going awry... thanks Jieshan).

Handle the regions in regionPlans while processing ServerShutdownHandler

Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch

Regions planned to open on this server while ServerShutdownHandler is handling it are simply removed from AM.regionPlans and left for the TimeoutMonitor to handle. This needs optimizing.
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212375#comment-13212375 ]

stack commented on HBASE-5416:
--

bq. I have a question about this. Manual == hbase book? And what 'filters package doc' is? Is it comments in source processed by javadoc, or something else? Sorry for these questions - I have no Java experience.

No problem. Yes, the 'reference guide' or manual is this: http://hbase.apache.org/book.html It's a bit tough making a patch for it if you don't know DocBook too well, so you could just put a paragraph here and I'll get the doc in for you. Or, the filters package doc I was referring to is here: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description... but the doc there is pretty pathetic and describing this facility there might not go so well (it's of a subtlety the current doc does not allow). Just stick a bit of a paragraph here and I'll figure out where to put it. Go easy Max.

You saw the failed test above? The fail in TestFilter? Do you see that when you run your tests locally? On trunk you do it so:

{code}
% mvn test -P localTests -Dtest=TestFilter
{code}

Improve performance of scans with some kind of filters.
---

Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch

When a scan is performed, the whole row is loaded into the result list, and afterwards the filter (if any) is applied to decide whether the row is needed. But when a scan covers several CFs and the filter checks only a subset of them, data from the unchecked CFs is not needed at the filter stage; it is needed only once we have decided to include the current row.
In such cases we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (tens of GB) and quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to only a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that allows a filter to specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213335#comment-13213335 ]

stack commented on HBASE-5396:
--

The log does not show the regionserver aborting. Should it? Or am I misunderstanding? (I guess I'm not clear on what I should be looking for in this log. Please help me Jieshan. Sorry for being a bit slow.)

Handle the regions in regionPlans while processing ServerShutdownHandler

Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch, Logs-TestFor92.rar

Regions planned to open on this server while ServerShutdownHandler is handling it are simply removed from AM.regionPlans and left for the TimeoutMonitor to handle. This needs optimizing.
[jira] [Commented] (HBASE-5439) Fix some performance findbugs issues
[ https://issues.apache.org/jira/browse/HBASE-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213339#comment-13213339 ]

stack commented on HBASE-5439:
--

+1

Fix some performance findbugs issues

Key: HBASE-5439 URL: https://issues.apache.org/jira/browse/HBASE-5439 Project: HBase Issue Type: Improvement Components: performance Affects Versions: 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Attachments: HBASE-5439.patch

Given 0.94 is the performance release, I took a look at some performance findbugs. This patch should fix all of the following types of findbugs (except one case in generated code):

Bug type DM_NUMBER_CTOR
Bug type DM_STRING_CTOR
Bug type DM_BOOLEAN_CTOR
(these are simple constructor issues where Type.valueOf is more efficient)

Fixes one of:
Bug type SIC_INNER_SHOULD_BE_STATIC (Inner class should be static)
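For context, the DM_*_CTOR findbugs patterns flag boxed-type constructors. A minimal example of the preferred form:

```java
// The DM_NUMBER_CTOR pattern flags "new Integer(x)", which always
// allocates; Integer.valueOf(x) may return a cached immutable instance
// instead (the language spec guarantees caching for values in -128..127),
// so it is at least as cheap and often free. The same applies to
// String and Boolean via String.valueOf / Boolean.valueOf.
public class ValueOfSketch {
  public static Integer box(int x) {
    return Integer.valueOf(x);
  }
}
```

Because the boxed types are immutable, swapping the constructor for valueOf is behavior-preserving except for reference identity, which correct code should not depend on anyway.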
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213374#comment-13213374 ] stack commented on HBASE-5396: -- So what am I looking to see in the log snippet above (and in the attached log?) Thanks Jieshan. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch, Logs-TestFor92.rar Regions planned to open on this server while ServerShutdownHandler is running are simply removed from AM.regionPlans, leaving only the TimeoutMonitor to handle them. This needs to be optimized. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213389#comment-13213389 ] stack commented on HBASE-4365: -- If I understand correctly a regionserver would still split at a size < 10gb until there are about 900 regions for the table (assuming somewhat even distribution). Well, each split would take longer because the threshold will have grown closer to the 10GB, but yeah. And I think this is what we want. Doing to the power of 3 would make us rise to the 10GB faster. This is probably ok. More regions means that we'll fan out regions over the cluster a little faster. We'll have 9 regions for a table on each server which is probably too many still. We could do to the power of 3 so we'd split on first flush, then at 1G, 3.4G, 8.2G and then we'd be at our 10G limit. Add a decent heuristic for region size -- Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0, 0.92.1 Reporter: Todd Lipcon Priority: Critical Labels: usability Attachments: 4365.txt A few of us were brainstorming this morning about what the default region size should be. There were a few general points made:
- in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently
- with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+)
- for small tables you may want a small region size just so you can distribute load better across a cluster
- for big tables, multi-GB is probably best
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
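stack's 1G/3.4G/8.2G progression can be checked numerically: with a hypothetical 128MB memstore flush size and a 10GB ceiling, a split threshold of flushSize × regionCount³ (capped at the ceiling) gives roughly those steps. This is a sketch of the heuristic under discussion, not the committed policy:

```java
public class SplitSizeDemo {
    static final long MB = 1024L * 1024L;
    static final long FLUSH_SIZE = 128 * MB;            // assumed memstore flush size
    static final long MAX_FILE_SIZE = 10L * 1024 * MB;  // 10GB ceiling

    // Threshold at which a region splits, given how many regions of the
    // table this server already hosts (the "to the power of 3" idea).
    static long splitThreshold(int regionCount) {
        long cubed = FLUSH_SIZE * regionCount * regionCount * regionCount;
        return Math.min(MAX_FILE_SIZE, cubed);
    }

    public static void main(String[] args) {
        for (int count = 1; count <= 5; count++) {
            System.out.printf("regions=%d -> split at %.1f GB%n",
                count, splitThreshold(count) / (1024.0 * MB));
        }
        // counts 1..5 give 0.1, 1.0, 3.4, 8.0, 10.0 GB -- close to the
        // first-flush, 1G, 3.4G, ~8G, 10G-cap sequence in the comment.
    }
}
```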
[jira] [Commented] (HBASE-5433) [REST] Add metrics to keep track of success/failure count
[ https://issues.apache.org/jira/browse/HBASE-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213391#comment-13213391 ] stack commented on HBASE-5433: -- +1 on patch. Mubarak, can you see the metrics coming out in your metrics system? They show ok? Will commit tomorrow unless objection (Andy? Want to check it out?) [REST] Add metrics to keep track of success/failure count - Key: HBASE-5433 URL: https://issues.apache.org/jira/browse/HBASE-5433 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Labels: noob Fix For: 0.94.0 Attachments: HBASE-5433.trunk.v1.patch In a production environment, the visibility of successful REST request(s) is not getting exposed to the metrics system as we have only one metric (requests) today. Proposing to add more metrics such as successful_get_count, failed_get_count, successful_put_count, failed_put_count. The current implementation increases the request count at the beginning of the method and it is very hard to monitor requests (unless you turn on debug, find the row_key and validate it in get/scan using the hbase shell). It would be very useful for ops to keep an eye on this, as requests from cross data-centers write data to one cluster using the REST gateway through a load balancer (and there is no visibility of which REST-server/RS failed to write data):
{code}
Response update(final CellSetModel model, final boolean replace) {
  // for requests
  servlet.getMetrics().incrementRequests(1);
  ..
  ..
  table.put(puts);
  table.flushCommits();
  ResponseBuilder response = Response.ok();
  // for successful_get_count
  servlet.getMetrics().incrementSuccessfulGetRequests(1);
  return response.build();
} catch (IOException e) {
  // for failed_get_count
  servlet.getMetrics().incrementFailedGetRequests(1);
  throw new WebApplicationException(e, Response.Status.SERVICE_UNAVAILABLE);
} finally {
}
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
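The success/failure counters being proposed can be sketched with plain AtomicLongs. The field and method names here are hypothetical stand-ins for the servlet metrics class the patch actually wires in:

```java
import java.util.concurrent.atomic.AtomicLong;

public class RestMetricsSketch {
    final AtomicLong requests = new AtomicLong();
    final AtomicLong successfulPuts = new AtomicLong();
    final AtomicLong failedPuts = new AtomicLong();

    // Wraps a write the way the proposed update() does: count the request
    // up front, then count success or failure once the outcome is known.
    void update(Runnable put) {
        requests.incrementAndGet();
        try {
            put.run();
            successfulPuts.incrementAndGet();
        } catch (RuntimeException e) {
            failedPuts.incrementAndGet();
            throw e; // mirrors rethrowing as WebApplicationException
        }
    }
}
```

Exposing the split counters rather than one aggregate lets ops distinguish "gateway busy" from "gateway failing", which is the point of the issue.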
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213399#comment-13213399 ] stack commented on HBASE-4991: -- @Mubarak Do we need to add this method to the region server interface? {code} + public int getRegionsCount(byte[] regionName) throws IOException; {code} Can we not just count what comes back from the get on online regions? Do we have to run the region delete in the Master process? Can the client not do it? Is it really necessary adding + public MasterDeleteRegionTracker getDeleteRegionTracker(); to the MasterServices? This will have a ripple effect through Tests and it seems like a bit of an exotic API to have in this basic Interface. I like the refactor in HRegion. Does all of this new code need to be in HRegionServer? Can it live in a class of its own? There must be a million holes here (HRS crashes in middle of file moving or creation of the merged region, files partially moved or deleted). Does this code all need to be in core? Can we not make a few primitives and then run it all from outside in a tool or script w/ state recorded as we go so can resume if fail mid-way? There are a bunch of moving pieces here. Its all bundled up in core code so its going to be tough to test. Adding this to onlineregions, + public void deleteRegion(String regionName) throws IOException, KeeperException;, do all removals from online regions now use this new API (Its probably good having it here... but just wondering about the places where regions currently get removed from online map, do they go a different route than this new one?) H... looks like a bunch of state is being tracked in zk. Thats good. Its all custom to this feature. How hard will it be to reuse parts to do say an online merge of a bunch of adjacent regions? Yeah, there is a lot of moving parts... a master delete tracker and a regionserver delete tracker... 
I've not done an extensive review of design but that seems pretty heavy going. Are the enums duplicated? Why does zookeeper package have classes particular to master and regionserver? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214114#comment-13214114 ] stack commented on HBASE-5434: -- @Mubarak What is your wiki id and I'll add you as a contributor [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy REST gateway based production environment, ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server), we can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount* but some home-grown tools needs to parse the output of /status/cluster and updates the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214119#comment-13214119 ] stack commented on HBASE-5456: -- I don't think that this is the kind of thing you can do by fiat. My take is that we'll use PowerMock when it makes sense (apart from the fact that PowerMock isn't exactly a walk-in-the-park). My current take on testing in hbase is that so much of our code base is test inscrutable and that anything we can do to shine light on these untested savannas of code is good by me, even unto adding public methods that allow injection of alternate classes. Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Zhihong Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5457) add inline index in data block for data which are not clustered together
[ https://issues.apache.org/jira/browse/HBASE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214126#comment-13214126 ] stack commented on HBASE-5457: -- bq. So if we can add inline block index on required columns, the second column family then is not needed. What would this look like He? add inline index in data block for data which are not clustered together Key: HBASE-5457 URL: https://issues.apache.org/jira/browse/HBASE-5457 Project: HBase Issue Type: New Feature Reporter: He Yongqiang As we are go through our data schema, and we found we have one large column family which is just duplicating data from another column family and is just a re-org of the data to cluster data in a different way than the original column family in order to serve another type of queries efficiently. If we compare this second column family with similar situation in mysql, it is like an index in mysql. So if we can add inline block index on required columns, the second column family then is not needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation
[ https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214132#comment-13214132 ] stack commented on HBASE-3484: -- bq. It probably has negative memory effects in its current incarnation. How you think Todd? Because of the tiering cost more or is it something to do w/ mslab allocations? What would you like to see test-wise proving this direction better than what we currently have? I could work up some tests? Replace memstore's ConcurrentSkipListMap with our own implementation Key: HBASE-3484 URL: https://issues.apache.org/jira/browse/HBASE-3484 Project: HBase Issue Type: Improvement Components: performance Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Attachments: hierarchical-map.txt By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements to it for our use case in MemStore:
- add an iterator.replace() method which should allow us to do upsert much more cheaply
- implement a Set directly without having to do Map<KeyValue,KeyValue> to save one reference per entry
It turns out CSLM is in public domain from its development as part of JSR 166, so we should be OK with licenses. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
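For context on the upsert cost: with the stock ConcurrentSkipListMap, replacing an entry whose key object must change (as MemStore's KeyValue upsert needs, since the KeyValue is both key and value) is a remove() plus a put(), i.e. two skip-list traversals, which the proposed iterator.replace() would collapse into one. A JDK-only illustration, with a String key standing in for a KeyValue:

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class UpsertDemo {
    public static void main(String[] args) {
        ConcurrentSkipListMap<String, Long> memstore = new ConcurrentSkipListMap<>();
        memstore.put("row1/cf:col", 1L);

        // "Upsert" with the stock class: two O(log n) traversals.
        // iterator.replace() would swap the entry in place during one walk.
        memstore.remove("row1/cf:col");
        memstore.put("row1/cf:col", 2L);

        System.out.println(memstore.get("row1/cf:col")); // 2
    }
}
```

The second improvement in the description (a direct Set) avoids the extra value reference that ConcurrentSkipListSet pays by wrapping a Map<E,Boolean>-style backing map.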
[jira] [Commented] (HBASE-5419) FileAlreadyExistsException has moved from mapred to fs package
[ https://issues.apache.org/jira/browse/HBASE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214138#comment-13214138 ] stack commented on HBASE-5419: -- Ok if this is for 0.94 only? I think its fine to drop 'support' for hadoop 0.20/branch-0.20-append in 0.94. FileAlreadyExistsException has moved from mapred to fs package -- Key: HBASE-5419 URL: https://issues.apache.org/jira/browse/HBASE-5419 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: D1767.1.patch, D1767.1.patch The FileAlreadyExistsException has moved from org.apache.hadoop.mapred.FileAlreadyExistsException to org.apache.hadoop.fs.FileAlreadyExistsException. HBase is currently using a class that is deprecated in hadoop trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214141#comment-13214141 ] stack commented on HBASE-4403: -- Retry Jimmy making sure the patch you want to run against hadoopqa is applied last? Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.90.5, 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Fix For: 0.94.0 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, hbase-4403-interface_v3.txt, hbase-4403-nowhere-near-done.txt, hbase-4403.patch As HBase gets more widely used, we need to be more explicit about which APIs are stable and not expected to break between versions, which APIs are still evolving, etc. We also have many public classes that are really internal to the RS or Master and not meant to be used by users. Hadoop has adopted a classification scheme for audience (public, private, or limited-private) as well as stability (stable, evolving, unstable). I think we should copy these annotations to HBase and start to classify our public classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214144#comment-13214144 ] stack commented on HBASE-5454: -- This patch is a good idea Chunhui. I don't think you need to have the message "Master is not initialized" on an exception whose type is MasterNotInitializedException. It seems redundant. Remove this line:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
Maybe you don't need these two methods?
{code}
+  public MasterNotInitializedException(String s) {
+    super(s);
+  }
+
+  /**
+   * Constructor taking another exception.
+   *
+   * @param e Exception to grab data from.
+   */
+  public MasterNotInitializedException(Exception e) {
{code}
Refuse operations from Admin before master is initialized Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen Attachments: hbase-5454.patch In our testing environment, when master is initializing, we found conflict problems between master#assignAllUserRegions and the EnableTable event, causing region assignment to throw an exception so that the master aborted itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, etc. It could reduce errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
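stack's suggestion (the exception type speaks for itself, so the extra constructors may be unnecessary) boils down to an exception plus a guard like the following sketch. MasterNotInitializedException is the name from the patch; everything else here is illustrative, including using RuntimeException as the base so the sketch stays self-contained:

```java
public class InitGuardSketch {
    // Minimal exception: the type itself already says "master is not
    // initialized", so per the review no detail message is needed.
    static class MasterNotInitializedException extends RuntimeException {
        MasterNotInitializedException() { super(); }
    }

    private volatile boolean initialized = false;

    void setInitialized() { initialized = true; }

    // Called at the top of admin operations such as createTable/enableTable
    // to refuse them until master initialization completes.
    void checkInitialized() {
        if (!initialized) {
            throw new MasterNotInitializedException();
        }
    }
}
```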
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214145#comment-13214145 ] stack commented on HBASE-5434: -- I added you Mubarak. Try editing wiki (I'm in process of applying this patch). [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy REST gateway based production environment, ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server), we can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount* but some home-grown tools needs to parse the output of /status/cluster and updates the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214302#comment-13214302 ] stack commented on HBASE-5424: -- You need this on 0.90 branch too Zhiyuan? If so, add 0.90.7 as fix version. (We should also though do as Lars suggests; we're just band-aiding dealing w/ the NPE). HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: 5424-v3.patch, HBASE-5424.patch, HBase-5424_v2.patch Original Estimate: 48h Remaining Estimate: 48h We meet NPE when call getRegionInfo() in testing environment. Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119) at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73) at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418) This NPE also make the table.jsp can't show the region information of this table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
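The band-aid being discussed amounts to skipping META rows whose info:regioninfo cell is empty instead of letting Writables.getWritable() throw the NPE. A plain-Java sketch of that guard, with names and types that are illustrative stand-ins, not the committed patch:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MetaScanGuardSketch {
    // Stand-in for the processRow() callback: rows whose info:regioninfo
    // value is null or empty are tolerated (half-written META rows)
    // instead of NPE'ing inside deserialization.
    static List<String> collectRegions(List<Map<String, byte[]>> metaRows) {
        List<String> regions = new ArrayList<>();
        for (Map<String, byte[]> row : metaRows) {
            byte[] value = row.get("info:regioninfo");
            if (value == null || value.length == 0) {
                continue; // the band-aid: skip rather than throw
            }
            regions.add(new String(value, StandardCharsets.UTF_8));
        }
        return regions;
    }
}
```

As Lars's comment implies, the real fix is to stop such rows appearing in META at all; the guard just keeps table.jsp and getRegionsInfo() usable in the meantime.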
[jira] [Commented] (HBASE-5437) HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy
[ https://issues.apache.org/jira/browse/HBASE-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214303#comment-13214303 ] stack commented on HBASE-5437: -- Scott: It failed compile? See above console output:
{code}
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/main/java/org/apache/hadoop/hbase/thrift/ThriftMetrics.java:[118,55] cannot find symbol
[ERROR] symbol  : variable toString
[ERROR] location: class java.lang.Class<capture#444 of ?>
[ERROR] -> [Help 1]
{code}
HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy --- Key: HBASE-5437 URL: https://issues.apache.org/jira/browse/HBASE-5437 Project: HBase Issue Type: Bug Components: metrics, thrift Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.94.0 Attachments: HBASE-5437.D1857.1.patch, HBASE-5437.D1887.1.patch 3.facebook.com,60020,1329865516120: Initialization of RS failed. Hence aborting RS.
java.lang.ClassCastException: $Proxy9 cannot be cast to org.apache.hadoop.hbase.thrift.generated.Hbase$Iface
 at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.newInstance(HbaseHandlerMetricsProxy.java:47)
 at org.apache.hadoop.hbase.thrift.ThriftServerRunner.<init>(ThriftServerRunner.java:239)
 at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.<init>(HRegionThriftServer.java:74)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658)
 at java.lang.Thread.run(Thread.java:662)
2012-02-21 15:05:18,749 FATAL org.apache.hadoop.h -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214309#comment-13214309 ] stack commented on HBASE-5424: -- I reverted the patch. Too many new failures in hadoopqa. Let me retry it. HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: 5424-v3.patch, 5424-v3.patch, HBASE-5424.patch, HBase-5424_v2.patch Original Estimate: 48h Remaining Estimate: 48h We meet NPE when call getRegionInfo() in testing environment. Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119) at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73) at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418) This NPE also make the table.jsp can't show the region information of this table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214314#comment-13214314 ] stack commented on HBASE-5454: -- Chunhui, want to do that other stuff in a different issue? This one is nice and simple as is; if you make the changes suggested I can commit. Refuse operations from Admin before master is initialized Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen Attachments: hbase-5454.patch In our testing environment, when master is initializing, we found conflict problems between master#assignAllUserRegions and the EnableTable event, causing region assignment to throw an exception so that the master aborted itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, etc. It could reduce errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214325#comment-13214325 ] stack commented on HBASE-4991: -- bq. Shell command needs to be changed as delete_region table_name start_key end_key delete_region would go well w/ our current close_region. Do you need tablename, startkey, endkey? Can't you just pass region name? ditto for the deleteRegion call (though maybe I'm missing the fact that you are trying to respect Todd's comments above that we not have region come up out of the API -- ignore this remark if so). bq. If start/end key for the specified table is spanned across multiple regions then it is out of scope of this JIRA (throw error). So, you can only do one region at a time? Why would it be hard doing multiple given you are tracking? Or is it that it makes the tracking yet more complicated? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214332#comment-13214332 ] stack commented on HBASE-4991: -- bq. Do you need tablename, startkey, endkey? Can't you just pass region name? Again, you may be trying to not reveal region type in API but then delete_region would be different to close_region which takes a regionname IIRC? bq. ..but client does not know the tableName for a regionName in our case... It does not know? I may be missing context. If I am, ignore this comment. If it did know, could use: 'List<HRegionInfo> getOnlineRegions()' HRIs have tablename. Could figure it client-side. You can't use the List<HRegion> because we can't serialize HRegion to pass over connection... that call is if you are running in same context in JSP or in a unit test or something. bq. Design choice is like HBASE-4213, meaning master create a znode under zookeeper.znode.parent/delete-region Fair enough. Will we have a new dir in zk per cluster region operation we want to do? Can we not exploit primitives added by hbase-4213? Or do we need to refactor hbase-4213 to get you primitives you need to do this facility? Or is there nothing in common w/ what hbase-4213 does (there is at least the closing of a region?) bq. ...If we are considering delete_region as a tool/util then we can refactor as a tool/util as like Online/Offline merge code online merge should have a bunch of overlap w/ this feature? Would be great if they could share a bunch of code/primitives. As has been suggested, rather than a /delete-region, instead we'd have a log of intent+log of actions thing up in zk I suppose. The log of intent would list the steps to be done and then the log of actions thingy would log how far the operation had gone (I should read up on the cited accumulo doo-hickey). bq. We do put all our ZK trackers in zookeeper package and this is how online schema change HBASE-4213 was implemented. 
That's a bit broken in my opinion. It's wonky having zk have references out to other main packages. Not your fault. Should have caught that in review of hbase-4213. Good on you Mubarak. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5460) Add protobuf as M/R dependency jar
[ https://issues.apache.org/jira/browse/HBASE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214334#comment-13214334 ] stack commented on HBASE-5460: -- +1 Add protobuf as M/R dependency jar -- Key: HBASE-5460 URL: https://issues.apache.org/jira/browse/HBASE-5460 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5460.txt Getting this from M/R jobs (Export for example): Error: java.lang.ClassNotFoundException: com.google.protobuf.Message at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) at org.apache.hadoop.hbase.io.HbaseObjectWritable.clinit(HbaseObjectWritable.java:262) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214352#comment-13214352 ] stack commented on HBASE-5455: -- Good idea. Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Priority: Minor Fix For: 0.94.0 HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
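The proposed regression test could look roughly like the sketch below. The class names and codes here are placeholders, not the real HbaseObjectWritable table; the point is just to pin the class-to-code mapping so any reordering of the static init block fails loudly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodeTableCheck {
  // Hypothetical "golden" table of class-to-code assignments.
  static Map<String, Integer> expected() {
    Map<String, Integer> m = new LinkedHashMap<>();
    m.put("org.apache.hadoop.hbase.client.Put", 1);
    m.put("org.apache.hadoop.hbase.client.Delete", 2);
    m.put("org.apache.hadoop.hbase.client.Get", 3);
    return m;
  }

  // In a real test, 'actual' would be read from HbaseObjectWritable's map.
  static boolean matches(Map<String, Integer> actual) {
    return expected().equals(actual);
  }

  public static void main(String[] args) {
    Map<String, Integer> actual = expected(); // stand-in for the live table
    if (!matches(actual)) {
      throw new AssertionError("class-to-code table changed; was it intentional?");
    }
    System.out.println("code table stable");
  }
}
```

An unintentional insertion in the middle of the block shifts every later code, so the equality check catches it; an intentional change forces a conscious update of the expected map.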
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214357#comment-13214357 ] stack commented on HBASE-5434: -- It fails on hadoopqa too... {code} -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.rest.model.TestStorageClusterStatusModel {code} Does it fail for you Mubarak? [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy REST gateway based production environment, ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server), we can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount* but some home-grown tools needs to parse the output of /status/cluster and updates the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214372#comment-13214372 ] stack commented on HBASE-3149: -- @Nicolas I think I follow. I opened HBASE-5461. Let me try it. bq. Why is this silly? Because I was seeing a plethora of small files as the problem, but given your explanation above, I think I grok that it's not many small files that's the problem; it's that w/ the way-high min size, our selection was too inclusionary and so we end up doing loads of rewriting. Make flush decisions per column family -- Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Nicolas Spiegelberg Priority: Critical Fix For: 0.92.1 Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5437) HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy
[ https://issues.apache.org/jira/browse/HBASE-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214857#comment-13214857 ] stack commented on HBASE-5437: -- +1 HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy --- Key: HBASE-5437 URL: https://issues.apache.org/jira/browse/HBASE-5437 Project: HBase Issue Type: Bug Components: metrics, thrift Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.94.0 Attachments: HBASE-5437.D1857.1.patch, HBASE-5437.D1887.1.patch, HBASE-5437.D1887.2.patch 3.facebook.com,60020,1329865516120: Initialization of RS failed. Hence aborting RS. java.lang.ClassCastException: $Proxy9 cannot be cast to org.apache.hadoop.hbase.thrift.generated.Hbase$Iface at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.newInstance(HbaseHandlerMetricsProxy.java:47) at org.apache.hadoop.hbase.thrift.ThriftServerRunner.init(ThriftServerRunner.java:239) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 15:05:18,749 FATAL org.apache.hadoop.h -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5442) Use builder pattern in StoreFile and HFile
[ https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214868#comment-13214868 ] stack commented on HBASE-5442: -- @Mikhail Thats the usual set of three that fail on hadoopqa, fyi. Use builder pattern in StoreFile and HFile -- Key: HBASE-5442 URL: https://issues.apache.org/jira/browse/HBASE-5442 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1893.1.patch, D1893.2.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses StoreFile and HFile refactoring. For HColumnDescriptor refactoring see HBASE-5357. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
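A minimal sketch of the builder shape being proposed above; the names (WriterBuilder, setBlockSize, setCompression) and defaults are illustrative, not the actual HBase signatures.

```java
public class WriterBuilderDemo {
  static class Writer {
    final int blockSize;
    final String compression;
    Writer(int blockSize, String compression) {
      this.blockSize = blockSize;
      this.compression = compression;
    }
  }

  static class WriterBuilder {
    private int blockSize = 65536;        // defaults live here once,
    private String compression = "none";  // never restated at call sites

    WriterBuilder setBlockSize(int n) { this.blockSize = n; return this; }
    WriterBuilder setCompression(String c) { this.compression = c; return this; }
    Writer build() { return new Writer(blockSize, compression); }
  }

  public static void main(String[] args) {
    // Each setter on its own line keeps merges/cherry-picks clean.
    Writer w = new WriterBuilder()
        .setBlockSize(131072)
        .setCompression("gz")
        .build();
    System.out.println(w.blockSize + " " + w.compression);
  }
}
```

Because parameters are named at the call site and defaulted in one place, adding a new parameter never breaks existing callers the way adding a constructor argument does.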
[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214870#comment-13214870 ] stack commented on HBASE-5454: -- So, you want to mash this patch into hbase-5270? If so, close this one as won't fix? Refuse operations from Admin before master is initialized Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen Attachments: hbase-5454.patch In our testing environment, when master is initializing, we found conflict problems between master#assignAllUserRegions and the EnableTable event, causing region assignment to throw an exception so that the master aborts itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, etc. It could reduce errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5466) Opening a table also opens the metatable and never closes it.
[ https://issues.apache.org/jira/browse/HBASE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215137#comment-13215137 ] stack commented on HBASE-5466: -- +1 on patch (except for the spacing that is not like the rest of the file) Opening a table also opens the metatable and never closes it. - Key: HBASE-5466 URL: https://issues.apache.org/jira/browse/HBASE-5466 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.5, 0.92.0 Reporter: Ashley Taylor Attachments: MetaScanner_HBASE_5466(2).patch, MetaScanner_HBASE_5466.patch Having upgraded to CDH3U3 version of hbase we found we had a zookeeper connection leak, tracking it down we found that closing the connection will only close the zookeeper connection if all calls to get the connection have been closed, there is incCount and decCount in the HConnection class, When a table is opened it makes a call to the metascanner class which opens a connection to the meta table, this table never gets closed. This caused the count in the HConnection class to never return to zero meaning that the zookeeper connection will not close when we close all the tables or calling HConnectionManager.deleteConnection(config, true); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
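The leak described above comes down to reference counting. A toy model (illustrative names, not the actual HConnection internals) shows how one unmatched incCount keeps the ZooKeeper session open forever:

```java
public class RefCountedConnection {
  private int count = 0;
  private boolean zkOpen = true;

  void incCount() { count++; }

  void decCount() {
    count--;
    if (count == 0) zkOpen = false;  // last user gone: close ZK session
  }

  boolean zkSessionOpen() { return zkOpen; }

  public static void main(String[] args) {
    RefCountedConnection conn = new RefCountedConnection();
    conn.incCount();  // user table opened
    conn.incCount();  // meta table opened internally by MetaScanner
    conn.decCount();  // user table closed
    // The meta open was never matched by a close: count stays at 1
    // and the session never shuts down, even after deleteConnection.
    System.out.println(conn.zkSessionOpen());
  }
}
```

The patch's job is essentially to make the internal meta-table open/close a balanced pair, so the count can return to zero.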
[jira] [Commented] (HBASE-5415) FSTableDescriptors should handle random folders in hbase.root.dir better
[ https://issues.apache.org/jira/browse/HBASE-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215164#comment-13215164 ] stack commented on HBASE-5415: -- What's the difference between miscellaneous dirs under hbase.rootdir and an actual table directory that is missing its .tableinfo file? Are we changing our API when we remove TEE from public methods? FSTableDescriptors should handle random folders in hbase.root.dir better Key: HBASE-5415 URL: https://issues.apache.org/jira/browse/HBASE-5415 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5415.patch I faked an upgrade on a test cluster using our dev data so I had to distcp the data between the two clusters, but after starting up and doing the migration and whatnot the web UI didn't show any table. The reason was in the master's log: {quote} org.apache.hadoop.hbase.TableExistsException: No descriptor for _distcp_logs_e0ehek at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:164) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:182) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {quote} I don't think we need to show a full stack (just a WARN maybe), this shouldn't kill the request (still see tables in the web UI), and why is that a TableExistsException? -- This message is automatically generated by JIRA.
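The skip-and-warn behavior being argued for could look roughly like the sketch below; the file layout and names are simplified stand-ins for the real FSTableDescriptors logic, not the committed patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SkipStrayDirs {
  // List table dirs under a (simplified) hbase.rootdir: a directory
  // without a .tableinfo file is logged and skipped instead of failing
  // the whole listing with an exception.
  static List<String> listTables(Path rootDir) {
    List<String> tables = new ArrayList<>();
    try (DirectoryStream<Path> dirs = Files.newDirectoryStream(rootDir)) {
      for (Path d : dirs) {
        if (!Files.isDirectory(d)) continue;
        if (Files.exists(d.resolve(".tableinfo"))) {
          tables.add(d.getFileName().toString());
        } else {
          System.out.println("WARN: no descriptor for " + d.getFileName()
              + ", skipping");
        }
      }
    } catch (IOException e) {
      System.out.println("WARN: cannot list " + rootDir + ": " + e.getMessage());
    }
    return tables;
  }

  // Build a throwaway layout: one real table dir, one stray distcp dir.
  static List<String> runDemo() {
    try {
      Path root = Files.createTempDirectory("rootdir");
      Files.createDirectory(root.resolve("mytable"));
      Files.createFile(root.resolve("mytable").resolve(".tableinfo"));
      Files.createDirectory(root.resolve("_distcp_logs_e0ehek"));
      return listTables(root);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(runDemo());  // [mytable]
  }
}
```

With this shape, a stray `_distcp_logs_*` dir produces one WARN line and the web UI still lists every real table.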
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215176#comment-13215176 ] stack commented on HBASE-4365: -- @Lars You want to put an upper bound on the number of regions? I think if we do power of three, we'll lose some of the benefit J-D sees above; we'll fan out the regions slower. Do you want to put an upper bound on the number of regions per regionserver for a table? Say, three? As in, when we get to three regions on a server, just scoot the split size up to the maximum. So, given a power of two, we'd split on first flush, then the next split would happen at (2*2*128M) 512M, then 9*128M=1.2G and thereafter we'd split at the max, say 10G? Or should we just commit this for now and do more in another patch? Add a decent heuristic for region size -- Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.94.0 Reporter: Todd Lipcon Priority: Critical Labels: usability Attachments: 4365-v2.txt, 4365.txt A few of us were brainstorming this morning about what the default region size should be. There were a few general points made: - in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+) - for small tables you may want a small region size just so you can distribute load better across a cluster - for big tables, multi-GB is probably best -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
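The escalating split sizes stack walks through can be sketched as below; the squaring rule and the 128M/10G constants are assumptions taken from the comment, not the committed policy.

```java
public class SplitSizeSketch {
  // Split size grows with the number of regions this server holds for
  // the table, capped at the configured max region size.
  static long splitSize(int regionCount, long flushSize, long maxSize) {
    if (regionCount <= 0) return flushSize;
    long size = (long) regionCount * regionCount * flushSize;
    return Math.min(size, maxSize);
  }

  public static void main(String[] args) {
    long flush = 128L << 20;  // 128M flush size
    long max = 10L << 30;     // 10G max region size
    // 1 region -> split on first flush (128M), 2 -> 2*2*128M = 512M,
    // 3 -> 3*3*128M = 1152M, then capped at the 10G max.
    System.out.println(splitSize(1, flush, max) >> 20);
    System.out.println(splitSize(2, flush, max) >> 20);
    System.out.println(splitSize(3, flush, max) >> 20);
    System.out.println(splitSize(100, flush, max) >> 30);
  }
}
```

This gets a small table spread across the cluster quickly (early, cheap splits) while a big table settles into multi-GB regions, matching the brainstorm points in the description.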
[jira] [Commented] (HBASE-5349) Automagically tweak global memstore and block cache sizes based on workload
[ https://issues.apache.org/jira/browse/HBASE-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215220#comment-13215220 ] stack commented on HBASE-5349: -- Chatting w/ J-D about a phenomenon where we do not use memory when we are taking on a bunch of writes w/ a low region count. The few regions we have grow to their max of 128M or so and then we flush but in his case he had gigs of free memory still. The notion is that we should let memstores grow to fill all available space and then flush when they hit the low-water global mem mark for the memstore. The problem then becomes we'll flush lots of massive files and will overwhelm compactions. We'll need a push-back, something like a flush-merge where we flush by rewriting an existing store file interleaving the contents of memory or some such to slow down the flush but also to make for less compaction to do. Automagically tweak global memstore and block cache sizes based on workload --- Key: HBASE-5349 URL: https://issues.apache.org/jira/browse/HBASE-5349 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.94.0 Hypertable does a neat thing where it changes the size given to the CellCache (our MemStores) and Block Cache based on the workload. If you need an image, scroll down at the bottom of this link: http://www.hypertable.com/documentation/architecture/ That'd be one less thing to configure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
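The change to the flush trigger discussed above can be modeled in a few lines; the thresholds and names are illustrative, not actual HBase configuration.

```java
public class FlushDecisionSketch {
  // Current behavior: flush a memstore when it hits its per-region max
  // (e.g. 128M), even with gigs of heap idle. Proposed behavior: let
  // memstores grow and flush only when the aggregate hits the global
  // low-water mark.
  static boolean shouldFlush(long regionMemstore, long globalMemstore,
                             long perRegionMax, long globalLowWater,
                             boolean growToGlobal) {
    if (growToGlobal) {
      return globalMemstore >= globalLowWater;  // proposed
    }
    return regionMemstore >= perRegionMax;      // current
  }

  public static void main(String[] args) {
    long m128 = 128L << 20, g1 = 1L << 30, g4 = 4L << 30;
    // Current: a 128M memstore flushes even with only 1G used globally.
    System.out.println(shouldFlush(m128, g1, m128, g4, false));
    // Proposed: the same memstore keeps growing until global use nears 4G.
    System.out.println(shouldFlush(m128, g1, m128, g4, true));
  }
}
```

The comment's caveat still applies: the proposed branch produces fewer but much bigger flushes, which is where the flush-merge push-back idea comes in.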
[jira] [Commented] (HBASE-5415) FSTableDescriptors should handle random folders in hbase.root.dir better
[ https://issues.apache.org/jira/browse/HBASE-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215236#comment-13215236 ] stack commented on HBASE-5415: -- bq. Former's HTD is null, latter gets a FNFE. I still don't understand how we can tell the difference between a misc directory in the wrong place and a table directory missing its .tableinfo. Both would look the same to the interrogating code, I'd think? bq. Technically no, TEE (and FNFE FWIW) are both IOEs so there's no change there. I removed TEE specifically because it isn't thrown anymore. I mean, if I had client code that had a catch of a TEE, it'd stop working, right? (I doubt such a thing exists so I don't feel too bad about removing this) FSTableDescriptors should handle random folders in hbase.root.dir better Key: HBASE-5415 URL: https://issues.apache.org/jira/browse/HBASE-5415 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5415.patch I faked an upgrade on a test cluster using our dev data so I had to distcp the data between the two clusters, but after starting up and doing the migration and whatnot the web UI didn't show any table. 
The reason was in the master's log: {quote} org.apache.hadoop.hbase.TableExistsException: No descriptor for _distcp_logs_e0ehek at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:164) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:182) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {quote} I don't think we need to show a full stack (just a WARN maybe), this shouldn't kill the request (still see tables in the web UI), and why is that a TableExistsException? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215239#comment-13215239 ] stack commented on HBASE-4991: -- bq. Is it Okay to do the above in another JIRA ? As a prereq for this issue? Yes. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215258#comment-13215258 ] stack commented on HBASE-4991: -- bq. i have tried with getting a List<HRegion>, got into serialization issue Yeah. HRegion is not a Writable. bq. We are using znode just to start the task and update the state only. If we keep track of intent vs action in same znode, considering the size of data in znode, we should not exceed 1 MB as ZK admin guide says Oh, you are talking of writing actual data into zk? I was just talking of intent, a bare mini language that outlines steps to complete an operation... something like your enums. I'd think this would be well under 1MB. Good stuff. Might make sense to work on a bit of a design doc first? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5415) FSTableDescriptors should handle random folders in hbase.root.dir better
[ https://issues.apache.org/jira/browse/HBASE-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215259#comment-13215259 ] stack commented on HBASE-5415: -- +1 on commit FSTableDescriptors should handle random folders in hbase.root.dir better Key: HBASE-5415 URL: https://issues.apache.org/jira/browse/HBASE-5415 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5415.patch I faked an upgrade on a test cluster using our dev data so I had to distcp the data between the two clusters, but after starting up and doing the migration and whatnot the web UI didn't show any table. The reason was in the master's log: {quote} org.apache.hadoop.hbase.TableExistsException: No descriptor for _distcp_logs_e0ehek at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:164) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:182) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {quote} I don't think we need to show a full stack (just a WARN maybe), this shouldn't kill the request (still see tables in the web UI), and why is that a TableExistsException? -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215277#comment-13215277 ] stack commented on HBASE-3909: -- I'm now suggesting we hoist only the differences up into zk. We'd have a configuration directory under /hbase in zk. It would have znodes whose names are the configs to change. The content of each znode is the new value (and type, I suppose). Once a znode is added under the configuration dir, watchers are triggered and they update their running Configuration instance. We do some refactoring in HRegionServer and HMaster so important configs go back to their Configuration instance at critical junctures such as at split or when checking if we should do a compaction or if we should flush, rather than reading a data member that was set on construction (we'd be careful to not always do a lookup on Configuration). We'd add a configure command to the shell that allowed you to hoist configs up into zk. We'd punt on there being a connection between this mechanism and what's in hbase-*.xml. This facility is for 'ephemeral' configuration, for getting you over a temporary hump, for trying out a setting to see its effect, or to get you out of a fix; e.g. the cluster is up and running but you forgot to set a critical config. All w/o need of a rolling restart/restart. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. 
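The znode-per-config mechanism described above can be sketched with a plain callback standing in for the ZooKeeper watcher; no real ZK here, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class DynamicConfigSketch {
  // Stand-in for the server's running Configuration instance.
  final Map<String, String> conf = new HashMap<>();
  private BiConsumer<String, String> watcher;

  void setWatcher(BiConsumer<String, String> w) { this.watcher = w; }

  // "Creating a znode" under the configuration dir triggers the watcher,
  // which folds the new value into the running Configuration.
  void createZnode(String key, String value) {
    if (watcher != null) watcher.accept(key, value);
  }

  String get(String key) { return conf.get(key); }

  public static void main(String[] args) {
    DynamicConfigSketch zk = new DynamicConfigSketch();
    zk.setWatcher((k, v) -> zk.conf.put(k, v));  // update the live config
    zk.createZnode("hbase.hregion.max.filesize", "10737418240");
    System.out.println(zk.get("hbase.hregion.max.filesize"));
  }
}
```

The important refactoring, as the comment notes, is that split/compaction/flush decisions must re-read the Configuration at the decision point instead of caching the value in a field at construction, or the watcher update never takes effect.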
[jira] [Commented] (HBASE-5466) Opening a table also opens the metatable and never closes it.
[ https://issues.apache.org/jira/browse/HBASE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215279#comment-13215279 ] stack commented on HBASE-5466: -- @Ted Yes please. Opening a table also opens the metatable and never closes it. - Key: HBASE-5466 URL: https://issues.apache.org/jira/browse/HBASE-5466 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.5, 0.92.0 Reporter: Ashley Taylor Attachments: MetaScanner_HBASE_5466(2).patch, MetaScanner_HBASE_5466(3).patch, MetaScanner_HBASE_5466.patch Having upgraded to CDH3U3 version of hbase we found we had a zookeeper connection leak, tracking it down we found that closing the connection will only close the zookeeper connection if all calls to get the connection have been closed, there is incCount and decCount in the HConnection class, When a table is opened it makes a call to the metascanner class which opens a connection to the meta table, this table never gets closed. This caused the count in the HConnection class to never return to zero meaning that the zookeeper connection will not close when we close all the tables or calling HConnectionManager.deleteConnection(config, true); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215284#comment-13215284 ] stack commented on HBASE-3909: -- Reading over hadoop-7001, Phillip says: 'Not to mention that Configuration objects get copied along, so it's hard to make sure that a configuration change propagates to all possible children.' I need to survey to make sure a callback context can change a Configuration instance that is used in all the important places we'd want to change. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215346#comment-13215346 ] stack commented on HBASE-3909: -- bq. The assumption is there wouldn't be many such config items to change. We should survey and validate this assumption. You could do hundreds or even put every config. up there if you wanted. Should be fine. bq. When would these znodes be deleted ? Not sure. Good question. Their insertion would trigger the callback so they'd be useless after putting. Could let them just expire. Might be good to keep them around though so we could get an idea of what was changed via zk. Need to think on it. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215378#comment-13215378 ] stack commented on HBASE-5451: -- Excellent! Minor, would do: if (!head.hasUserInfo()) return; .. Then you'd save an indent of the whole body of the method. Seems like ticket should be renamed user (we seem to be creating a user rather than a ticket?) here -- I like the way you ask user to create, passing the header: - ticket = header.getUser(); + ticket = User.create(header); Is ConnectionContext actually the headers? Should it be called ConnectionHeader? What is this -- HBaseCompleteRpcRequestProto? It's 'The complete RPC request message'. It's the callid and the client request. Is it the complete request because it's missing the header? Should it just be called Request since it's inside a package that makes its provenance clear? I suppose Request would be odd because you then do getRequest on it... hmm. Why tunnelRequest? What's that mean? I like the builder stuff making headers and request over in client. Fatten doc on the proto file I'd say. It's going to be our spec. Can these proto classes drop the HBaseRPC prefix? Is the Proto suffix going to be our convention denoting Proto classes going forward? Are we going to repeat the hrpc exception handling carrying Strings for exceptions from server to client? Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Devaraj Das Attachments: rpc-proto.patch.1_2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5440) Allow import to optionally use HFileOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215388#comment-13215388 ] stack commented on HBASE-5440: -- LGTM. What's missing is better documentation in the usage for Import. This new option will be under a rock unless it's better surfaced. +1 on commit after beefing up usage. Add some lines under here: {code} -System.err.println("Usage: Import <tablename> <inputdir>"); +System.err.println("Usage: Import [-D" + BULK_OUTPUT_CONF_KEY + "=/path/for/output] <tablename> <inputdir>"); {code} ... going on about what the -D thingy does. Good stuff. Allow import to optionally use HFileOutputFormat Key: HBASE-5440 URL: https://issues.apache.org/jira/browse/HBASE-5440 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5440.txt importtsv supports importing into a live table or generating HFiles for bulk load. import should allow the same. Could even consider merging these tools into one (in principle the only difference is the parsing part - although that is maybe for a different jira). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215432#comment-13215432 ] stack commented on HBASE-5166: -- @Jai It's not you. Those are known failing tests. Let me commit. MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop -- Key: HBASE-5166 URL: https://issues.apache.org/jira/browse/HBASE-5166 Project: HBase Issue Type: Improvement Reporter: Jai Kumar Singh Priority: Minor Labels: multithreaded, tablemapper Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt Original Estimate: 0.5h Remaining Estimate: 0.5h There is currently no MultiThreadedTableMapper in hbase like the MultiThreadedMapper we have in Hadoop for IO-bound jobs. UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. Running these kinds of hbase mapreduce jobs with a normal table mapper is quite slow, as we are not utilizing the CPU fully (N/W IO bound). Moreover, I want to know whether it would be a good/bad idea to use HBase for these kinds of usecases? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
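The pattern the issue asks for (fanning per-record map() calls out to a thread pool inside a single task, so network-IO-bound work overlaps) can be sketched independently of the MapReduce API. Everything below is an illustrative stand-in, not the attached patch; the real MultithreadedTableMapper would feed rows from a table scan rather than a List.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: run an IO-bound per-record function across N threads within one
// task, the way a multithreaded table mapper fans out map() calls.
public class MultithreadedMapperSketch {
    public static List<String> runMap(List<String> rows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        for (String row : rows) {
            pool.execute(() -> out.add(map(row)));  // each map() may block on IO
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    // Stand-in for the user's map(): e.g. fetch the URL named by this row.
    static String map(String row) { return row + ":fetched"; }
}
```

Because map() calls now run concurrently, output order is not guaranteed, which is the same caveat a real multithreaded mapper carries.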
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215431#comment-13215431 ] stack commented on HBASE-4991: -- bq. Do we need to design the intent part (and steps of an operation) as a generic framework for all the master-coordinated tasks? I'd think you could make it work for you but make it so it could be used by others. How does Accumulo do it, do you know? You might get some ideas over there. bq. I thought we are only changing the API (with multiple region support) and focussing more on refactoring with good test/stress-test in this JIRA. You mean to get this facility into core? My sense is that you could get this specialized lump into hbase to do this one facility if lots of tests but my fear is that if it does go in, it'll live forever as an awkward appendage. Seems like we have an opportunity to add some base primitives that we can then build this feature on as well as others. Pity to waste it (understood if you don't want to do the generalized system). bq. Can we address intent/actions part out of scope of this JIRA? I'm reluctant to because of the above -- we'll get a specialized lump of code that will live forever and we'll all be afraid to touch. Maybe others think different. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215442#comment-13215442 ] stack commented on HBASE-5317: -- It can go into 0.92 if you make a version (I see a bunch of failures trying to apply trunk patch). Thanks Gregory. Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, HBASE-5317-v6.patch, TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation
[ https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215446#comment-13215446 ] stack commented on HBASE-3484: -- Great stuff Todd. bq. ...copy-on-write sorted array lists. Could we do this? We'd allocate a new array every time we did an insert? An array would be cheaper space-wise and more efficient scanning, etc., I'd think. It'd just be the insert and sort that'd be 'expensive'. Let me have a go at your suggested microbenchmark. Replace memstore's ConcurrentSkipListMap with our own implementation Key: HBASE-3484 URL: https://issues.apache.org/jira/browse/HBASE-3484 Project: HBase Issue Type: Improvement Components: performance Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Attachments: hierarchical-map.txt By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements to it for our use case in MemStore: - add an iterator.replace() method which should allow us to do upsert much more cheaply - implement a Set directly without having to do Map<KeyValue,KeyValue> to save one reference per entry It turns out CSLM is in public domain from its development as part of JSR 166, so we should be OK with licenses. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
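The copy-on-write sorted array idea floated in the comment can be sketched as follows. This is a rough illustration of the cost trade-off under discussion (cheap dense scans, O(n) copy per insert), not code from the issue; a real memstore entry would be a KeyValue, not a long.

```java
import java.util.Arrays;

// Sketch of a copy-on-write sorted array: every insert allocates a new
// array, so readers always scan a dense, immutable snapshot with no locking.
public class CowSortedArray {
    public static long[] insert(long[] sorted, long value) {
        int i = Arrays.binarySearch(sorted, value);
        if (i < 0) i = -i - 1;           // decode insertion point for a missing key
        long[] next = new long[sorted.length + 1];
        System.arraycopy(sorted, 0, next, 0, i);
        next[i] = value;
        System.arraycopy(sorted, i, next, i + 1, sorted.length - i);
        return next;                     // O(n) copy per insert, O(log n) lookup
    }
}
```

The "expensive" part stack mentions is exactly the per-insert allocation and copy; there is no full re-sort needed because binary search finds the insertion point in the already-sorted snapshot.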
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215796#comment-13215796 ] stack commented on HBASE-5075: -- bq. Even something as simple as just removing your own znode on failure would be sufficient to cover this use case, correct? Lets do that regardless. Good idea. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows of the regionserver's shutdown, it takes a long time to fetch the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I think the rs is down, so I delete the znode and force close the hlog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215800#comment-13215800 ] stack commented on HBASE-5351: -- @Adrian That seems like the way to go. hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351.patch I have a test that tests vanilla use of importtsv with importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at 
org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test. {code} -HTable table = new HTable(this.cfg, tableName); - -HConnection conn = table.getConnection(); int ctr = 0; -while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MAX_RETRIES)) +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr < TABLE_CREATE_MAX_RETRIES)) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
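The fix pattern in the snippet (after an async create, poll availability with bounded retries before constructing the client handle) can be sketched generically. The BooleanSupplier below stands in for the admin's availability check; all names and the sleep interval are illustrative, not the patch itself.

```java
import java.util.function.BooleanSupplier;

// Sketch of the race fix: poll an availability check with bounded retries
// before constructing the HTable, instead of assuming the async create
// has already finished.
public class WaitForTable {
    public static boolean waitAvailable(BooleanSupplier isAvailable,
                                        int maxRetries, long sleepMs) {
        for (int ctr = 0; ctr < maxRetries; ctr++) {
            if (isAvailable.getAsBoolean()) return true;  // table is ready
            try {
                Thread.sleep(sleepMs);                    // back off and retry
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;  // caller decides whether to fail the load
    }
}
```

The point of routing the check through the admin handle rather than a fresh HTable's connection is that no region lookup against .META. happens until the table actually exists.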
[jira] [Commented] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215802#comment-13215802 ] stack commented on HBASE-5455: -- +1 Excellent Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Assignee: Michael Drzal Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5455.diff HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
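The proposed guard (a test pinning each wire code to its class against a golden table) can be sketched like this. The entries below are purely illustrative, not HbaseObjectWritable's real code table.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed test: pin each serialized int code to a class name
// in a golden map, so an accidental reordering of the static registration
// block fails a test instead of silently breaking client compatibility.
public class WireCodeGuard {
    // Golden table checked into the test; illustrative entries only.
    public static Map<Integer, String> golden() {
        Map<Integer, String> m = new HashMap<>();
        m.put(1, "Boolean.TYPE");
        m.put(2, "Boolean.class");
        return m;
    }

    // A real test would build "actual" from the class under test's table.
    public static boolean matches(Map<Integer, String> actual) {
        return golden().equals(actual);
    }
}
```

An intentional addition then forces an explicit edit to the golden table, which is exactly the "conscious decision" the description asks for.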
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215895#comment-13215895 ] stack commented on HBASE-5451: -- Ok on the tunnel thing. Maybe comment it some more (if you haven't already) in code. Yeah on suffix. We need a convention, I'd say, distinguishing the PB classes. On exception, could do as separate jira. Here is one that looks like it's what you need that already exists, if it helps: HBASE-2030 Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Devaraj Das Attachments: rpc-proto.patch.1_2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1762) Remove concept of ZooKeeper from HConnection interface
[ https://issues.apache.org/jira/browse/HBASE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215946#comment-13215946 ] stack commented on HBASE-1762: -- This is being done as part of HBASE-5399 Remove concept of ZooKeeper from HConnection interface -- Key: HBASE-1762 URL: https://issues.apache.org/jira/browse/HBASE-1762 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.0 Reporter: Ken Weiner Assignee: stack Attachments: HBASE-1762.patch The concept of ZooKeeper is really an implementation detail and should not be exposed in the {{HConnection}} interface. Therefore, I suggest removing the {{HConnection.getZooKeeperWrapper()}} method from the interface. I couldn't find any uses of this method within the HBase code base except for in one of the unit tests: {{org.apache.hadoop.hbase.TestZooKeeper}}. This unit test should be changed to instantiate the implementation of {{HConnection}} directly, allowing it to use the {{getZooKeeperWrapper()}} method. This requires making {{org.apache.hadoop.hbase.client.HConnectionManager.TableServers}} public. (I actually think TableServers should be moved out into an outer class, but in the spirit of small patches, I'll refrain from suggesting that in this issue). I'll attach a patch for: # The removal of {{HConnection.getZooKeeperWrapper()}} # Change of {{TableServers}} class from private to public # Direct instantiation of {{TableServers}} within {{TestZooKeeper}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215945#comment-13215945 ] stack commented on HBASE-5399: -- Another thought: Do we have to have the getSharedZookeeperWatcher and releaseSharedZookeeperWatcher and getSharedMaster, etc., in the HConnection API? Are these not implementation details? (Or would it be too hard to undo them -- you'd have no way of counting zk and master connections?) Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connections that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base node is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. 
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge and HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them use a temporary master connection as well. Main points are: - the hbase class for ZooKeeper, ZooKeeperWatcher, is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the clients seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue with HBaseAdmin (for both ZK and Master), maybe we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
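The "temporary zookeeper connection" pattern the description keeps reaching for (open, read one value, close) is scoped resource use. A minimal sketch, with an illustrative Session interface standing in for the real ZooKeeper client:

```java
import java.util.function.Supplier;

// Sketch of the "temporary connection" pattern in the description: instead of
// holding a ZooKeeper session for the connection's lifetime, open one, read a
// single value (master address, root location, cluster id), and close it.
public class TemporaryLookup {
    public interface Session extends AutoCloseable {
        String read(String path);
        @Override void close();   // no checked exception, for the sketch
    }

    public static String lookup(Supplier<Session> connect, String path) {
        try (Session s = connect.get()) {   // session lives only for this read
            return s.read(path);
        }
    }
}
```

The trade-off the comment notes still applies: a non-connected client pays TCP connection setup on every such lookup, which is why it is only done for rarely-read values.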
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215973#comment-13215973 ] stack commented on HBASE-5075: -- This issue seems to be like 'HBASE-2342 Consider adding a watchdog node next to region server' regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows of the regionserver's shutdown, it takes a long time to fetch the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I think the rs is down, so I delete the znode and force close the hlog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216019#comment-13216019 ] stack commented on HBASE-4991: -- bq. I feel some of the recent proposals / requirements are far more complex than the one Yeah. It seemed basic back in December. bq. There wasn't such requirement when Mubarak outlined his plan Pardon me. I should have noticed the plan but did not. Other priorities. If I'd seen the plan I'd have blanched I think. bq. Of course, having generic framework for all the master-coordinated tasks allows future additions to be concise. Yep. We'd have tested, proven primitives to build stuff on rather than have to do it per feature bq. But I think that should have been outlined clearly in the early stage of development of a feature. See above. Pardon me for missing how involved this addition became. I don't see how the plan of '01/Feb/12 07:43' lays a foundation for a generic framework. Am I missing something? It seems like it's code for this feature only? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216066#comment-13216066 ] stack commented on HBASE-5075: -- Rather than write a new supervisor, why not use something old school like http://supervisord.org/ A wrapper script could clear the old znode from zk before restarting a new RS instance? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows of the regionserver's shutdown, it takes a long time to fetch the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I think the rs is down, so I delete the znode and force close the hlog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216078#comment-13216078 ] stack commented on HBASE-5075: -- Looking in HRegionServer code, it looks like we delete our znode on the way out already. Someone had your idea already Jesse: {code} try { deleteMyEphemeralNode(); } catch (KeeperException e) { LOG.warn("Failed deleting my ephemeral node", e); } {code} Maybe this is broke? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows of the regionserver's shutdown, it takes a long time to fetch the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I think the rs is down, so I delete the znode and force close the hlog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216113#comment-13216113 ] stack commented on HBASE-3909: -- @Jimmy Nice thing about zk is that when config changes all get notification (Would need to make it so a new regionserver joining cluster would look into the zk /configuration dir to pick up differences). When it's in fs, we'd need to poll fs to find changes? Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
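The push-vs-poll point above can be sketched with a toy in-memory stand-in for a zk-style /configuration node (the ConfigNode class here is hypothetical, not a real ZooKeeper or HBase API): every registered watcher is pushed each change, and a late-joining regionserver reads the current state once at registration instead of polling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ConfigWatchDemo {
    // Toy stand-in for a watched /configuration znode.
    static class ConfigNode {
        private final Map<String, String> data = new HashMap<>();
        private final List<Consumer<Map<String, String>>> watchers = new ArrayList<>();

        void watch(Consumer<Map<String, String>> w) {
            w.accept(Map.copyOf(data));   // a new RS reads current config on join
            watchers.add(w);
        }

        void set(String k, String v) {
            data.put(k, v);
            for (Consumer<Map<String, String>> w : watchers) {
                w.accept(Map.copyOf(data));   // push notification, no polling loop
            }
        }
    }

    // Returns the values one watcher observes: one read at registration,
    // then one per change.
    static List<String> observe() {
        ConfigNode node = new ConfigNode();
        List<String> seen = new ArrayList<>();
        node.watch(cfg -> seen.add(cfg.getOrDefault("hbase.hstore.compactionThreshold", "3")));
        node.set("hbase.hstore.compactionThreshold", "5");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe());   // [3, 5]
    }
}
```

A filesystem-backed config would instead need each regionserver to re-read the file on a timer, which is the polling cost stack is pointing at.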
[jira] [Commented] (HBASE-5350) Fix jamon generated package names
[ https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216147#comment-13216147 ] stack commented on HBASE-5350: -- I verified UI looks right at least in local mode (could be different up on cluster) Fix jamon generated package names - Key: HBASE-5350 URL: https://issues.apache.org/jira/browse/HBASE-5350 Project: HBase Issue Type: Bug Components: monitoring Affects Versions: 0.92.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.0 Attachments: jamon_HBASE-5350.patch, jamon_HBASE-5350.patch Previously, jamon was creating the template files in org.apache.hbase, but it should be org.apache.hadoop.hbase, so it's in line with rest of source files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2375) Revisit compaction configuration parameters
[ https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216164#comment-13216164 ] stack commented on HBASE-2375: -- Split early has been committed too. All that remains of this issue is upping default compaction threshold. Revisit compaction configuration parameters --- Key: HBASE-2375 URL: https://issues.apache.org/jira/browse/HBASE-2375 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.20.3 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Labels: moved_from_0_20_5 Attachments: HBASE-2375-flush-split.patch, HBASE-2375-v8.patch Currently we will make the decision to split a region when a single StoreFile in a single family exceeds the maximum region size. This issue is about changing the decision to split to be based on the aggregate size of all StoreFiles in a single family (but still not aggregating across families). This would move a check to split after flushes rather than after compactions. This issue should also deal with revisiting our default values for some related configuration parameters. The motivating factor for this change comes from watching the behavior of RegionServers during heavy write scenarios. Today the default behavior goes like this: - We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles. - After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a compaction on this region. - Compaction queues notwithstanding, this will create a 192MB file, not triggering a split based on max region size (hbase.hregion.max.filesize). - You'll then flush two more 64MB MemStores and hit the compactionThreshold and trigger a compaction. - You end up with 192 + 64 + 64 in a single compaction. This will create a single 320MB and will trigger a split. 
- While you are performing the compaction (which now writes out 64MB more than the split size, so is about 5X slower than the time it takes to do a single flush), you are still taking on additional writes into MemStore. - Compaction finishes, decision to split is made, region is closed. The region now has to flush whichever edits made it to MemStore while the compaction ran. This flushing, in our tests, is by far the dominating factor in how long data is unavailable during a split. We measured about 1 second to do the region closing, master assignment, reopening. Flushing could take 5-6 seconds, during which time the region is unavailable. - The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. Since we cannot currently split a split, we need to not hang on to these references for long. This described behavior is really bad because of how often we have to rewrite data onto HDFS. Imports are usually just IO bound as the RS waits to flush and compact. In the above example, the first cell to be inserted into this region ends up being written to HDFS 4 times (initial flush, first compaction w/ no split decision, second compaction w/ split decision, third compaction on daughter region). In addition, we leave a large window where we take on edits (during the second compaction of 320MB) and then must make the region unavailable as we flush it. If we increased the compactionThreshold to be 5 and determined splits based on aggregate size, the behavior becomes: - We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles. - After each MemStore flush, we calculate the aggregate size of all StoreFiles. We can also check the compactionThreshold. For the first three flushes, both would not hit the limit. 
On the fourth flush, we would see total aggregate size = 256MB and determine to make a split. - Decision to split is made, region is closed. This time, the region just has to flush out whichever edits made it to the MemStore during the snapshot/flush of the previous MemStore. So this time window has shrunk by more than 75% as it was the time to write 64MB from memory not 320MB from aggregating 5 hdfs files. This will greatly reduce the time data is unavailable during splits. - The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. This would stay the same. In this example, we only write a given cell twice (instead of 4 times) while drastically
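The arithmetic in the description above can be checked with a small simulation (a rough sketch: the 64MB flush size comes from the description, and the 256MB max region size is an assumed value consistent with a 320MB compacted file triggering a split).

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionSim {
    static final int FLUSH_MB = 64;        // hbase.hregion.memstore.flush.size
    static final int MAX_REGION_MB = 256;  // assumed hbase.hregion.max.filesize

    // Returns total MB written to HDFS before the decision to split.
    static int run(int compactionThreshold, boolean splitOnAggregate) {
        List<Integer> files = new ArrayList<>();
        int written = 0;
        while (true) {
            files.add(FLUSH_MB);           // a MemStore flush lands a new StoreFile
            written += FLUSH_MB;
            int aggregate = files.stream().mapToInt(Integer::intValue).sum();
            if (splitOnAggregate && aggregate >= MAX_REGION_MB) {
                return written;            // proposed: split right after the flush
            }
            if (files.size() >= compactionThreshold) {
                files.clear();             // compaction rewrites every StoreFile
                files.add(aggregate);
                written += aggregate;
                if (!splitOnAggregate && aggregate > MAX_REGION_MB) {
                    return written;        // current: split only after a big compaction
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default  (threshold 3, split on single file): " + run(3, false) + " MB");
        System.out.println("proposed (threshold 5, split on aggregate):   " + run(5, true) + " MB");
    }
}
```

Under these assumptions the default policy writes 832MB (five 64MB flushes plus 192MB and 320MB compactions) before deciding to split, while the proposed policy writes only the four 256MB worth of flushes, matching the "written 4 times" vs "written twice" claim once the daughter-region compaction is added to each.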
[jira] [Commented] (HBASE-5477) Cannot build RPM for hbase-0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216295#comment-13216295 ] stack commented on HBASE-5477: -- This works for you Benjamin? LGTM. You want to file separate issue for hbase-conf-pseudo? Cannot build RPM for hbase-0.92.0 - Key: HBASE-5477 URL: https://issues.apache.org/jira/browse/HBASE-5477 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Environment: Operating system: CentOS 6.2 {code} $ java -version java version 1.6.0_22 OpenJDK Runtime Environment (IcedTea6 1.10.6) (rhel-1.43.1.10.6.el6_2-x86_64) OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode) {code} {code} $ mvn -v Warning: JAVA_HOME environment variable is not set. Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700) Java version: 1.6.0_22 Java home: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux version: 2.6.32-220.el6.x86_64 arch: amd64 Family: unix {code} Reporter: Benjamin Lee Attachments: build.log, hbase-0.92.0.patch Steps to reproduce: {code} tar xzvf hbase-0.92.0.tar.gz cd hbase-0.92.0 mvn -Dmaven.test.skip.exec=true -P rpm install {code} Failure output and patch will be attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5364) Fix source files missing licenses in 0.92 and trunk
[ https://issues.apache.org/jira/browse/HBASE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216299#comment-13216299 ] stack commented on HBASE-5364: -- I applied Shaneal's addendum to 0.90 branch. Thanks for the cleanup Shaneal Fix source files missing licenses in 0.92 and trunk --- Key: HBASE-5364 URL: https://issues.apache.org/jira/browse/HBASE-5364 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Elliott Clark Priority: Blocker Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5364-1.patch, hbase-5364-0.90.patch, hbase-5364-0.92.patch, hbase-5364-v2.patch running 'mvn rat:check' shows that a few files have snuck in that do not have proper apache licenses. Ideally we should fix these before we cut another release/release candidate. This is a blocker for 0.94, and probably should be for the other branches as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable
[ https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216581#comment-13216581 ] stack commented on HBASE-2462: -- It looks like a bunch of this has made it in but I don't see the standalone compactions part nor the simulator. I'm taking a look at salvaging these latter two aspects from this patch and at least making it so we have standalone compactions (I want to look at compactions in isolation to see if we can make them run faster; we also need to work on making it so we do less of them but that's other issues). Review compaction heuristic and move compaction code out so standalone and independently testable - Key: HBASE-2462 URL: https://issues.apache.org/jira/browse/HBASE-2462 Project: HBase Issue Type: Improvement Components: performance Reporter: stack Assignee: Jonathan Gray Priority: Critical Labels: moved_from_0_20_5 Anything that improves our i/o profile makes hbase run smoother. Over in HBASE-2457, good work has been done already describing the tension between minimizing compactions versus minimizing count of store files. This issue is about following on from what has been done in 2457 but also, breaking the hard-to-read compaction code out of Store.java out to a standalone class that can be more easily tested (and easily analyzed for its performance characteristics). If possible, in the refactor, we'd allow specification of alternate merge sort implementations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time
[ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216608#comment-13216608 ] stack commented on HBASE-5479: -- Todd suggests something like a scoring over here Matt: https://issues.apache.org/jira/browse/HBASE-2457?focusedCommentId=12857705page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12857705 Lets verify that we do indeed do selection at queuing time. Thats my suspicion. If thats the case, for sure needs fixing. Thanks for filing this one Matt. Postpone CompactionSelection to compaction execution time - Key: HBASE-5479 URL: https://issues.apache.org/jira/browse/HBASE-5479 Project: HBase Issue Type: New Feature Components: io, performance, regionserver Reporter: Matt Corgan It can be commonplace for regionservers to develop long compaction queues, meaning a CompactionRequest may execute hours after it was created. The CompactionRequest holds a CompactionSelection that was selected at request time but may no longer be the optimal selection. The CompactionSelection should be created at compaction execution time rather than compaction request time. The current mechanism breaks down during high volume insertion. The inefficiency is clearest when the inserts are finished. Inserting for 5 hours may build up 50 storefiles and a 40 element compaction queue. When finished inserting, you would prefer that the next compaction merges all 50 files (or some large subset), but the current system will churn through each of the 40 compaction requests, the first of which may be hours old. This ends up re-compacting the same data many times. The current system is especially inefficient when dealing with time series data where the data in the storefiles has minimal overlap. With time series data, there is even less benefit to intermediate merges because most storefiles can be eliminated based on their key range during a read, even without bloomfilters. 
The only goal should be to reduce file count, not to minimize number of files merged for each read. There are other aspects to the current queuing mechanism that would need to be looked at. You would want to avoid having the same Store in the queue multiple times. And you would want the completion of one compaction to possibly queue another compaction request for the store. An alternative architecture to the current style of queues would be to have each Store (all open in memory) keep a compactionPriority score up to date after events like flushes, compactions, schema changes, etc. Then you create a CompactionPriorityComparator implements Comparator<Store> and stick all the Stores into a PriorityQueue (synchronized remove/add from the queue when the value changes). The async compaction threads would keep pulling off the head of that queue as long as the head has compactionPriority > X. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
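Matt's PriorityQueue idea might look roughly like this (a sketch with a hypothetical Store class and threshold; a real patch would also need the synchronized remove/re-add when a store's score changes, which is omitted here):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CompactionPriorityDemo {
    // Hypothetical Store carrying a compactionPriority score kept up to
    // date after flushes, compactions, schema changes, etc.
    static class Store {
        final String name;
        double compactionPriority;
        Store(String name, double p) { this.name = name; this.compactionPriority = p; }
    }

    // A worker drains highest-priority-first and stops once the head's
    // score drops to the threshold X or below. Selection happens here,
    // at execution time, not when a request was first queued.
    static List<String> drain(PriorityQueue<Store> q, double x) {
        List<String> order = new ArrayList<>();
        while (!q.isEmpty() && q.peek().compactionPriority > x) {
            order.add(q.poll().name);
        }
        return order;
    }

    static List<String> demo() {
        PriorityQueue<Store> q = new PriorityQueue<>(
            Comparator.comparingDouble((Store s) -> -s.compactionPriority));
        q.add(new Store("storeA", 1.5));
        q.add(new Store("storeB", 4.0));
        q.add(new Store("storeC", 2.5));
        return drain(q, 2.0);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [storeB, storeC]
    }
}
```

Because each Store appears at most once and is scored when pulled, a long insert run would end in one large merge of the current file set instead of churning through stale hours-old requests.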
[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216861#comment-13216861 ] stack commented on HBASE-5480: -- +1 Looks grand Andy. Reflection is per map invocation? So, per row? I suppose in scheme of things not too bad. Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Priority: Critical Attachments: HBASE-5480.patch There are two issues: - StatusReporter has a new method getProgress() - Mapper and reducer context objects can no longer be directly instantiated. See attached patch. I'm not thrilled with the added reflection but it was the minimally intrusive change. Raised the priority to critical because compilation fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216863#comment-13216863 ] stack commented on HBASE-5075: -- @zhiyuan.dai What you think of the idea of using supervisor or any of the other babysitting programs instead of writing our own from new? If you need to have hbase regionservers dump out their servername so you know what to kill up in zk, that can be done easy enough regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch regionserver crashed,it is too long time to notify hmaster.when hmaster know regionserver's shutdown,it is long time to fetch the hlog's lease. hbase is a online db, availability is very important. i have a idea to improve availability, monitor node to check regionserver's pid.if this pid not exsits,i think the rs down,i will delete the znode,and force close the hlog file. so the period maybe 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217264#comment-13217264 ] stack commented on HBASE-5075: -- bq. Do you means another project instead of writing code into hbase? Yes sir. Process babysitters is a pretty mature domain w/ a wide variety of existing programs that have been debugged and are able to do this for you. What do you think about using one of the existing solutions rather than write your own? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch regionserver crashed,it is too long time to notify hmaster.when hmaster know regionserver's shutdown,it is long time to fetch the hlog's lease. hbase is a online db, availability is very important. i have a idea to improve availability, monitor node to check regionserver's pid.if this pid not exsits,i think the rs down,i will delete the znode,and force close the hlog file. so the period maybe 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217267#comment-13217267 ] stack commented on HBASE-5399: -- Ditto w/ zk? Can't we just add close to the HConnection Interface and it will decrement the ref counting? Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. 
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217266#comment-13217266 ] stack commented on HBASE-5399: -- bq. ...but I didn't find an easy way to extend the master proxy to make it closeable What is the issue w/ the above? (I wonder why its hard to do?) Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. 
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217293#comment-13217293 ] stack commented on HBASE-5399: -- bq. For HMasterInterface, I don't know: I need to modify the interface but also HBaseRPC.getProxy and then VersionedProtocol and so on, no? to add the close? (I am not following closely but would like to understand if possible so throw me a clue or two on what issue is). Thanks N. Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. 
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217311#comment-13217311 ] stack commented on HBASE-5399: -- bq... If I don't want to do that, I need to add the method in the object returned by getProxy. You think it makes sense? How would that work (I've wanted to add a method to the returned proxy in the past). Would you have returned proxy implement another Interface (That sounds hard). Make the returned Interface implement Closeable? Or, even, whats wrong w/ the close going remote? Maybe there are resources master-side to clean up (if not now, maybe one day?... though yeah, if client doesn't have to make the RPC, lets not bother if possible). Sounds like something to try and figure -- if possible (Of course I've no ideas?) BTW, what you have above for conenction w/ try/finally looks ideal Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. 
- read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217347#comment-13217347 ] stack commented on HBASE-5270: -- Do you think we should check to see if we have already split this server's log for the case where the server was carrying root and meta?
{code}
+ splitLogIfOnline(currentMetaServer);
{code}
Or will the above call become a noop because we just split it before we assigned root? Is this a 'safe mode' or is it the master 'initializing'? I think 'safe mode' makes folks think of hdfs. It is a little similar in that master is trying to make sense of the cluster but initializing might be a better name for this state. BTW, I think this is an improvement over previous versions of this patch. It's easier to reason about. Good stuff Chunhui. Make a method and put this duplicated code into it and call it from the two places it's repeated:
{code}
+ if (!deadNotExpiredServers.isEmpty()) {
+   for (final ServerName server : deadNotExpiredServers) {
+     LOG.debug("Removing dead but not expired server: " + server
+       + " from eligible server pool.");
+     servers.remove(server);
+   }
+ }
{code}
Fix this bit of javadoc '... but not are expired now.' You don't need this:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
I think MasterInSafeModeException becomes MasterInitializingException? Good stuff Chunhui. Regarding Jimmy's comment: bq. Instead of introducing safe mode, can we add something to the RPC server and don't allow it to serve traffic before the actual server is ready, for example, fully initialized? We have a ServerNotRunningYetException down in the ipc. It's thrown by HBaseServer if RPC has not started yet. It seems a little related to this MasterInitializing. We also have a PleaseHoldException. Perhaps the Master should throw this instead of the MasterInitializing? We'd throw a PleaseHoldException and the message would detail that the master is initializing? 
Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. 
Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was
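Stack's "make a method" suggestion above can be sketched minimally. This is a plain-Java stand-in, not the actual patch: ServerName is modeled as a String and LOG.debug as a println, so the sketch is self-contained; the method name is illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the helper stack asks for: both call sites delegate to one
// method that removes dead-but-not-expired servers from the eligible pool.
public class EligibleServers {
  static List<String> removeDeadNotExpired(List<String> servers,
      Set<String> deadNotExpiredServers) {
    if (!deadNotExpiredServers.isEmpty()) {
      for (final String server : deadNotExpiredServers) {
        // LOG.debug in the real patch; println keeps the sketch dependency-free.
        System.out.println("Removing dead but not expired server: " + server
            + " from eligible server pool.");
        servers.remove(server);
      }
    }
    return servers;
  }

  public static void main(String[] args) {
    List<String> servers = new ArrayList<>(List.of("rs1", "rs2", "rs3"));
    Set<String> dead = new HashSet<>(Set.of("rs2"));
    System.out.println(removeDeadNotExpired(servers, dead));
  }
}
```

Extracting the loop this way means the log wording and removal logic can only drift in one place, which is the point of the review comment.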
[jira] [Commented] (HBASE-5460) Add protobuf as M/R dependency jar
[ https://issues.apache.org/jira/browse/HBASE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217537#comment-13217537 ] stack commented on HBASE-5460: -- We said something about 24 hour window for addendums... else its hard for the fellows following behind us to figure what happened... that means you should do a new issue. Add protobuf as M/R dependency jar -- Key: HBASE-5460 URL: https://issues.apache.org/jira/browse/HBASE-5460 Project: HBase Issue Type: Sub-task Components: mapreduce Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5460.txt Getting this from M/R jobs (Export for example):
{code}
Error: java.lang.ClassNotFoundException: com.google.protobuf.Message
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
  at org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:262)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217728#comment-13217728 ] stack commented on HBASE-5074: -- I see these in the logs when I run the patch; its a little odd because it says not using PureJavaCrc32 but will use CRC32 but then prints out stacktrace anyways: {code} 2012-02-27 23:34:20,911 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: TestTable,150828,1330380684339.ebb37d5d0e2c1f4a8b111830a46e7cbc. 2012-02-27 23:34:20,914 INFO org.apache.hadoop.hbase.regionserver.Store: time to purge deletes set to 0ms in store null 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available. 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum using java.util.zip.CRC32 2012-02-27 23:34:20,931 WARN org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available. 
java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.util.PureJavaCrc32C
  at org.apache.hadoop.hbase.util.ChecksumFactory.newConstructor(ChecksumFactory.java:65)
  at org.apache.hadoop.hbase.util.ChecksumType$3.initialize(ChecksumType.java:113)
  at org.apache.hadoop.hbase.util.ChecksumType.<init>(ChecksumType.java:148)
  at org.apache.hadoop.hbase.util.ChecksumType.<init>(ChecksumType.java:37)
  at org.apache.hadoop.hbase.util.ChecksumType$3.<init>(ChecksumType.java:100)
  at org.apache.hadoop.hbase.util.ChecksumType.<clinit>(ChecksumType.java:100)
  at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:163)
  at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1252)
  at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:516)
  at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:606)
  at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:375)
  at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:370)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PureJavaCrc32C
  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:247)
  at org.apache.hadoop.hbase.util.ChecksumFactory.getClassByName(ChecksumFactory.java:97)
  at org.apache.hadoop.hbase.util.ChecksumFactory.newConstructor(ChecksumFactory.java:60)
  ... 19 more
{code}
I'm not sure on what's happening. It would seem we're using the default CRC32, but then I'm not sure how I get the above exception reading the code. Also, not sure if I have this facility turned on. It's on by default but I don't see anything in the logs saying it's on (and I don't have metrics on this cluster, nor do I have a good handle on before and after regards whether this feature makes a difference). I caught this in a heap dump:
{code}
IPC Server handler 0 on 7003 daemon prio=10 tid=0x7f4a1410c800 nid=0x24b2 runnable [0x7f4a20487000]
  java.lang.Thread.State: RUNNABLE
  at java.util.zip.CRC32.updateBytes(Native Method)
  at java.util.zip.CRC32.update(CRC32.java:45)
  at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
  at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:240)
  at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
  at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
  - locked 0x0006fc68e9d8 (a org.apache.hadoop.hdfs.BlockReaderLocal)
  at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1457)
  - locked 0x0006fc68e9d8 (a org.apache.hadoop.hdfs.BlockReaderLocal)
  at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:326)
  - locked 0x0006fc68e9d8 (a
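The confusing log-then-stacktrace sequence above comes from a try-by-reflection pattern: the code attempts to load a faster checksum class and, when the class (here Hadoop's PureJavaCrc32C) is absent from the classpath, logs the ClassNotFoundException before falling back. A minimal sketch of that pattern, not HBase's actual ChecksumFactory/ChecksumType code; class and method names below are illustrative:

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Try to instantiate a preferred Checksum implementation by reflection,
// falling back to java.util.zip.CRC32 when the class is not available.
public class ChecksumFallback {
  static Checksum pickChecksum(String preferredClassName) {
    try {
      Class<?> clazz = Class.forName(preferredClassName);
      return (Checksum) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // This is the point where the patch logs (and, confusingly,
      // stack-traces) the ClassNotFoundException before falling back.
      System.out.println(preferredClassName + " not available, using CRC32");
      return new CRC32();
    }
  }

  public static void main(String[] args) {
    Checksum c = pickChecksum("org.apache.hadoop.util.PureJavaCrc32C");
    c.update(new byte[] {1, 2, 3}, 0, 3);
    System.out.println(c.getValue());
  }
}
```

With this shape, the ClassNotFoundException is an expected control-flow event on clusters without the Hadoop class, which is why logging its full stack trace at WARN reads as scarier than it is.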
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217732#comment-13217732 ] stack commented on HBASE-5270: -- @Ted Yes. We can keep the prefix and change the rest of the sentence to be more generic. If Chunhui reuses it here, it'll be an exception the master throws when they want the client to come back in a while. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. 
It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217927#comment-13217927 ] stack commented on HBASE-5074: -- Hey Ted. Comment was not for you, it was for the patch author. bq. The exception about org.apache.hadoop.util.PureJavaCrc32C not found should be normal - it was WARN. The above makes no sense. You have WARN and 'normal' in the same sentence. If you look at the log, it says: 1. 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available. 2. 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum using java.util.zip.CRC32 3. It spews a thread dump saying AGAIN that org.apache.hadoop.util.PureJavaCrc32C not available. That is going to confuse. bq. Metrics should be collected on the cluster to see the difference. Go easy on telling folks what they should do. It tends to piss them off. support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. 
[jira] [Commented] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217933#comment-13217933 ] stack commented on HBASE-5161: -- This is not actually a problem, right J-D? The actual problem is that it takes a long time to clear the reference files -- even though they are the first things scheduled on region open -- because sometimes we have such a backlog of compaction to catch up on (lots of big files). Compaction algorithm should prioritize reference files -- Key: HBASE-5161 URL: https://issues.apache.org/jira/browse/HBASE-5161 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 I got myself into a state where my table was un-splittable as long as the insert load was coming in. Emergency flushes because of the low memory barrier don't check the number of store files so it never blocks, to a point where I had in one case 45 store files and the compactions were almost never done on the reference files (had 15 of them, went down by one in 20 minutes). Since you can't split regions with reference files, that region couldn't split and was doomed to just get more store files until the load stopped. Marking this as a minor issue, what we really need is a better pushback mechanism but not prioritizing reference files seems wrong. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218470#comment-13218470 ] stack commented on HBASE-5074: -- @Dhruba It's good trying for PureJavaCrc32 first. Get rid of the WARN w/ thread dump I'd say, especially as it comes after reporting we're not going to use PureJavaCrc32. The feature does seem to be on by default but it would be nice to know it w/o having to go to ganglia graphs to figure my i/o loading to see whether or not this feature is enabled -- going to ganglia would be useless anyways in case where I've no history w/ an hbase read load -- so some kind of log output might be useful? Good on you D. support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
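The startup log line stack asks for could be as simple as announcing, once at region server startup, whether HBase-level checksums are on and which Checksum implementation won the fallback. A sketch; the config key and message wording here are assumptions, not the committed patch:

```java
// Build the one-time startup message announcing the checksum feature state.
public class ChecksumStartupLog {
  static String startupMessage(boolean enabled, String checksumClass) {
    // Config key name is illustrative of HBASE-5074's verify switch.
    return "hbase.regionserver.checksum.verify=" + enabled
        + (enabled ? ", using " + checksumClass : "");
  }

  public static void main(String[] args) {
    // In the region server this would go through LOG.info at startup.
    System.out.println(startupMessage(true, "java.util.zip.CRC32"));
  }
}
```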
[jira] [Commented] (HBASE-5486) Warn message in HTable: Stringify the byte[]
[ https://issues.apache.org/jira/browse/HBASE-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218484#comment-13218484 ] stack commented on HBASE-5486: -- Himanshu The patch build failed because you need to use --no-prefix on the git patches you attach here. Do that the next time. Let me commit this. Warn message in HTable: Stringify the byte[] Key: HBASE-5486 URL: https://issues.apache.org/jira/browse/HBASE-5486 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Trivial Labels: noob Attachments: 5486.patch The warn message in the method getStartEndKeys() in HTable can be improved by stringifying the byte array for Regions.Qualifier Currently, a sample message is like : 12/01/17 16:36:34 WARN client.HTable: Null [B@552c8fa8 cell in keyvalues={test5,\xC9\xA2\x00\x00\x00\x00\x00\x00/00_0,1326642537734.dbc62b2765529a9ad2ddcf8eb58cb2dc./info:server/1326750341579/Put/vlen=28, test5,\xC9\xA2\x00\x00\x00\x00\x00\x00/00_0,1326642537734.dbc62b2765529a9ad2ddcf8eb58cb2dc./info:serverstartcode/1326750341579/Put/vlen=8} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5486) Warn message in HTable: Stringify the byte[]
[ https://issues.apache.org/jira/browse/HBASE-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218489#comment-13218489 ] stack commented on HBASE-5486: -- Hmm.. shouldn't this toString be a static itself in HConstants rather than make it each time? Want to have another go at it Himanshu? Thanks. Warn message in HTable: Stringify the byte[] Key: HBASE-5486 URL: https://issues.apache.org/jira/browse/HBASE-5486 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Trivial Labels: noob Attachments: 5486.patch The warn message in the method getStartEndKeys() in HTable can be improved by stringifying the byte array for Regions.Qualifier Currently, a sample message is like : 12/01/17 16:36:34 WARN client.HTable: Null [B@552c8fa8 cell in keyvalues={test5,\xC9\xA2\x00\x00\x00\x00\x00\x00/00_0,1326642537734.dbc62b2765529a9ad2ddcf8eb58cb2dc./info:server/1326750341579/Put/vlen=28, test5,\xC9\xA2\x00\x00\x00\x00\x00\x00/00_0,1326642537734.dbc62b2765529a9ad2ddcf8eb58cb2dc./info:serverstartcode/1326750341579/Put/vlen=8} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
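Stack's suggestion, computing the stringified qualifier once as a static rather than on every log call, can be sketched as below. The toStringBinary helper is a simplified stand-in for HBase's Bytes.toStringBinary, and the constant names are illustrative:

```java
import java.nio.charset.StandardCharsets;

// Stringify the qualifier byte[] once, at class load, instead of rebuilding
// it for every WARN message in getStartEndKeys().
public class QualifierConstants {
  static final byte[] REGIONINFO_QUALIFIER =
      "regioninfo".getBytes(StandardCharsets.UTF_8);

  // Printable ASCII passes through; everything else becomes \xNN,
  // mimicking the style of Bytes.toStringBinary.
  static String toStringBinary(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte v : b) {
      int ch = v & 0xFF;
      if (ch >= ' ' && ch <= '~') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }

  // Computed once, as stack suggests, rather than per log call.
  static final String REGIONINFO_QUALIFIER_STR =
      toStringBinary(REGIONINFO_QUALIFIER);

  public static void main(String[] args) {
    // byte[].toString() gives the unreadable "[B@..." form the warn message had.
    System.out.println(REGIONINFO_QUALIFIER.toString());
    System.out.println(REGIONINFO_QUALIFIER_STR);
  }
}
```

The readable form is what turns "Null [B@552c8fa8 cell" into a message an operator can actually act on.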
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218514#comment-13218514 ] stack commented on HBASE-5399: -- On 1., yeah, the close should close the connection -- a client-side thing On 2., not so mad about it. On 3., you obtain the objective it seems but the solution does seem convoluted (more indirection in the client makes it yet more obtuse). Put up a patch I'd say. Lets have a look. SharedMaster is probably not the right name for the Interface? CloseableMaster or MasterConnection and doc that the close applies to the closing of the client connection to master only. Good on you N Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. 
- check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
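The interface-naming discussion above (a CloseableMaster or MasterConnection whose close() is client-side only, never an RPC) might look like this sketch. All type names are illustrative, not the actual HBase client API:

```java
import java.io.Closeable;

// Sketch: expose the master protocol plus a client-side close(), so callers
// can use try/finally or try-with-resources without the close going remote.
public class MasterConnectionSketch {
  interface MasterProtocol {
    boolean isMasterRunning();
  }

  // "CloseableMaster": protocol methods plus Closeable; close() only
  // releases the local connection and performs no RPC.
  interface CloseableMaster extends MasterProtocol, Closeable {
    @Override void close(); // narrowed: no IOException, purely client-side
  }

  static class TemporaryMasterConnection implements CloseableMaster {
    private boolean open = true;
    @Override public boolean isMasterRunning() { return open; }
    @Override public void close() { open = false; } // client-side teardown only
  }

  public static void main(String[] args) {
    try (TemporaryMasterConnection master = new TemporaryMasterConnection()) {
      System.out.println(master.isMasterRunning());
    } // closed here, with no remote call made
  }
}
```

Narrowing close() to drop the checked IOException is what makes the try-with-resources call site clean, which matches the try/finally shape stack calls "ideal" in the comment above.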
[jira] [Commented] (HBASE-4324) Single unassigned directory is very slow when there are many unassigned nodes
[ https://issues.apache.org/jira/browse/HBASE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218529#comment-13218529 ] stack commented on HBASE-4324: -- Yeah, should still be an issue. Probably better to have it in 0.96, the singularity, since will necessitate change in layout up in zk. Single unassigned directory is very slow when there are many unassigned nodes - Key: HBASE-4324 URL: https://issues.apache.org/jira/browse/HBASE-4324 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.4 Reporter: Todd Lipcon Fix For: 0.96.0 Because we use a single znode for /unassigned, and we re-list it every time its contents change, assignment speed per region is O(number of unassigned regions) rather than O(1). Every time something changes about one unassigned region, the master has to re-list the entire contents of the directory inside of AssignmentManager.nodeChildrenChanged(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218785#comment-13218785 ] stack commented on HBASE-5487: -- I took a look at FATE over in accumulo. Its some nice generic primitives for running a suite of idempotent operations (even if operation only part completes, if its run again, it should clean up and continue). There is a notion of locking on a table (so can stop it transiting I suppose; there are read/write locks), a stack for operations (ops are pushed and popped off the stack), operations can respond done, failed, or even w/ a new set of operations to do first (This basic can be used to step through a number of tasks one after the other). All is persisted up in zk run by the master; if master dies, a new master can pick up the half-done task and finish it. Clients can watch zk to see if task is done. There ain't too much to the fate package; there is fate class itself, an admin, a 'store' interface of which there is a zk implementation. We should for sure take inspiration at least from the work already done. 
Here are the ops they do via fate:
{code}
fate.seedTransaction(opid, new TraceRepoMaster(new CreateTable(c.user, tableName, timeType, options)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new RenameTable(tableId, oldTableName, newTableName)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new CloneTable(c.user, srcTableId, tableName, propertiesToSet, propertiesToExclude)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new DeleteTable(tableId)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new ChangeTableState(tableId, TableOperation.ONLINE)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new ChangeTableState(tableId, TableOperation.OFFLINE)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new TableRangeOp(MergeInfo.Operation.MERGE, tableId, startRow, endRow)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new TableRangeOp(MergeInfo.Operation.DELETE, tableId, startRow, endRow)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new BulkImport(tableId, dir, failDir, setTime)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new CompactRange(tableId, startRow, endRow)), autoCleanup);
{code}
CompactRange is their term for merge. It takes a key range span, figures the tablets involved and runs the compact/merge. We want that and then something to do the remove of regions too? Generic framework for Master-coordinated tasks -- Key: HBASE-5487 URL: https://issues.apache.org/jira/browse/HBASE-5487 Project: HBase Issue Type: New Feature Components: master, regionserver, zookeeper Affects Versions: 0.94.0 Reporter: Mubarak Seyed Labels: noob Need a framework to execute master-coordinated tasks in a fault-tolerant manner. Master-coordinated tasks such as online-scheme change and delete-range (deleting region(s) based on start/end key) can make use of this framework. The advantages of framework are 1. 
Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated tasks 2. Ability to abstract the common functions across Master - ZK and RS - ZK 3. Easy to plug in new master-coordinated tasks without adding code to core components -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
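The FATE behavior described above (ops pushed and popped off a stack; an op can respond done or hand back further ops to run first) can be sketched in plain Java. This is a hypothetical model, not Accumulo's actual API: the `Repo` interface name is borrowed from Accumulo, but the in-memory `Deque` stands in for the ZooKeeper-backed store that lets a new master resume half-done work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the FATE pattern described above: an operation
// ("Repo" in Accumulo's terms) can finish or hand back a prerequisite
// operation that must run first; the runner keeps ops on a stack.
// In Accumulo the stack is persisted in ZooKeeper so a new master can
// pick it up; here a plain in-memory Deque stands in for that store.
public class FateSketch {
    interface Repo {
        // Returns null when this op is done, or a child op to run first.
        Repo call() throws Exception;
    }

    static int stepsRun = 0;

    static void runToCompletion(Deque<Repo> stack) throws Exception {
        while (!stack.isEmpty()) {
            Repo top = stack.peek();
            Repo child = top.call();
            stepsRun++;
            if (child != null) {
                stack.push(child);   // prerequisite op: run it first
            } else {
                stack.pop();         // op completed; resume its parent
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Deque<Repo> stack = new ArrayDeque<>();
        // A parent op that needs one child op done before it can finish,
        // like CreateTable needing its table directories made first.
        stack.push(new Repo() {
            boolean childDone = false;
            public Repo call() {
                if (!childDone) {
                    childDone = true;
                    return () -> null;  // child op completes immediately
                }
                return null;            // parent now completes
            }
        });
        runToCompletion(stack);
        System.out.println("steps=" + stepsRun);
    }
}
```

Because each op either finishes or pushes work it depends on, re-running the loop after a crash (with a persisted stack) naturally resumes from the deepest incomplete op, which is the idempotency property the comment highlights.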
[jira] [Commented] (HBASE-5488) Fixed OfflineMetaRepair bug
[ https://issues.apache.org/jira/browse/HBASE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219312#comment-13219312 ] stack commented on HBASE-5488: -- +1 Fixed OfflineMetaRepair bug Key: HBASE-5488 URL: https://issues.apache.org/jira/browse/HBASE-5488 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: gaojinchao Assignee: gaojinchao Priority: Minor Fix For: 0.90.7, 0.92.1 Attachments: HBASE-5488-branch92.patch, HBASE-5488-trunk.patch, HBASE-5488_branch90.txt I wanted to use the OfflineMetaRepair tool and found nobody had fixed this bug. I will make a patch. 12/01/05 23:23:30 ERROR util.HBaseFsck: Bailed out due to: java.lang.IllegalArgumentException: Wrong FS: hdfs://us01-ciqps1-name01.carrieriq.com:9000/hbase/M2M-INTEGRATION-MM_TION-1325190318714/0003d2ede27668737e192d8430dbe5d0/.regioninfo, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:352) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:368) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:126) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:284) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:398) at org.apache.hadoop.hbase.util.HBaseFsck.loadMetaEntry(HBaseFsck.java:256) at org.apache.hadoop.hbase.util.HBaseFsck.loadTableInfo(HBaseFsck.java:284) at org.apache.hadoop.hbase.util.HBaseFsck.rebuildMeta(HBaseFsck.java:402) at org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair.main(OfflineMetaRe -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
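The "Wrong FS" error in the stack trace above comes from Hadoop's FileSystem.checkPath: the .regioninfo path carries an hdfs:// scheme, but the FileSystem instance it was opened through was built for file:///. A rough plain-Java model of that scheme check (the real logic lives in org.apache.hadoop.fs.FileSystem; the host and path names here are made up):

```java
import java.net.URI;

// Rough model of the check behind "Wrong FS: hdfs://... expected: file:///"
// above: the path's scheme must match the filesystem's scheme.
// OfflineMetaRepair hit it because HBaseFsck opened HDFS .regioninfo paths
// through a FileSystem built from a default (local, file://) configuration.
public class WrongFsSketch {
    static void checkPath(URI fsUri, URI path) {
        String got = path.getScheme();
        if (got != null && !got.equals(fsUri.getScheme())) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI localFs = URI.create("file:///");
        URI regioninfo = URI.create("hdfs://namenode:9000/hbase/t1/region/.regioninfo");
        try {
            checkPath(localFs, regioninfo);   // mismatched schemes: throws
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // The usual fix pattern: derive the FileSystem from the path itself
        // (path.getFileSystem(conf) in Hadoop) so the schemes always agree.
        checkPath(URI.create("hdfs://namenode:9000/"), regioninfo); // passes
    }
}
```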
[jira] [Commented] (HBASE-5491) Delete the HBaseConfiguration.create for coprocessor.Exec class
[ https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219315#comment-13219315 ] stack commented on HBASE-5491: -- +1 on patch. Will add comment that setConf is for testing only on commit. Waiting on hadoopqa before committing. Delete the HBaseConfiguration.create for coprocessor.Exec class --- Key: HBASE-5491 URL: https://issues.apache.org/jira/browse/HBASE-5491 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Environment: all Reporter: honghua zhu Fix For: 0.92.1 Attachments: HBASE-5491.patch The Exec class has a field: private Configuration conf = HBaseConfiguration.create(). The client side creates an Exec instance for every coprocessor request initiated through ExecRPCInvoker, so HBaseConfiguration.create() runs for each request, and the server side calls it once more when deserializing the Exec. HBaseConfiguration.create() is a time-consuming operation. This default value is only needed by test code (org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization); everywhere else the Exec class is passed a Configuration, so the conf field needs no default value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
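The cost the patch removes can be modeled in plain Java. This is a hypothetical illustration, not HBase code: `ExpensiveConf` stands in for HBaseConfiguration, and a counter shows how often the costly create() actually runs under the eager-default field versus constructor injection.

```java
// Hypothetical model of the cost the HBASE-5491 patch removes. A field
// initializer such as `conf = HBaseConfiguration.create()` runs on every
// construction and deserialization, even when the caller overwrites it
// immediately. ExpensiveConf stands in for HBaseConfiguration; the
// counter shows how often the costly create() actually runs.
public class ExecConfSketch {
    static int creates = 0;

    static class ExpensiveConf {
        static ExpensiveConf create() { creates++; return new ExpensiveConf(); }
    }

    // Before the patch: eager default, paid even when never used.
    static class EagerExec {
        ExpensiveConf conf = ExpensiveConf.create();
    }

    // After the patch: the caller supplies the Configuration it already has.
    static class InjectedExec {
        final ExpensiveConf conf;
        InjectedExec(ExpensiveConf conf) { this.conf = conf; }
    }

    public static void main(String[] args) {
        ExpensiveConf shared = ExpensiveConf.create();      // one create, reused
        for (int i = 0; i < 1000; i++) new InjectedExec(shared);
        System.out.println("creates with injection: " + creates);      // 1
        for (int i = 0; i < 1000; i++) new EagerExec();
        System.out.println("creates with eager default: " + creates);  // 1001
    }
}
```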
[jira] [Commented] (HBASE-5491) Delete the HBaseConfiguration.create for coprocessor.Exec class
[ https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219316#comment-13219316 ] stack commented on HBASE-5491: -- Although, one question, Honghua: why not remove the setConf and in the test do new Exec(HBaseConfiguration.create())? Delete the HBaseConfiguration.create for coprocessor.Exec class --- Key: HBASE-5491 URL: https://issues.apache.org/jira/browse/HBASE-5491 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Environment: all Reporter: honghua zhu Fix For: 0.92.1 Attachments: HBASE-5491.patch The Exec class has a field: private Configuration conf = HBaseConfiguration.create(). The client side creates an Exec instance for every coprocessor request initiated through ExecRPCInvoker, so HBaseConfiguration.create() runs for each request, and the server side calls it once more when deserializing the Exec. HBaseConfiguration.create() is a time-consuming operation. This default value is only needed by test code (org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization); everywhere else the Exec class is passed a Configuration, so the conf field needs no default value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira