[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293602#comment-13293602
 ] 

Hadoop QA commented on HBASE-6134:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12531772/HBASE-6134v4.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 6 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
  org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2147//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2147//console

This message is automatically generated.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, 
 HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-12 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294053#comment-13294053
 ] 

Zhihong Ted Yu commented on HBASE-6134:
---

TestServerCustomProtocol passes locally.
Will integrate later if there is no objection.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
 HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-11 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13292743#comment-13292743
 ] 

chunhui shen commented on HBASE-6134:
-

Create new review request 

https://review.cloudera.org/r/2137/

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293345#comment-13293345
 ] 

Hadoop QA commented on HBASE-6134:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12531760/HBASE-6134v3-92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2142//console

This message is automatically generated.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, 
 HBASE-6134v3-92.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-05 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289178#comment-13289178
 ] 

Anoop Sam John commented on HBASE-6134:
---

Yes agree with Chunhui
One buffer per HLogSplitter and one HLogSplitter instance per one log file that 
the RS is splitting.  Correct me if I am wrong Chunhui

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-04 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289067#comment-13289067
 ] 

chunhui shen commented on HBASE-6134:
-

In the review board ,Prakash Khemani saied
bq.But the old code had a serious drawback – it would read the entire log file 
in memory before writing it out. Also the old code assumed that multiple log 
files were being split at the same time, but that is no longer true with 
distributed log splitting.
Whatever approach we take, I don’t think we should re-introduce buffering of 
the entire log file in memory.

Since we set maximal buffer size 128MB, I don't know what effect would cause if 
using buffer. Anyway, splitting log happens infrequently.

what others consider?



 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289136#comment-13289136
 ] 

stack commented on HBASE-6134:
--

@Chunhui You are saying we only buffer 128M worth of the WAL, not the entire 
log file as Prakash says (You sure?  Do we not read it all up into memory to 
bucket it by region and then after that is done, then we start writing out the 
recovered.edits file?  I could be wrong -- going by memory).  If we are only 
using a 128M buffer flushing to the recovered.edits file as we go, then that'd 
be fine I'd say.   Are there other improvements to be had given the 
circumstance is now one reader only writing multiple files?  Good on you.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-04 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289155#comment-13289155
 ] 

Anoop Sam John commented on HBASE-6134:
---

From the patch and current HLogSplitter code I can see that 128M (def value) 
buffer only is used. From the WAL data is read into this buffer and parallel 
writers get data from buffer and write to appropriate recovered.edits file 
corresponding to region. Reading into the buffer and write by the writers done 
parallely like producer consumer.

This way split happens when RS doing split for one file. If the same RS doing 
split of another file another HLogSplitter instance will be created for that 
which will contain another buffer. 

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289162#comment-13289162
 ] 

stack commented on HBASE-6134:
--

@Anoop So we flush to recovered.edits how often?  If each recovered.edits has a 
buffer of 16megs and we have 1000 recovered.edits that we have not yet flushed, 
can that happen?

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-04 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289166#comment-13289166
 ] 

chunhui shen commented on HBASE-6134:
-

@stack
It won't happen.
bq.So we flush to recovered.edits how often?
Writers are flushing data all along

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-06-04 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289168#comment-13289168
 ] 

chunhui shen commented on HBASE-6134:
-

Writers#doRun
{code}

  LOG.debug(Writer thread  + this + : starting);
  while (true) {
RegionEntryBuffer buffer = entryBuffers.getChunkToWrite();
if (buffer == null) {
  // No data currently available, wait on some more to show up
  synchronized (dataAvailable) {
if (shouldStop) return;
try {
  dataAvailable.wait(1000);
} catch (InterruptedException ie) {
  if (!shouldStop) {
throw new RuntimeException(ie);
  }
}
  }
  continue;
}

assert buffer != null;
try {
  writeBuffer(buffer);
} finally {
  entryBuffers.doneWriting(buffer);
}
  }
{code}

entryBuffers.getChunkToWrite() won't return null as long as there is data in 
the buffer

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-splitting and 
 distributed-log-splitting
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work), 400 regions in one hlog file
 local-master-split:60s+
 distributed-log-splitting:165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s
 (30ms~50ms per writer.close(); 10ms per create writers )
 I think we could do the improvement for this:
 Parallelizing the create and close writers in threads
 In the patch, change the logic for  distributed-log-splitting same as the 
 local-master-splitting and parallelizing the close in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286385#comment-13286385
 ] 

Hadoop QA commented on HBASE-6134:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530341/HBASE-6134v3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestClassLoading
  org.apache.hadoop.hbase.regionserver.wal.TestHLog

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2074//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2074//console

This message is automatically generated.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-31 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286638#comment-13286638
 ] 

Kannan Muthukkaruppan commented on HBASE-6134:
--

The SequenceFile writer does its own buffering. And so, my guess is that, at 
close time, the flushing of these buffers for every region's recovered.edits 
(SequenceFile) file is what is causing the close to be slow because it is 
serialized on these flushes and then DN reporting back to NN that it has 
received the blocks. Parallelizing the close in threads makes sense.

Good catch Chunhui! May I request you to update the description for the JIRA 
with a summary of what the fix does, so it is useful for future readers of this 
JIRA.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-31 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286691#comment-13286691
 ] 

Zhihong Yu commented on HBASE-6134:
---

From 
https://builds.apache.org/job/PreCommit-HBASE-Build/2074//testReport/org.apache.hadoop.hbase.regionserver.wal/TestHLog/testAppendClose/:
{code}
java.net.BindException: Problem binding to localhost/127.0.0.1:55283 : Address 
already in use
{code}

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285498#comment-13285498
 ] 

chunhui shen commented on HBASE-6134:
-

Review Board 
https://reviews.apache.org/r/5294/

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285502#comment-13285502
 ] 

ramkrishna.s.vasudevan commented on HBASE-6134:
---

@Chunhui
Nice work.
bq.Also, we found closing writer is quite slow
Any reason why it was slow?

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285511#comment-13285511
 ] 

chunhui shen commented on HBASE-6134:
-

@ram
Through the log, we found writer.close() took about 30ms~50ms.

So, it will took about 20s if there are 400 regions in one hlog file.


 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285513#comment-13285513
 ] 

chunhui shen commented on HBASE-6134:
-

@ram
Could you take a look at your environment, hom much time writer.close() took?

Through the log:
LOG.info(Closed path  + wap.p +  (wrote  + wap.editsWritten
+  edits in  + (wap.nanosSpent / 1000 / 1000) + ms)); 

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285515#comment-13285515
 ] 

ramkrishna.s.vasudevan commented on HBASE-6134:
---

Ok.  I will check that but may not be today. :)

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285623#comment-13285623
 ] 

Hadoop QA commented on HBASE-6134:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530177/HBASE-6134.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed
  org.apache.hadoop.hbase.regionserver.wal.TestWALReplay

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2051//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2051//console

This message is automatically generated.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285891#comment-13285891
 ] 

Zhihong Yu commented on HBASE-6134:
---

Test failed with NPE at:
{code}
for (FileStatus fileStatus : listStatus1) {
{code}
Checking for null wouldn't help either:
{code}
Failed tests:   
testSequentialEditLogSeqNum(org.apache.hadoop.hbase.regionserver.wal.TestWALReplay):
 The sequence number of the recoverd.edits and the current edit seq should be 
same expected:17 but was:0
{code}

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286313#comment-13286313
 ] 

Hadoop QA commented on HBASE-6134:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530321/HBASE-6134v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2070//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2070//console

This message is automatically generated.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286320#comment-13286320
 ] 

stack commented on HBASE-6134:
--

I added some comments over on rb on latest version of patch.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch


 First,we do the test between local-master-split and distributed split log
 Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
 splitting work)
 local-master-split:60s+
 distributed-split-log:165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (regionserver may be in high load)
 We found split-worker split one log file took about 20s.
 I think we should do the improvement for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira