[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216319#comment-13216319 ]

Hudson commented on HBASE-5166:
-------------------------------

Integrated in HBase-TRUNK-security #122 (See [https://builds.apache.org/job/HBase-TRUNK-security/122/])
HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop (Revision 1293098)

Result = FAILURE
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java

MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
----------------------------------------------------------------------

Key: HBASE-5166
URL: https://issues.apache.org/jira/browse/HBASE-5166
Project: HBase
Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
Labels: multithreaded, tablemapper
Fix For: 0.94.0
Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
Original Estimate: 0.5h
Remaining Estimate: 0.5h

HBase currently has no MultiThreadedTableMapper comparable to the MultiThreadedMapper that Hadoop provides for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal table mapper is quite slow because the CPU is not fully utilized (the job is network-IO bound).
Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
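The motivation in the issue description (an IO-bound map step that leaves the CPU idle) can be illustrated with plain java.util.concurrent, independent of the HBase and Hadoop APIs. The following is a minimal sketch under stated assumptions, not the code from the attached patches: the class name `MultithreadedMapSketch`, the method `mapAll`, and the one-minute timeout are hypothetical names and values chosen for illustration. Several worker threads pull records from one shared input; only the read is synchronized, so the slow per-record work can overlap.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Illustrative only: worker threads share one input, as a multithreaded mapper might. */
class MultithreadedMapSketch {

    /**
     * Applies mapFn to every input using numThreads worker threads.
     * Reads from the shared iterator are synchronized; the map call itself is not.
     */
    static Map<String, String> mapAll(List<String> inputs, int numThreads,
                                      Function<String, String> mapFn) {
        Iterator<String> source = inputs.iterator();
        Map<String, String> output = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int i = 0; i < numThreads; i++) {
            pool.execute(() -> {
                while (true) {
                    String key;
                    synchronized (source) {        // one reader at a time
                        if (!source.hasNext()) {
                            return;                // input exhausted, worker exits
                        }
                        key = source.next();
                    }
                    // The (hypothetically slow, IO-bound) work runs outside the lock.
                    output.put(key, mapFn.apply(key));
                }
            });
        }
        pool.shutdown();
        try {
            // Hypothetical timeout, chosen only for this sketch.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return output;
    }
}
```

In a crawler-style job, `mapFn` would stand in for the network fetch; with a suitable thread count, waits on one URL overlap with work on others instead of serializing.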
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216333#comment-13216333 ]

Hudson commented on HBASE-5166:
-------------------------------

Integrated in HBase-TRUNK #2669 (See [https://builds.apache.org/job/HBase-TRUNK/2669/])
HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop (Revision 1293098)

Result = SUCCESS
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214912#comment-13214912 ]

Hadoop QA commented on HBASE-5166:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515764/5166-v9.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -134 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 153 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//console

This message is automatically generated.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215428#comment-13215428 ]

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

@stack, ted: any idea why it's failing these tests?
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215432#comment-13215432 ]

stack commented on HBASE-5166:
------------------------------

@Jai It's not you. Those are known failing tests. Let me commit.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215443#comment-13215443 ]

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

thanks stack, ted ;-)
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213793#comment-13213793 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5268
-----------------------------------------------------------

Quite a few white spaces need to be removed.

/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11536

    Should read 'MultithreadedTableMapper instances'

/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11508

    Leave a space between while and (
    Another space between ) and {

/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11537

    Can we give better progress information here ?

/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11535

    Long line, please wrap to 80 chars.

/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11534

    This if block can be an else to the if block above.

/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11533

    Please remove white space.

- Ted
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214284#comment-13214284 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-23 04:17:08.702062)

Review request for hbase, Ted Yu and Michael Stack.

Changes
-------

changes as suggested in review

Diffs (updated)
-----

/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff

Testing
-------

Thanks,

Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214283#comment-13214283 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

bq. On 2012-02-22 17:53:12, Ted Yu wrote:
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 114
bq. https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line114
bq.
bq. Should read 'MultithreadedTableMapper instances'

done!

bq. On 2012-02-22 17:53:12, Ted Yu wrote:
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 155
bq. https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line155
bq.
bq. Can we give better progress information here ?

I am not sure how to do it. It would be possible if I could access the underlying RecordReader/Writer passed to the job context and simply call their getProgress. Could anybody help me here?

bq. On 2012-02-22 17:53:12, Ted Yu wrote:
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java, line 223
bq. https://reviews.apache.org/r/3995/diff/2/?file=78620#file78620line223
bq.
bq. This if block can be an else to the if block above.

done

- Jai

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5268
-----------------------------------------------------------
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214288#comment-13214288 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-23 04:22:51.078969)

Review request for hbase, Ted Yu and Michael Stack.

Changes
-------

White spaces removed

Diffs (updated)
-----

/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff

Testing
-------

Thanks,

Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214291#comment-13214291 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------

Ship it!

This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.

- Michael
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214338#comment-13214338 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

bq. On 2012-02-23 04:32:03, Michael Stack wrote:
bq. This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.

This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166.

- Jai

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214342#comment-13214342 ] jirapos...@reviews.apache.org commented on HBASE-5166: -- bq. On 2012-02-23 04:32:03, Michael Stack wrote: bq. This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load. bq. bq. Jai Singh wrote: bq. This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166. So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it. - Michael --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3995/#review5302 --- On 2012-02-23 04:22:51, Jai Singh wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3995/ bq. --- bq. bq. (Updated 2012-02-23 04:22:51) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. bq. UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. bq. Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound). bq. bq. Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3995/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Jai bq. bq. 
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214351#comment-13214351 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--

bq. On 2012-02-23 04:32:03, Michael Stack wrote:
bq. This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.
bq.
bq. Jai Singh wrote:
bq. This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166.
bq.
bq. Michael Stack wrote:
bq. So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it.

Oh, mind uploading the final version of the patch to the issue itself then we can run hadoopqa on the patch and make sure it plays well w/ rest of hbase (should be fine given its standalone). Thanks Jai.

- Michael

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3995/#review5302
---
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214361#comment-13214361 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--

bq. On 2012-02-23 04:32:03, Michael Stack wrote:
bq. This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.
bq.
bq. Jai Singh wrote:
bq. This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166.
bq.
bq. Michael Stack wrote:
bq. So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it.
bq.
bq. Michael Stack wrote:
bq. Oh, mind uploading the final version of the patch to the issue itself then we can run hadoopqa on the patch and make sure it plays well w/ rest of hbase (should be fine given its standalone). Thanks Jai.

Yes, it works great in the web crawling scenario.

Release note: MultiThreadedTableMapper for [N/W] IO bound jobs

Updated the patch on jira.

Thanks

- Jai

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3995/#review5302
---
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214397#comment-13214397 ]

Hadoop QA commented on HBASE-5166:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515712/0008-HBASE-5166-Added-MultithreadedTableMapper.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -134 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 153 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.TestReplicationPeer
org.apache.hadoop.hbase.replication.TestReplication
org.apache.hadoop.hbase.TestDrainingServer
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//console

This message is automatically generated.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212499#comment-13212499 ]

Jai Kumar Singh commented on HBASE-5166:
--

@Zhihong Yu,

1) The Apache License was there earlier, but I removed it because stack suggested so. Anyway, I'll put it back.

2) I've added Thread.sleep(1000). I am not sure whether we want to limit the wait duration; wouldn't that depend on the kind of job we are running?

3) I've modified the test case of TableMapper in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java. At first I was going to create a new test case file for MultithreadedTableMapper, but that does not make sense because it would mean too much code repetition. So I added a numOfThreads argument to TestTableMapReduce's runTestOnTable function and called the function twice.

Check the patch for more details.
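The open question in point 2 above (poll with a fixed sleep, and whether to cap the total wait) can be illustrated with a small sketch. This is not the code from the patch: the class name `WorkerWait`, the method `awaitWorkers`, and the convention that a non-positive `maxWaitMillis` means "no limit" are all assumptions made for illustration, and a plain `ExecutorService` stands in for the mapper's worker threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; not part of the HBASE-5166 patch. */
final class WorkerWait {
    private WorkerWait() {}

    /**
     * Waits for all submitted tasks to finish, re-checking once per pollMillis
     * (the role played by Thread.sleep(1000) in the discussion above).
     *
     * @param maxWaitMillis upper bound on the total wait; a value <= 0 means
     *                      "wait as long as the job needs" (an assumed convention)
     * @return true if every task finished, false if the limit was reached first
     */
    static boolean awaitWorkers(ExecutorService pool, long pollMillis, long maxWaitMillis)
            throws InterruptedException {
        pool.shutdown(); // accept no new tasks; tasks already submitted keep running
        final long start = System.nanoTime();
        final long limitNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        // awaitTermination blocks for at most one poll interval per iteration
        while (!pool.awaitTermination(pollMillis, TimeUnit.MILLISECONDS)) {
            if (maxWaitMillis > 0 && System.nanoTime() - start >= limitNanos) {
                return false; // gave up: the caller decides whether to cancel
            }
        }
        return true;
    }
}
```

Making the limit a parameter, rather than a constant, reflects the point raised in the comment: a suitable bound depends on the kind of job, so a crawler with slow remote fetches might pass 0 and wait indefinitely.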
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212555#comment-13212555 ]

Jai Kumar Singh commented on HBASE-5166:
--

submitted a new patch against current trunk on svn. Thanks
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212758#comment-13212758 ]

Hadoop QA commented on HBASE-5166:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515348/0006-HBASE-5166-Added-MultithreadedTableMapper.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -134 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestAtomicOperation
org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/998//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/998//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/998//console

This message is automatically generated.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213139#comment-13213139 ]

Zhihong Yu commented on HBASE-5166:
---

@Jai:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
Year is not needed in the license header. Same here:
{code}
+ * Copyright 2009 The Apache Software Foundation
{code}

{code}
+  public void testAddDependencyJars() throws Exception {
{code}
The above doesn't carry the @Test annotation. If it is not needed for this JIRA, please remove it.

{code}
+  public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedrunner.class";
{code}
I think the name of the config parameter should be changed to 'multithreadedmapper.class'. Same for NUMBER_OF_THREADS.

{code}
+  private class SubMapRecordReader extends RecordReader<ImmutableBytesWritable, Result> {
{code}
Why do we need the Sub prefix above?

Putting the patch on https://reviews.apache.org would make the review process smooth.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213238#comment-13213238 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3995/
---

Review request for Michael Stack.

Summary
---

HBase currently has no MultiThreadedTableMapper comparable to the MultiThreadedMapper that Hadoop provides for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal table mapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.

Moreover, I want to know whether it would be a good or bad idea to use HBase for this kind of use case.

This addresses bug HBASE-5166.
https://issues.apache.org/jira/browse/HBASE-5166

Diffs
-

/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff

Testing
---

Thanks,

Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213240#comment-13213240 ]

Jai Kumar Singh commented on HBASE-5166:
--

@Zhihong Yu: submitted the patch for review with the suggested changes.

As for the Sub prefix, I took it from Hadoop and am following the same convention. We call them SubMapRecordReader/Writer because they are intermediate RecordReader/Writer instances for the mapper threads; they eventually use the RecordReader/Writer passed to the MapReduce job to do the actual read/write.

Thanks,

PS: I tried adding Zhihong to the reviewer list on the review page, but RB kept failing, so I added stack as reviewer. Please do review.
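The role of the intermediate reader described above can be sketched without any Hadoop dependency. The following is only an illustration of the delegation idea, not the patch's SubMapRecordReader: the `SimpleReader` interface, `ListReader`, and `SubReader` are hypothetical stand-ins, and an in-memory list plays the part of the RecordReader that the MapReduce job passes in.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a record reader; not the Hadoop RecordReader API. */
interface SimpleReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
}

/** Single-threaded reader over a list, standing in for the job's real reader. */
final class ListReader<K, V> implements SimpleReader<K, V> {
    private final Iterator<Map.Entry<K, V>> it;
    private Map.Entry<K, V> current;

    ListReader(List<Map.Entry<K, V>> records) {
        this.it = records.iterator();
    }

    public boolean nextKeyValue() {
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    public K getCurrentKey() { return current.getKey(); }
    public V getCurrentValue() { return current.getValue(); }
}

/**
 * Per-thread intermediate reader: it advances the shared outer reader under a
 * lock and keeps a private reference to the record it obtained, so another
 * thread advancing the outer reader cannot change what this thread sees.
 */
final class SubReader<K, V> implements SimpleReader<K, V> {
    private final SimpleReader<K, V> outer;
    private K key;
    private V value;

    SubReader(SimpleReader<K, V> outer) {
        this.outer = outer;
    }

    public boolean nextKeyValue() {
        synchronized (outer) { // all SubReaders lock on the same outer reader
            if (!outer.nextKeyValue()) {
                return false;
            }
            key = outer.getCurrentKey();
            value = outer.getCurrentValue();
            return true;
        }
    }

    public K getCurrentKey() { return key; }
    public V getCurrentValue() { return value; }
}
```

Each mapper thread would create its own `SubReader` around the one shared reader. In this sketch the records are immutable, so holding a reference is enough; a reader that reuses its key and value objects between calls would need a real copy inside the synchronized block.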
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213243#comment-13213243 ]

Zhihong Yu commented on HBASE-5166:
---

My recommendation for using Review Board is to leave the Bugs field empty; otherwise a large amount of post-back from Review Board will appear in the JIRA.
You can specify hbase in the Groups field.
My user name is tedyu.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213328#comment-13213328 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3995/#review5266
---

/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11506

hbase.mapreduce. prefix should be kept. Would hbase.mapreduce.multithreadedmapper.class be a good name ?

- Ted
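The naming suggestion in Ted's review (keep the hbase.mapreduce. prefix and use multithreadedmapper in the key) could look like the sketch below. Only the key hbase.mapreduce.multithreadedmapper.class comes from the review; the `.threads` key, the default of 10 threads, the class name `MultithreadedMapperKeys`, and the use of `java.util.Properties` in place of a Hadoop Configuration are assumptions made to keep the example self-contained.

```java
import java.util.Properties;

/** Hypothetical holder for the configuration keys discussed in the review. */
final class MultithreadedMapperKeys {
    private MultithreadedMapperKeys() {}

    /** Shared prefix, so both keys stay under hbase.mapreduce. as requested. */
    static final String PREFIX = "hbase.mapreduce.multithreadedmapper.";

    /** Key name proposed in the review comment. */
    static final String MAPPER_CLASS = PREFIX + "class";

    /** Assumed key name; the thread does not spell this one out. */
    static final String NUMBER_OF_THREADS = PREFIX + "threads";

    /** Assumed default, used when the key is not set. */
    static final int DEFAULT_THREADS = 10;

    /** Reads the thread count, falling back to the default when unset. */
    static int getNumberOfThreads(Properties conf) {
        String raw = conf.getProperty(NUMBER_OF_THREADS);
        return raw == null ? DEFAULT_THREADS : Integer.parseInt(raw.trim());
    }

    /** Stores the thread count; rejects values that could never run a mapper. */
    static void setNumberOfThreads(Properties conf, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1, got " + threads);
        }
        conf.setProperty(NUMBER_OF_THREADS, Integer.toString(threads));
    }
}
```

Deriving both constants from one prefix keeps them consistent if the prefix is renamed again during review.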
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213367#comment-13213367 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3995/
---

(Updated 2012-02-22 06:00:23.473596)

Review request for hbase and Michael Stack.

Summary
---

HBase currently has no MultiThreadedTableMapper comparable to the MultiThreadedMapper that Hadoop provides for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal table mapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.

Moreover, I want to know whether it would be a good or bad idea to use HBase for this kind of use case.

This addresses bug HBASE-5166.
https://issues.apache.org/jira/browse/HBASE-5166

Diffs
-

/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff

Testing
---

Thanks,

Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213417#comment-13213417 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-22 07:18:48.273758)

Review request for hbase and Michael Stack.

Changes
-------

Removing bugid HBASE-5166

Summary
-------

There is currently no MultithreadedTableMapper in HBase, analogous to the MultithreadedMapper that Hadoop provides for I/O-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-I/O-bound and does not fully utilize the CPU.

Moreover, I would like to know whether it would be a good or bad idea to use HBase for this kind of use case.

Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff

Testing
-------

Thanks,

Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213418#comment-13213418 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-22 07:20:13.121177)

Review request for hbase, Ted Yu and Michael Stack.

Summary
-------

There is currently no MultithreadedTableMapper in HBase, analogous to the MultithreadedMapper that Hadoop provides for I/O-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-I/O-bound and does not fully utilize the CPU.

Moreover, I would like to know whether it would be a good or bad idea to use HBase for this kind of use case.

Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff

Testing
-------

Thanks,

Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213421#comment-13213421 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------

bq. On 2012-02-22 05:26:10, Ted Yu wrote:
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 64
bq. https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line64
bq.
bq. hbase.mapreduce. prefix should be kept.
bq. Would hbase.mapreduce.multithreadedmapper.class be a good name ?

Okay! I guess then the prefix should be hbase.mapreduce.multithreadedtablemapper.

public static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedtablemapper.threads";
public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedtablemapper.mapclass";

- Jai

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5266
-----------------------------------------------------------
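As a rough illustration of how the two keys proposed in the reply above might be used, here is a minimal sketch. It is not the patch itself: `java.util.Properties` stands in for Hadoop's `Configuration` so the example runs without Hadoop on the classpath, and the class name `MapperConfigSketch`, the helpers `getNumberOfThreads` / `setNumberOfThreads`, and the default of 10 threads (borrowed from the `getInt` call quoted elsewhere in this thread) are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical sketch: Properties stands in for Hadoop's Configuration.
class MapperConfigSketch {
  // Key names as proposed in the review reply.
  public static final String NUMBER_OF_THREADS =
      "hbase.mapreduce.multithreadedtablemapper.threads";
  public static final String MAPPER_CLASS =
      "hbase.mapreduce.multithreadedtablemapper.mapclass";

  // Assumed default; the thread only shows 10 in a quoted getInt call.
  static final int DEFAULT_THREADS = 10;

  // Reads the thread count, falling back to the default when the key is unset.
  static int getNumberOfThreads(Properties conf) {
    String value = conf.getProperty(NUMBER_OF_THREADS);
    if (value == null) {
      return DEFAULT_THREADS;
    }
    int threads = Integer.parseInt(value.trim());
    if (threads < 1) {
      throw new IllegalArgumentException("thread count must be >= 1: " + threads);
    }
    return threads;
  }

  // Stores the thread count under the same key the getter reads.
  static void setNumberOfThreads(Properties conf, int threads) {
    conf.setProperty(NUMBER_OF_THREADS, Integer.toString(threads));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(getNumberOfThreads(conf)); // default applies
    setNumberOfThreads(conf, 4);
    System.out.println(getNumberOfThreads(conf)); // explicit value applies
  }
}
```

Keeping the key strings in named constants means the getter and the setter cannot drift apart, which is the point of the rename discussed in the review.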
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201433#comment-13201433 ]

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

Any comments ??
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201439#comment-13201439 ]

Zhihong Yu commented on HBASE-5166:
-----------------------------------

MultithreadedTableMapper is missing the Apache license header.

{code}
+while(!executor.isTerminated()){
+  // wait till all the threads are done
+}
{code}
We should put a sleep() in the above loop and possibly limit the total duration of the wait.

A new unit test should be added for MultithreadedTableMapper. Please look at the tests that use TableMapper.
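One way to address the busy-wait concern raised above is to let the executor block with a time limit instead of spinning on `isTerminated()`. The following is a minimal sketch under stated assumptions, not the code from the patch: the class name `BoundedWaitSketch` and the helper `shutdownAndWait` are hypothetical, and the timeout values are arbitrary. It relies only on the standard `ExecutorService` methods `shutdown()`, `awaitTermination(long, TimeUnit)` and `shutdownNow()`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a bounded wait for worker threads.
class BoundedWaitSketch {
  // Returns true if every submitted task finished within the time limit.
  static boolean shutdownAndWait(ExecutorService executor, long timeout, TimeUnit unit) {
    executor.shutdown(); // accept no new tasks; already submitted tasks still run
    try {
      // Blocks without spinning until termination or until the timeout elapses.
      if (executor.awaitTermination(timeout, unit)) {
        return true;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    executor.shutdownNow(); // give up: interrupt tasks that are still running
    return false;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    executor.execute(() -> System.out.println("task done"));
    boolean finished = shutdownAndWait(executor, 5, TimeUnit.SECONDS);
    System.out.println("finished=" + finished);
  }
}
```

Compared with a `sleep()` inside the loop, `awaitTermination` returns as soon as the last task completes, and the timeout caps the total wait in one place.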
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187094#comment-13187094 ]

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

Hi stack,
Thanks for the comment. I've modified the patch accordingly.
I now use Executors.newFixedThreadPool(numberOfThreads) for the executor part.

--
JK
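For readers unfamiliar with the approach mentioned above, here is a minimal sketch of running I/O-bound work on a pool created with `Executors.newFixedThreadPool(numberOfThreads)`. It does not reproduce the patch: the class `FixedPoolSketch`, the method `mapAll`, and the `fetch` helper (which only sleeps to imitate a slow network call) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a fixed-size pool running I/O-bound map calls.
class FixedPoolSketch {
  // Stand-in for an I/O-bound map call such as fetching a URL.
  static String fetch(String url) {
    try {
      Thread.sleep(50); // imitate network latency
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return "content of " + url;
  }

  // Runs fetch() for every URL on numberOfThreads workers; results keep input order.
  static List<String> mapAll(List<String> urls, int numberOfThreads) {
    ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
    try {
      List<Callable<String>> tasks = new ArrayList<>();
      for (String url : urls) {
        tasks.add(() -> fetch(url));
      }
      List<String> results = new ArrayList<>();
      // invokeAll blocks until all tasks complete and returns futures in task order.
      for (Future<String> future : executor.invokeAll(tasks)) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while mapping", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("map task failed", e.getCause());
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(mapAll(Arrays.asList("a", "b", "c"), 2));
  }
}
```

Because each task spends most of its time waiting, several of them can overlap on one CPU, which is the motivation given for the multithreaded mapper in this issue.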
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185963#comment-13185963 ]

stack commented on HBASE-5166:
------------------------------

bq. Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.

Looks grand to me (as does the network/io-bound justification in your usecase).

Would be a nice contrib. I'd like it so I can use it for putting load on hbase; currently I have to run a ridiculous number of concurrent mappers to put up a load using a tool like PerformanceEvaluation, which runs a single client doing serial load per map task.

A few comments on the patch.

No need for these lines:

{code}
+ * Copyright 2007 The Apache Software Foundation
{code}

In our code base, we use two spaces for indentation (no hard tabs, which you have in your file).

Fix the name of this config:

{code}
+ getInt("mapred.map.multithreadedrunner.threads", 10);
{code}

Ditto for the setter.

You don't want to use an executor, and something like guava's utility for creating the executor that runs the threads? (See the hbase code base for examples.)