[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-24 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216319#comment-13216319 ]

Hudson commented on HBASE-5166:
---

Integrated in HBase-TRUNK-security #122 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/122/])
HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in 
hadoop (Revision 1293098)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java


 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Fix For: 0.94.0

 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 There is currently no MultithreadedTableMapper in HBase analogous to the
 MultithreadedMapper that Hadoop provides for I/O-bound jobs.
 Use case, a web crawler: take input (URLs) from an HBase table and put the
 fetched content, as (URL, content) pairs, back into HBase.
 Running this kind of HBase MapReduce job with the normal table mapper is
 quite slow because we are not fully utilizing the CPU (the job is network-I/O
 bound).
 Moreover, I would like to know whether it is a good or bad idea to use HBase
 for this kind of use case.
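
 For readers unfamiliar with the pattern being requested: Hadoop's
 MultithreadedMapper runs several copies of the user's map function inside a
 single map task, with the threads sharing one input reader and one output
 collector, so that threads blocked on network I/O do not leave the task idle.
 The following is a minimal, stdlib-only Java sketch of that general pattern,
 not the HBase or Hadoop API. The names MultithreadedMapSketch,
 runMultithreaded and fetch are hypothetical, and fetch only simulates the
 crawler's network call with a sleep.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MultithreadedMapSketch {

  /** Shared input: worker threads take records one at a time under a lock. */
  static final class SharedReader<T> {
    private final Iterator<T> it;

    SharedReader(Iterator<T> it) {
      this.it = it;
    }

    /** Returns the next record, or null once the input is exhausted. */
    synchronized T next() {
      return it.hasNext() ? it.next() : null;
    }
  }

  /**
   * Applies mapFn to every input record using numThreads worker threads that
   * share one reader. Results go into a ConcurrentHashMap, standing in for the
   * synchronized output collector a real mapper would write to. Assumes
   * distinct, non-null input keys and a mapFn that never returns null
   * (ConcurrentHashMap rejects null values).
   */
  static <K, V> Map<K, V> runMultithreaded(List<K> input, Function<K, V> mapFn,
                                           int numThreads) {
    SharedReader<K> reader = new SharedReader<>(input.iterator());
    Map<K, V> output = new ConcurrentHashMap<>();
    List<Thread> workers = new ArrayList<>();
    for (int i = 0; i < numThreads; i++) {
      Thread worker = new Thread(() -> {
        K key;
        while ((key = reader.next()) != null) {
          // The slow, I/O-bound call runs outside the reader lock, so other
          // threads can keep pulling records while this one waits.
          output.put(key, mapFn.apply(key));
        }
      });
      workers.add(worker);
      worker.start();
    }
    try {
      for (Thread worker : workers) {
        worker.join(); // wait until the shared reader has been drained
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return output;
  }

  /** Hypothetical stand-in for fetching a page over the network. */
  static String fetch(String url) {
    try {
      Thread.sleep(50); // simulated network latency
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return "<content of " + url + ">";
  }

  public static void main(String[] args) {
    List<String> urls = List.of("http://a.example/", "http://b.example/",
        "http://c.example/", "http://d.example/");
    Map<String, String> pages =
        runMultithreaded(urls, MultithreadedMapSketch::fetch, 4);
    System.out.println(pages.size()); // prints 4
  }
}
```

 Because the slow mapFn call runs outside the reader lock, up to numThreads
 fetches can be in flight at once; only the hand-off of records is serialized.
 This is also why the approach mainly helps I/O-bound jobs: for a CPU-bound map
 function the extra threads would compete for the same cores.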

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-24 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216333#comment-13216333 ]

Hudson commented on HBASE-5166:
---

Integrated in HBase-TRUNK #2669 (See 
[https://builds.apache.org/job/HBase-TRUNK/2669/])
HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in 
hadoop (Revision 1293098)

 Result = SUCCESS
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java






[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-23 Thread Hadoop QA (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214912#comment-13214912 ]

Hadoop QA commented on HBASE-5166:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515764/5166-v9.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -134 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 153 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1024//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1024//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1024//console

This message is automatically generated.





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-23 Thread Jai Kumar Singh (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215428#comment-13215428 ]

Jai Kumar Singh commented on HBASE-5166:


@stack, ted: any idea why it's failing these tests?





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-23 Thread stack (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215432#comment-13215432 ]

stack commented on HBASE-5166:
--

@Jai It's not you.  Those are known failing tests.  Let me commit.





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-23 Thread Jai Kumar Singh (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215443#comment-13215443 ]

Jai Kumar Singh commented on HBASE-5166:


thanks stack, ted ;-) 





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213793#comment-13213793 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5268
---


Quite a lot of extra whitespace needs to be removed.


/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11536

Should read 'MultithreadedTableMapper instances'



/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11508

Leave a space between while and (
Another space between ) and {



/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11537

Can we give better progress information here ?



/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11535

Long line, please wrap to 80 chars.



/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11534

This if block can be an else to the if block above.



/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11533

Please remove the extra whitespace.


- Ted


On 2012-02-22 07:20:13, Jai Singh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3995/
bq.  ---
bq.  
bq.  (Updated 2012-02-22 07:20:13)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  There is currently no MultithreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for I/O-bound jobs.
bq.  Use case, a web crawler: take input (URLs) from an HBase table and put the fetched content, as (URL, content) pairs, back into HBase.
bq.  Running this kind of HBase MapReduce job with the normal table mapper is quite slow because we are not fully utilizing the CPU (the job is network-I/O bound).
bq.  
bq.  Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3995/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jai
bq.  
bq.







[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214284#comment-13214284 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
---

(Updated 2012-02-23 04:17:08.702062)


Review request for hbase, Ted Yu and Michael Stack.


Changes
---

Changes as suggested in review.



Diffs (updated)
-

  
  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3995/diff


Testing
---


Thanks,

Jai







[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214283#comment-13214283 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--



bq.  On 2012-02-22 17:53:12, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 114
bq.   https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line114
bq.  
bq.   Should read 'MultithreadedTableMapper instances'

done!


bq.  On 2012-02-22 17:53:12, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 155
bq.   https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line155
bq.  
bq.   Can we give better progress information here ?

I am not sure how to do it. It would be possible if I could access the 
underlying RecordReader/Writer passed to the job context and simply call their 
getProgress(). Could anybody help me here?
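
One possible direction for the progress question, sketched with hypothetical
interfaces rather than Hadoop's actual RecordReader classes: because all worker
threads draw from one shared outer reader, each per-thread reader can forward
getProgress() to that shared reader instead of reporting a fixed placeholder
value. The names ProgressDelegationSketch, ProgressReader, OuterReader and
SubReader are illustrative only.

```java
import java.util.List;

public class ProgressDelegationSketch {

  /** Hypothetical minimal reader interface (not Hadoop's RecordReader). */
  interface ProgressReader<T> {
    T next();            // returns null when the input is exhausted
    float getProgress(); // fraction of the input consumed, in [0, 1]
  }

  /** The single outer reader that owns the input and tracks consumption. */
  static final class OuterReader<T> implements ProgressReader<T> {
    private final List<T> records;
    private int consumed = 0;

    OuterReader(List<T> records) {
      this.records = records;
    }

    @Override
    public synchronized T next() {
      return consumed < records.size() ? records.get(consumed++) : null;
    }

    @Override
    public synchronized float getProgress() {
      return records.isEmpty() ? 1.0f : (float) consumed / records.size();
    }
  }

  /** Per-thread view: forwards both reads and progress to the shared reader. */
  static final class SubReader<T> implements ProgressReader<T> {
    private final ProgressReader<T> outer;

    SubReader(ProgressReader<T> outer) {
      this.outer = outer;
    }

    @Override
    public T next() {
      return outer.next();
    }

    @Override
    public float getProgress() {
      // Report the shared reader's progress rather than a fixed placeholder.
      return outer.getProgress();
    }
  }

  public static void main(String[] args) {
    OuterReader<String> outer = new OuterReader<>(List.of("r1", "r2", "r3", "r4"));
    SubReader<String> a = new SubReader<>(outer);
    SubReader<String> b = new SubReader<>(outer);
    a.next();
    b.next();
    System.out.println(a.getProgress()); // prints 0.5
  }
}
```

Since every sub-reader reports the same shared value, the task's overall
progress reflects how much of the split all threads together have consumed.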


bq.  On 2012-02-22 17:53:12, Ted Yu wrote:
bq.   /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java, line 223
bq.   https://reviews.apache.org/r/3995/diff/2/?file=78620#file78620line223
bq.  
bq.   This if block can be an else to the if block above.

done


- Jai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5268
---






[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214288#comment-13214288 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
---

(Updated 2012-02-23 04:22:51.078969)


Review request for hbase, Ted Yu and Michael Stack.


Changes
---

Removed whitespace.



Diffs (updated)
-

  
  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3995/diff


Testing
---


Thanks,

Jai







[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214291#comment-13214291 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
---

Ship it!


This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.
It would be nice to use in things like PE (PerformanceEvaluation) to put up more load.

- Michael






[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214338#comment-13214338 ]

jirapos...@reviews.apache.org commented on HBASE-5166:
--



bq.  On 2012-02-23 04:32:03, Michael Stack wrote:
bq.   This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.  Would be nice in things like PE putting up more load.

This works fine. I've tested it on the use case I mentioned in HBASE-5166.


- Jai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
---


On 2012-02-23 04:22:51, Jai Singh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3995/
bq.  ---
bq.  
bq.  (Updated 2012-02-23 04:22:51)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  There is no MultiThreadedTableMapper in hbase currently just like we have 
a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
bq.  UseCase, webcrawler: take input (urls) from a hbase table and put the 
content (urls, content) back into hbase. 
bq.  Running these kind of hbase mapreduce job with normal table mapper is 
quite slow as we are not utilizing CPU fully (N/W IO Bound).
bq.  
bq.  Moreover, I want to know whether It would be a good/bad idea to use HBase 
for these kind of usecases ?.
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java 
PRE-CREATION 
bq.
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3995/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jai
bq.  
bq.



 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 HBase currently has no MultiThreadedTableMapper analogous to the 
 MultiThreadedMapper that Hadoop provides for I/O-bound jobs.
 Use case: a web crawler that takes input (URLs) from an HBase table and puts 
 the content (URL, content) back into HBase.
 Running this kind of HBase MapReduce job with the normal table mapper is quite 
 slow because the CPU is not fully utilized (the job is network I/O bound).
 Moreover, I would like to know whether it is a good or bad idea to use HBase 
 for this kind of use case.
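
The crawler use case above (many slow network fetches per input row) is the classic reason to run several map threads per task. A minimal, JDK-only sketch of the underlying technique, overlapping blocking I/O on a fixed thread pool; this is not the HBase MultithreadedTableMapper API, and `mapConcurrently` and the simulated `fetch` function are hypothetical stand-ins for the map logic and the network call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class IoBoundMapSketch {

    /**
     * Applies a blocking, I/O-bound function to every input on a fixed-size
     * thread pool, so several requests are in flight at once instead of one.
     */
    public static Map<String, String> mapConcurrently(List<String> inputs,
            Function<String, String> fetch, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        Map<String, String> results = new ConcurrentHashMap<>();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String input : inputs) {
                // Each task does one blocking "fetch" and records (input, content).
                pending.add(pool.submit(() -> {
                    results.put(input, fetch.apply(input));
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows (wrapped) any exception thrown by a task
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a network fetch: sleep to simulate latency.
        Function<String, String> fakeFetch = url -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "content of " + url;
        };
        List<String> urls = List.of("http://a.example/", "http://b.example/", "http://c.example/");
        Map<String, String> pages = mapConcurrently(urls, fakeFetch, 3);
        System.out.println(pages.size()); // prints 3
    }
}
```

Because each thread spends most of its time blocked on the network rather than on the CPU, the pool can be larger than the core count; a suitable size depends on fetch latency and on how much load the crawled servers tolerate.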

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214342#comment-13214342
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--



bq.  On 2012-02-23 04:32:03, Michael Stack wrote:
bq.   This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.
bq.   Would be nice in things like PE putting up more load.
bq.  
bq.  Jai Singh wrote:
bq.  This works fine. I've tested it on the use case I mentioned in JIRA HBASE-5166.

So works nicely for your crawling then?  Mind writing a sweet release note for 
this?  I'll go commit it.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
---





 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214351#comment-13214351
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--



bq.  On 2012-02-23 04:32:03, Michael Stack wrote:
bq.   This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.
bq.   Would be nice in things like PE putting up more load.
bq.  
bq.  Jai Singh wrote:
bq.  This works fine. I've tested it on the use case I mentioned in JIRA HBASE-5166.
bq.  
bq.  Michael Stack wrote:
bq.  So works nicely for your crawling then?  Mind writing a sweet release note for this?
bq.  I'll go commit it.

Oh, mind uploading the final version of the patch to the issue itself so we 
can run hadoopqa on the patch and make sure it plays well w/ rest of hbase 
(should be fine given it's standalone)?  Thanks Jai.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
---





 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214361#comment-13214361
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--



bq.  On 2012-02-23 04:32:03, Michael Stack wrote:
bq.   This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.
bq.   Would be nice in things like PE putting up more load.
bq.  
bq.  Jai Singh wrote:
bq.  This works fine. I've tested it on the use case I mentioned in JIRA HBASE-5166.
bq.  
bq.  Michael Stack wrote:
bq.  So works nicely for your crawling then?  Mind writing a sweet release note for this?
bq.  I'll go commit it.
bq.  
bq.  Michael Stack wrote:
bq.  Oh, mind uploading the final version of the patch to the issue itself so we can run
bq.  hadoopqa on the patch and make sure it plays well w/ rest of hbase (should be fine
bq.  given it's standalone)?  Thanks Jai.

Yes, it works great in the web-crawling scenario. 

Release note: MultiThreadedTableMapper for [N/W] I/O-bound jobs.

Updated the patch on JIRA.

Thanks


- Jai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
---





 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0008-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214397#comment-13214397
 ] 

Hadoop QA commented on HBASE-5166:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12515712/0008-HBASE-5166-Added-MultithreadedTableMapper.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -134 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 153 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplicationPeer
  org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.TestDrainingServer
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1020//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1020//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1020//console

This message is automatically generated.

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0008-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread Jai Kumar Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212499#comment-13212499
 ] 

Jai Kumar Singh commented on HBASE-5166:


@Zhihong Yu, 1) The Apache License header was there earlier, but I removed it 
because stack suggested so. Anyway, I'll put it back. 
2) I've added Thread.sleep(1000). I am not sure whether we want to limit the 
wait duration; wouldn't that depend on the kind of job we are running?
3) I've modified the TableMapper test case in 
src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java. 
At first I was going to write a new test case file for MultithreadedTableMapper, 
but that does not make sense because it would repeat too much code.
So I added a numOfThreads argument to TestTableMapReduce's runTestOnTable 
function and called the function twice. Check the patch for more details.
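
On point 2, one way to leave the decision to the caller is a polling wait with an optional upper bound. A sketch under assumptions: it presumes the mapper keeps its worker threads in a list, and `waitForRunners`, `pollMillis` and `maxWaitMillis` are hypothetical names, with `maxWaitMillis <= 0` meaning "wait indefinitely" (the behaviour of a plain sleep-and-poll loop):

```java
import java.util.List;

public class RunnerWaitSketch {

    /**
     * Polls the worker threads every pollMillis until all of them have finished.
     * With maxWaitMillis > 0, gives up after roughly that long and returns false;
     * with maxWaitMillis <= 0, waits indefinitely.
     */
    public static boolean waitForRunners(List<Thread> runners, long pollMillis,
            long maxWaitMillis) throws InterruptedException {
        long start = System.currentTimeMillis();
        while (true) {
            boolean allDone = true;
            for (Thread t : runners) {
                if (t.isAlive()) {
                    allDone = false;
                    break;
                }
            }
            if (allDone) {
                return true;
            }
            // Elapsed-time check avoids overflow for very large bounds.
            if (maxWaitMillis > 0 && System.currentTimeMillis() - start >= maxWaitMillis) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread quick = new Thread(() -> { });
        quick.start();
        // 1000 ms poll interval, as in the comment above; bounded at 10 s.
        System.out.println(waitForRunners(List.of(quick), 1000, 10_000)); // prints true
    }
}
```

A bound trades the risk of a hung task for the risk of abandoning slow but healthy threads, which is why it arguably belongs in job configuration rather than hard-coded.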

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread Jai Kumar Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212555#comment-13212555
 ] 

Jai Kumar Singh commented on HBASE-5166:


Submitted a new patch against the current trunk in SVN.

Thanks

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212758#comment-13212758
 ] 

Hadoop QA commented on HBASE-5166:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12515348/0006-HBASE-5166-Added-MultithreadedTableMapper.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -134 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 160 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestAtomicOperation
  org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/998//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/998//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/998//console

This message is automatically generated.

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213139#comment-13213139
 ] 

Zhihong Yu commented on HBASE-5166:
---

@Jai:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
The year is not needed in the license header. Same here:
{code}
+ * Copyright 2009 The Apache Software Foundation
{code}
{code}
+  public void testAddDependencyJars() throws Exception {
{code}
The above doesn't carry an @Test annotation. If it is not needed for this JIRA, 
please remove it.
{code}
+  public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedrunner.class";
{code}
I think the config parameter should be renamed to 'multithreadedmapper.class'.
Same for NUMBER_OF_THREADS.
{code}
+  private class SubMapRecordReader extends RecordReader<ImmutableBytesWritable, Result> {
{code}
Why do we need the Sub prefix above ?

Putting the patch on https://reviews.apache.org would make the review process 
smoother.

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213238#comment-13213238
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
---

Review request for Michael Stack.


Summary
---

HBase currently has no MultiThreadedTableMapper analogous to the 
MultiThreadedMapper that Hadoop provides for I/O-bound jobs.
Use case: a web crawler that takes input (URLs) from an HBase table and puts 
the content (URL, content) back into HBase.
Running this kind of HBase MapReduce job with the normal table mapper is quite 
slow because the CPU is not fully utilized (the job is network I/O bound).

Moreover, I would like to know whether it is a good or bad idea to use HBase for 
this kind of use case.


This addresses bug HBASE-5166.
https://issues.apache.org/jira/browse/HBASE-5166


Diffs
-

  
  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff


Testing
---


Thanks,

Jai



 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread Jai Kumar Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213240#comment-13213240
 ] 

Jai Kumar Singh commented on HBASE-5166:


@Zhihong Yu: I've submitted the patch for review with the suggested changes. 
For the Sub prefix, I took it from Hadoop and followed the same convention. We 
call it SubMapRecordReader/Writer because it is an intermediate 
RecordReader/Writer for the mapper threads; it eventually uses the 
RecordReader/Writer passed to the MapReduce job to do the actual read/write. 

Thanks,

PS: I tried adding Zhihong to the reviewer list on the review page, but 
somehow RB was failing, so I added stack as a reviewer. Please do review. 
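
The delegation described above (many mapper threads sharing one underlying reader and writer) can be sketched in plain Java. This illustrates the idea under stated assumptions and is not the patch's code: `SubReader` and `SubWriter` are hypothetical stand-ins for SubMapRecordReader/SubMapRecordWriter, an `Iterator` plays the job's RecordReader, a `List` plays its RecordWriter, and access to both is serialized with `synchronized` while the user's map function runs concurrently:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class SubMapSketch {

    /** Hands out records from one underlying reader; safe to call from many threads. */
    static final class SubReader<T> {
        private final Iterator<T> outer;

        SubReader(Iterator<T> outer) { this.outer = outer; }

        /** Returns the next record, or null once the underlying reader is exhausted. */
        synchronized T next() {
            return outer.hasNext() ? outer.next() : null;
        }
    }

    /** Funnels output from all threads into one underlying writer. */
    static final class SubWriter<R> {
        private final List<R> outer;

        SubWriter(List<R> outer) { this.outer = outer; }

        synchronized void write(R value) {
            outer.add(value);
        }
    }

    /** Runs map over every input record using numThreads threads and returns all outputs. */
    public static <T, R> List<R> run(Iterator<T> input, Function<T, R> map, int numThreads)
            throws InterruptedException {
        SubReader<T> reader = new SubReader<>(input);
        List<R> sink = new ArrayList<>();
        SubWriter<R> writer = new SubWriter<>(sink);
        List<Thread> runners = new ArrayList<>();
        for (int i = 0; i < numThreads; i++) {
            Thread t = new Thread(() -> {
                T record;
                while ((record = reader.next()) != null) {
                    // Only reads and writes are synchronized; map() itself runs in parallel.
                    writer.write(map.apply(record));
                }
            });
            runners.add(t);
            t.start();
        }
        for (Thread t : runners) {
            t.join(); // join() makes all writes to sink visible to this thread
        }
        return sink;
    }
}
```

Output order is not deterministic because threads interleave. The sketch uses null as the end-of-input sentinel, so input records must be non-null, and an exception thrown inside `map` only kills that one thread instead of failing the task.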


 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213243#comment-13213243
 ] 

Zhihong Yu commented on HBASE-5166:
---

My recommendation when using Review Board is to leave the Bugs field empty; 
otherwise a large amount of post-back from Review Board appears in the JIRA.
You can specify hbase in the Groups field.

My user name is tedyu.

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213328#comment-13213328
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5266
---



/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
https://reviews.apache.org/r/3995/#comment11506

The hbase.mapreduce. prefix should be kept.
Would hbase.mapreduce.multithreadedmapper.class be a good name?


- Ted


On 2012-02-22 03:22:25, Jai Singh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3995/
bq.  ---
bq.  
bq.  (Updated 2012-02-22 03:22:25)
bq.  
bq.  
bq.  Review request for Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  There is currently no MultithreadedTableMapper in HBase analogous to the
bq.  MultithreadedMapper that Hadoop provides for I/O-bound jobs.
bq.  Use case, a web crawler: take input (URLs) from an HBase table and put the
bq.  content (URL, content) back into HBase.
bq.  Running this kind of HBase MapReduce job with the normal TableMapper is quite
bq.  slow because the job is network I/O bound and does not fully utilize the CPU.
bq.  
bq.  Moreover, I would like to know whether it is a good or bad idea to use HBase
bq.  for this kind of use case.
bq.  
bq.  
bq.  This addresses bug HBASE-5166.
bq.  https://issues.apache.org/jira/browse/HBASE-5166
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq.    /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/3995/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jai
bq.  
bq.



 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213367#comment-13213367
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
---

(Updated 2012-02-22 06:00:23.473596)


Review request for hbase and Michael Stack.


Summary
---

There is currently no MultithreadedTableMapper in HBase analogous to the
MultithreadedMapper that Hadoop provides for I/O-bound jobs.
Use case, a web crawler: take input (URLs) from an HBase table and put the
content (URL, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite
slow because the job is network I/O bound and does not fully utilize the CPU.

Moreover, I would like to know whether it is a good or bad idea to use HBase
for this kind of use case.


This addresses bug HBASE-5166.
https://issues.apache.org/jira/browse/HBASE-5166


Diffs
-

  
  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff


Testing
---


Thanks,

Jai



 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213417#comment-13213417
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
---

(Updated 2012-02-22 07:18:48.273758)


Review request for hbase and Michael Stack.


Changes
---

Removed bug ID HBASE-5166.


Summary
---

There is currently no MultithreadedTableMapper in HBase analogous to the
MultithreadedMapper that Hadoop provides for I/O-bound jobs.
Use case, a web crawler: take input (URLs) from an HBase table and put the
content (URL, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite
slow because the job is network I/O bound and does not fully utilize the CPU.

Moreover, I would like to know whether it is a good or bad idea to use HBase
for this kind of use case.


Diffs
-

  
  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff


Testing
---


Thanks,

Jai



 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213418#comment-13213418
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
---

(Updated 2012-02-22 07:20:13.121177)


Review request for hbase, Ted Yu and Michael Stack.


Summary
---

There is currently no MultithreadedTableMapper in HBase analogous to the
MultithreadedMapper that Hadoop provides for I/O-bound jobs.
Use case, a web crawler: take input (URLs) from an HBase table and put the
content (URL, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite
slow because the job is network I/O bound and does not fully utilize the CPU.

Moreover, I would like to know whether it is a good or bad idea to use HBase
for this kind of use case.


Diffs
-

  
  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

Diff: https://reviews.apache.org/r/3995/diff


Testing
---


Thanks,

Jai



 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213421#comment-13213421
 ] 

jirapos...@reviews.apache.org commented on HBASE-5166:
--



bq.  On 2012-02-22 05:26:10, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 64
bq.   https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line64
bq.  
bq.   The hbase.mapreduce. prefix should be kept.
bq.   Would hbase.mapreduce.multithreadedmapper.class be a good name?

Okay!
I guess then it should be hbase.mapreduce.multithreadedtablemapper.

  public static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedtablemapper.threads";
  public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedtablemapper.mapclass";
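A minimal sketch of the getter/setter pair such keys would typically back. Assumptions: a Map<String, String> stands in for Hadoop's Configuration so the example compiles without Hadoop, the method names are hypothetical (not the committed API), and the default of 10 threads is borrowed from the getInt(..., 10) call quoted in stack's review.

```java
import java.util.Map;

// Sketch only: Map<String, String> stands in for Hadoop's Configuration.
// Key names are the ones proposed in this thread; method names and the
// default thread count are assumptions.
class MultithreadedTableMapperConfigSketch {
  static final String NUMBER_OF_THREADS =
      "hbase.mapreduce.multithreadedtablemapper.threads";
  static final String MAPPER_CLASS =
      "hbase.mapreduce.multithreadedtablemapper.mapclass";
  static final int DEFAULT_NUMBER_OF_THREADS = 10;

  // Hypothetical setter: stores the thread count under the hbase.mapreduce. key.
  static void setNumberOfThreads(Map<String, String> conf, int threads) {
    conf.put(NUMBER_OF_THREADS, Integer.toString(threads));
  }

  // Hypothetical getter: falls back to the default when the job did not set the key.
  static int getNumberOfThreads(Map<String, String> conf) {
    String value = conf.get(NUMBER_OF_THREADS);
    return value == null ? DEFAULT_NUMBER_OF_THREADS : Integer.parseInt(value);
  }

  // Stores the wrapped mapper by class name, much as Configuration.setClass does.
  static void setMapperClass(Map<String, String> conf, Class<?> mapper) {
    conf.put(MAPPER_CLASS, mapper.getName());
  }

  static String getMapperClassName(Map<String, String> conf) {
    return conf.get(MAPPER_CLASS);
  }
}
```

Keeping both keys under one hbase.mapreduce.multithreadedtablemapper. prefix keeps them grouped with the other HBase MapReduce settings and avoids colliding with Hadoop's own mapred.* keys.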
  


- Jai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5266
---


On 2012-02-22 07:20:13, Jai Singh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3995/
bq.  ---
bq.  
bq.  (Updated 2012-02-22 07:20:13)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  There is currently no MultithreadedTableMapper in HBase analogous to the
bq.  MultithreadedMapper that Hadoop provides for I/O-bound jobs.
bq.  Use case, a web crawler: take input (URLs) from an HBase table and put the
bq.  content (URL, content) back into HBase.
bq.  Running this kind of HBase MapReduce job with the normal TableMapper is quite
bq.  slow because the job is network I/O bound and does not fully utilize the CPU.
bq.  
bq.  Moreover, I would like to know whether it is a good or bad idea to use HBase
bq.  for this kind of use case.
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq.    /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/3995/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jai
bq.  
bq.



 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 
 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-06 Thread Jai Kumar Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201433#comment-13201433
 ] 

Jai Kumar Singh commented on HBASE-5166:


Any comments?

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201439#comment-13201439
 ] 

Zhihong Yu commented on HBASE-5166:
---

MultithreadedTableMapper is missing the Apache license header.

{code}
+while(!executor.isTerminated()){
+  // wait till all the threads are done
+}
{code}
We should put a sleep() in the above loop and possibly limit the total duration
of the wait.
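One way to address the busy-wait, sketched with plain JDK calls: calling shutdown() and then ExecutorService.awaitTermination(timeout, unit) blocks without spinning and caps the total wait, which covers both the sleep() and the duration limit. This swaps awaitTermination in for an explicit sleep() loop; the class name, method name, and timeout handling are illustrative assumptions, not code from the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the patch: run `tasks` small jobs on a
// fixed pool and wait for them with a bounded, non-spinning wait.
class BoundedWaitSketch {
  static boolean runAndWait(int threads, int tasks, long timeoutSeconds,
      AtomicInteger counter) {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < tasks; i++) {
      executor.execute(() -> counter.incrementAndGet());
    }
    // Stop accepting new work; already-submitted tasks keep running.
    executor.shutdown();
    try {
      // Blocks (no CPU burn) until all tasks finish or the timeout expires.
      if (executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
        return true;
      }
      // Timed out: interrupt whatever is still running instead of waiting forever.
      executor.shutdownNow();
      return false;
    } catch (InterruptedException e) {
      executor.shutdownNow();
      Thread.currentThread().interrupt();  // preserve the interrupt status
      return false;
    }
  }
}
```

A false return lets the caller fail the map task explicitly rather than hang, which is the point of limiting the wait.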

A new unit test should be added for MultithreadedTableMapper.
Please look at tests that use TableMapper.

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 There is no MultiThreadedTableMapper in hbase currently just like we have a 
 MultiThreadedMapper in Hadoop for IO Bound Jobs. 
 UseCase, webcrawler: take input (urls) from a hbase table and put the content 
 (urls, content) back into hbase. 
 Running these kind of hbase mapreduce job with normal table mapper is quite 
 slow as we are not utilizing CPU fully (N/W IO Bound).
 Moreover, I want to know whether It would be a good/bad idea to use HBase for 
 these kind of usecases ?. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-01-16 Thread Jai Kumar Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187094#comment-13187094
 ] 

Jai Kumar Singh commented on HBASE-5166:


Hi stack,
   Thanks for the comment. I've modified the patch accordingly.
   I've added Executors.newFixedThreadPool(numberOfThreads) for the executor part.
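A rough sketch of the pattern a fixed pool enables here, loosely modeled on the idea behind Hadoop's MultithreadedMapper (an assumption about the design, not the patch's code): N runner threads share one input and one output, take records under a lock, and do the slow, I/O-bound map work outside the lock. The class and method names are hypothetical, a Function stands in for the wrapped mapper, and input elements are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Illustrative sketch: N runner threads on a fixed pool share one reader
// and one output list. Assumes the input contains no null elements.
class FixedPoolRunnerSketch {
  // Hands out records under a lock so each record goes to exactly one thread.
  static final class SharedReader<T> {
    private final Iterator<T> it;
    SharedReader(Iterator<T> it) { this.it = it; }
    synchronized T next() { return it.hasNext() ? it.next() : null; }
  }

  static <I, O> List<O> mapInParallel(List<I> input, Function<I, O> map,
      int numberOfThreads) {
    SharedReader<I> reader = new SharedReader<>(input.iterator());
    List<O> output = new ArrayList<>();  // guarded by synchronized (output)
    ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
    for (int t = 0; t < numberOfThreads; t++) {
      executor.execute(() -> {
        I record;
        while ((record = reader.next()) != null) {
          O result = map.apply(record);  // slow I/O-bound work runs unlocked
          synchronized (output) {
            output.add(result);
          }
        }
      });
    }
    executor.shutdown();
    try {
      if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
        executor.shutdownNow();
        throw new IllegalStateException("runners did not finish in time");
      }
    } catch (InterruptedException e) {
      executor.shutdownNow();
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    synchronized (output) {
      return new ArrayList<>(output);  // order depends on thread scheduling
    }
  }
}
```

The locks matter because a single task's record reader and output collector are generally not safe to call from several threads at once; only the map work itself runs concurrently.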

-- JK 


 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-01-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185963#comment-13185963
 ] 

stack commented on HBASE-5166:
--

bq. Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.

Looks grand to me (as does the network/IO-bound justification in your use case).
It would be a nice contribution. I'd like to use it to put load on hbase;
currently I have to run a ridiculous number of concurrent mappers to put up a
load with a tool like PerformanceEvaluation, which runs a single client doing
serial load per map task.

A few comments on the patch.

No need for these lines:

{code}
+ * Copyright 2007 The Apache Software Foundation
{code}

In our code base, we indent with two spaces, not the hard tabs your file
uses.

Fix the name of this config:

{code}
+    getInt("mapred.map.multithreadedrunner.threads", 10);
{code}

Ditto for the setter.

Don't you want to use an executor, and something like Guava's utility for
creating the executor that runs the threads? (See the hbase code base for examples.)
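If the utility meant is Guava's ThreadFactoryBuilder (an assumption; the comment does not name it), a typical call is new ThreadFactoryBuilder().setNameFormat("runner-%d").setDaemon(true).build(). The sketch below reproduces that behavior with only the JDK so it runs without Guava: numbered daemon threads that are easy to spot in thread dumps and cannot keep the JVM alive on their own. The class name, the "runner" prefix, and newFixedPool are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK stand-in for a Guava-built thread factory: names threads
// "<prefix>-0", "<prefix>-1", ... and marks them as daemon threads.
class NamedThreadFactorySketch implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger count = new AtomicInteger();

  NamedThreadFactorySketch(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r, prefix + "-" + count.getAndIncrement());
    t.setDaemon(true);  // a hung runner will not block JVM exit
    return t;
  }

  // Hypothetical convenience: a fixed pool whose threads use the names above.
  static ExecutorService newFixedPool(String prefix, int threads) {
    return Executors.newFixedThreadPool(threads, new NamedThreadFactorySketch(prefix));
  }
}
```

Usage would look like NamedThreadFactorySketch.newFixedPool("runner", numberOfThreads) in place of a bare Executors.newFixedThreadPool(numberOfThreads).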



 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h
