[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-05-23 Thread Patrick Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281995#comment-13281995
 ] 

Patrick Yu commented on HBASE-3996:
---

@Ming Ma
I'm not so sure about multi-table inputs, but multi-scan is very useful when 
row keys are prefixed with a salt value to avoid the hot region problem. For 
example, if each row key is a salt byte (0-63) followed by the actual 
timestamp, then using one scan with the specific prefix per region (that is, 
per map task) would be less expensive than using very complicated filters for 
the same purpose.
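
To make the salted-key case concrete, here is a minimal sketch of how the 
per-prefix scan ranges could be computed. It assumes a row key layout of one 
salt byte followed by an 8-byte big-endian, non-negative timestamp; the class 
and method names (SaltedScanRanges, rangesFor) are hypothetical and not part 
of HBase or of the attached patches. It uses only the JDK and stops at the 
start/stop row keys, which would then be handed to one Scan per range.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class: builds one row-key range per salt bucket.
final class SaltedScanRanges {

    /** Start (inclusive) and stop (exclusive) row keys for a single salt bucket. */
    static final class Range {
        final byte[] startRow;
        final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // Assumed key layout: 1 salt byte + 8-byte big-endian timestamp.
    // ByteBuffer writes longs big-endian by default, so non-negative
    // timestamps sort in the same order as their encoded bytes.
    static byte[] rowKey(byte salt, long timestamp) {
        return ByteBuffer.allocate(1 + Long.BYTES).put(salt).putLong(timestamp).array();
    }

    /** One range per salt value in [0, buckets), covering [fromTs, toTs). */
    static List<Range> rangesFor(int buckets, long fromTs, long toTs) {
        List<Range> ranges = new ArrayList<>();
        for (int salt = 0; salt < buckets; salt++) {
            ranges.add(new Range(rowKey((byte) salt, fromTs), rowKey((byte) salt, toTs)));
        }
        return ranges;
    }
}
```

With 64 buckets this yields 64 contiguous key ranges, each of which is a plain 
range scan, so no row filter is needed to pick out the time window.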

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Eran Kutner
> Assignee: Eran Kutner
> Fix For: 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 
> 3996-v6.txt, 3996-v7.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

2012-04-20 Thread Patrick Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258696#comment-13258696
 ] 

Patrick Yu commented on HBASE-3792:
---

One possible workaround for this issue is to reuse a single connection for all 
jobs. For connection sharing to work, however, hbase.connection.per.config must 
be set to false, and this flag defaults to true up until HBASE-4508.

> TableInputFormat leaks ZK connections
> -
>
> Key: HBASE-3792
> URL: https://issues.apache.org/jira/browse/HBASE-3792
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.90.1
> Environment: Java 1.6.0_24, Mac OS X 10.6.7
> Reporter: Bryan Keller
> Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and 
> it never cleans it up. When running a Mapper, the TableInputFormat is 
> instantiated and the ZK connection is created. While this connection is not 
> explicitly cleaned up, the Mapper process eventually exits and thus the 
> connection is closed. Ideally the TableRecordReader would close the 
> connection in its close() method rather than relying on the process to die 
> for connection cleanup. This is fairly easy to implement by overriding 
> TableRecordReader, and also overriding TableInputFormat to specify the new 
> record reader.
> The leak occurs when the JobClient is initializing and needs to retrieve the 
> splits. To get the splits, it instantiates a TableInputFormat. Doing so 
> creates a ZK connection that is never cleaned up. Unlike the mapper, however, 
> my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does 
> not initialize the HTable in the getConf() method and does not have an HTable 
> member variable. Rather, it has a variable for the table name. The HTable is 
> instantiated where needed and then cleaned up. For example, in the 
> getSplits() method, I create the HTable, then close the connection once the 
> splits are retrieved. I also create the HTable when creating the record 
> reader, and I have a record reader that closes the connection when done.
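
The fix described above (keep only the table name as state, open the table 
where it is needed, close it right after) can be sketched roughly as follows. 
This is an illustration of the pattern only, not the attached patch: 
TableHandle is a hypothetical stand-in for an HTable and its ZK connection, 
the region start keys are made up, and the open-handle counter exists only to 
make leaks visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an input format that holds a table name, not a table.
final class ScopedTableInputFormat {

    /** Hypothetical stand-in for an HTable and the ZK connection behind it. */
    static final class TableHandle implements AutoCloseable {
        // Counts handles that are currently open, so a leak shows up as a non-zero value.
        static final AtomicInteger OPEN = new AtomicInteger();
        private final String tableName;

        TableHandle(String tableName) {
            this.tableName = tableName;
            OPEN.incrementAndGet();
        }

        // Made-up region boundaries; a real table would be asked for its regions.
        List<String> regionStartKeys() {
            return Arrays.asList("", "m");
        }

        String name() {
            return tableName;
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    // Only the name is kept as a member; no long-lived table or connection.
    private final String tableName;

    ScopedTableInputFormat(String tableName) {
        this.tableName = tableName;
    }

    /** Opens the table, computes the splits, and closes the table before returning. */
    List<String> getSplits() {
        try (TableHandle table = new TableHandle(tableName)) {
            List<String> splits = new ArrayList<>();
            for (String startKey : table.regionStartKeys()) {
                splits.add(table.name() + "[" + startKey + "]");
            }
            return splits;
        }
    }
}
```

Because the handle is scoped to getSplits(), a long-running job client can 
call it repeatedly without the number of open handles growing.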
