Re: How to close resources shared in executor?

2014-10-17 Thread Fengyun RAO
Of course, I could create a connection inside the map function:

val result = rdd.map(line => {
  val conf = HBaseConfiguration.create
  val connection = HConnectionManager.createConnection(conf)
  val table = connection.getTable(user)
  ...
  table.close()
  connection.close()
})

but that would be too slow, which is also the reason I share conf and
connection in the Util object.

Maybe I do need a shutdown hook after all, as the Javadoc says.

Thank you!
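
The shutdown-hook idea mentioned above could look roughly like the following. This is a minimal sketch in Java using a stand-in class rather than the real HBase client: SharedResource, SharedResourceHolder and ShutdownHookDemo are hypothetical names, and only Runtime.addShutdownHook is a real JDK call. The holder registers the hook when the shared resource is first created, so each JVM that touches the resource would also close it on a normal exit.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for an expensive shared resource such as a connection (hypothetical).
class SharedResource implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    boolean isClosed() { return closed.get(); }

    @Override
    public void close() {
        // compareAndSet makes close() idempotent: only the first call does the work.
        if (closed.compareAndSet(false, true)) {
            System.out.println("shared resource closed");
        }
    }
}

// Creates the resource once per JVM and registers a hook that closes it on JVM exit.
class SharedResourceHolder {
    private static SharedResource instance;

    static synchronized SharedResource get() {
        if (instance == null) {
            SharedResource created = new SharedResource();
            Runtime.getRuntime().addShutdownHook(new Thread(created::close));
            instance = created;
        }
        return instance;
    }
}

class ShutdownHookDemo {
    public static void main(String[] args) {
        SharedResource r = SharedResourceHolder.get();
        System.out.println("closed before exit? " + r.isClosed());
        // The hook runs after main returns, during normal JVM shutdown.
    }
}
```

Shutdown hooks are not run if the JVM is killed abruptly, so this only covers orderly exits.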


Re: How to close resources shared in executor?

2014-10-16 Thread Fengyun RAO
Thanks, Ted.
Util.Connection.close() should be called only once, so it can NOT be in a
map function:

val result = rdd.map(line => {
  val table = Util.Connection.getTable(user)
  ...
  Util.Connection.close()
})

As you mentioned:

Calling table.close() is the recommended approach.
HConnectionManager does reference counting. When all references to the
underlying connection are gone, connection would be released.

Yes, we should call table.close(), but it won’t remove the HConnection from
HConnectionManager, which is an HConnection pool.
Looking at the HConnectionManager Javadoc, it seems I have to implement
a shutdown hook:

 * <p>Cleanup used to be done inside in a shutdown hook.  On startup we'd
 * register a shutdown hook that called {@link #deleteAllConnections()}
 * on its way out but the order in which shutdown hooks run is not defined so
 * were problematic for clients of HConnection that wanted to register their
 * own shutdown hooks so we removed ours though this shifts the onus for
 * cleanup to the client.
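
To illustrate why closing the shared connection inside the map function cannot work, here is a small Java sketch. FakeConnection and CloseInsideMapDemo are hypothetical stand-ins, not HBase classes; the loop plays the role of the map function and closes the single shared connection after every record.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for a shared connection (hypothetical, not the HBase API).
class FakeConnection {
    private boolean closed = false;

    synchronized String lookup(String key) {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        return key.toUpperCase();
    }

    synchronized void close() { closed = true; }
}

class CloseInsideMapDemo {
    // Processes records the way the problematic map function does:
    // every record closes the one shared connection after using it.
    // Returns how many records were processed before the first failure.
    static int processClosingEachTime(FakeConnection shared, List<String> records) {
        int processed = 0;
        for (String record : records) {
            try {
                shared.lookup(record);
                processed++;
            } catch (IllegalStateException e) {
                break; // the connection was already closed by an earlier record
            } finally {
                shared.close();
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        int n = processClosingEachTime(new FakeConnection(), Arrays.asList("a", "b", "c"));
        System.out.println("records processed: " + n); // prints "records processed: 1"
    }
}
```

Only the first record succeeds; every later record finds the connection already closed.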


Re: How to close resources shared in executor?

2014-10-16 Thread Fengyun RAO
I may have misunderstood your point.

val result = rdd.map(line => {
  val table = Util.Connection.getTable(user)
  ...
  table.close()
})

Did you mean this is enough, and there’s no need to call
Util.Connection.close()
or HConnectionManager.deleteAllConnections()?

Where is the documentation that states HConnectionManager would release the
underlying connection automatically?
If that’s true, maybe the Javadoc which recommends a shutdown hook needs
updating.
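
For readers unfamiliar with the term, reference counting in general works as sketched below. This is a generic Java illustration with hypothetical names (RefCountedConnection, RefCountDemo); it is not taken from the HBase source and makes no claim about how HConnectionManager implements it.

```java
// Generic reference-counted handle: an illustration only, not HBase's implementation.
class RefCountedConnection {
    private int refCount = 0;
    private boolean released = false;

    // Each user of the connection takes a reference before using it.
    synchronized void acquire() {
        if (released) {
            throw new IllegalStateException("already released");
        }
        refCount++;
    }

    // Dropping the last reference releases the underlying resource.
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release without acquire");
        }
        refCount--;
        if (refCount == 0) {
            released = true;
        }
    }

    synchronized boolean isReleased() { return released; }
}

class RefCountDemo {
    public static void main(String[] args) {
        RefCountedConnection conn = new RefCountedConnection();
        conn.acquire();   // first user
        conn.acquire();   // second user
        conn.release();   // first user done
        System.out.println(conn.isReleased()); // false: one reference remains
        conn.release();   // second user done
        System.out.println(conn.isReleased()); // true: last reference gone
    }
}
```

Under such a scheme, a holder that never gives up its own reference keeps the resource open, which is the case the question is about.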

Re: How to close resources shared in executor?

2014-10-16 Thread Ted Yu
Which HBase release are you using?

Let me refer to the HBase 0.94 code.

Take a look at the following method
in src/main/java/org/apache/hadoop/hbase/client/HTable.java:

  public void close() throws IOException {
    ...
    if (cleanupConnectionOnClose) {
      if (this.connection != null) {
        this.connection.close();

When Connection.getTable() is called, the following is invoked:
  public HTable(Configuration conf, final byte[] tableName, final
      ExecutorService pool)
which sets cleanupConnectionOnClose to true.

W.r.t. the javadoc, the paragraph on the shutdown hook is
in HConnectionManager.java of 0.94.
You don't need to use a shutdown hook for 0.94+.

Cheers

Re: How to close resources shared in executor?

2014-10-16 Thread Fengyun RAO
Thanks, Ted,

We use CDH 5.1 and the HBase version is 0.98.1-cdh5.1.0, in which the
javadoc of HConnectionManager.java still recommends a shutdown hook.

I looked into val table = Util.Connection.getTable(user), and found that it
doesn't invoke

public HTable(Configuration conf, final byte[] tableName, final
    ExecutorService pool)

but

public HTable(TableName tableName, final HConnection connection,
    final ExecutorService pool) throws IOException {
  if (connection == null || connection.isClosed()) {
    throw new IllegalArgumentException("Connection is null or closed.");
  }
  this.tableName = tableName;
  this.cleanupPoolOnClose = this.cleanupConnectionOnClose = false;
  this.connection = connection;
  this.configuration = connection.getConfiguration();
  this.pool = pool;

  this.finishSetup();
}

in which cleanupConnectionOnClose is false.
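
The effect of that flag can be shown with a small Java sketch. DemoConnection, DemoTable and OwnershipFlagDemo are hypothetical stand-ins written for this illustration; only the flag name cleanupConnectionOnClose is borrowed from the quoted code. A table built on a caller-supplied connection leaves that connection open when the table is closed.

```java
// Stand-ins for the connection and table (hypothetical, not the HBase classes).
class DemoConnection {
    private boolean closed = false;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

class DemoTable {
    private final DemoConnection connection;
    // Plays the role of cleanupConnectionOnClose in the quoted constructor.
    private final boolean cleanupConnectionOnClose;

    // Table that owns its connection: closing the table closes the connection.
    static DemoTable owningConnection() {
        return new DemoTable(new DemoConnection(), true);
    }

    // Table created on a caller-supplied connection: the caller keeps ownership.
    static DemoTable onSharedConnection(DemoConnection shared) {
        return new DemoTable(shared, false);
    }

    private DemoTable(DemoConnection connection, boolean cleanupConnectionOnClose) {
        this.connection = connection;
        this.cleanupConnectionOnClose = cleanupConnectionOnClose;
    }

    DemoConnection connection() { return connection; }

    void close() {
        if (cleanupConnectionOnClose) {
            connection.close();
        }
    }
}

class OwnershipFlagDemo {
    public static void main(String[] args) {
        DemoConnection shared = new DemoConnection();
        DemoTable t = DemoTable.onSharedConnection(shared);
        t.close();
        System.out.println("shared closed? " + shared.isClosed()); // false

        DemoTable owner = DemoTable.owningConnection();
        owner.close();
        System.out.println("owned closed? " + owner.connection().isClosed()); // true
    }
}
```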

Re: How to close resources shared in executor?

2014-10-16 Thread Ted Yu
Looking at the Apache HBase 0.98 code, you can follow the example in the class
javadoc (line 144 of HConnectionManager.java):

 * HTableInterface table = connection.getTable("table1");
 * try {
 *   // Use the table as needed, for a single operation and a single thread
 * } finally {
 *   table.close();
 *   connection.close();
 * }

Cheers
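
The try/finally shape of that javadoc example can be tried out with stand-in types. The Java sketch below uses a hypothetical Handle class and an events list (neither is an HBase type) to show that both close calls run, table first, even when the work in the try block throws.

```java
import java.util.ArrayList;
import java.util.List;

class TryFinallyDemo {
    // Records the order of events so the cleanup order can be inspected.
    static final List<String> events = new ArrayList<>();

    // Stand-in for a closable handle (hypothetical, not an HBase type).
    static class Handle {
        private final String name;
        Handle(String name) { this.name = name; events.add("open " + name); }
        void close() { events.add("close " + name); }
    }

    // Uses a table on a connection and closes both, even if the work throws.
    static void useOnce(boolean fail) {
        Handle connection = new Handle("connection");
        Handle table = new Handle("table");
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            events.add("work done");
        } finally {
            table.close();
            connection.close();
        }
    }

    public static void main(String[] args) {
        try {
            useOnce(true);
        } catch (IllegalStateException e) {
            events.add("caught " + e.getMessage());
        }
        System.out.println(events);
    }
}
```

Because the connection is closed in the same finally block, this shape suits a connection scoped to one unit of work rather than one shared across tasks.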

Re: How to close resources shared in executor?

2014-10-15 Thread Ted Yu
Pardon me - there was a typo in my previous email.

Calling table.close() is the recommended approach.
HConnectionManager does reference counting. When all references to the
underlying connection are gone, connection would be released.

Cheers

On Wed, Oct 15, 2014 at 7:13 AM, Ted Yu yuzhih...@gmail.com wrote:

 Have you tried the following?

 val result = rdd.map(line => {
   val table = Util.Connection.getTable(user)
   ...
   Util.Connection.close()
 })

 On Wed, Oct 15, 2014 at 6:09 AM, Fengyun RAO raofeng...@gmail.com wrote:

 In order to share an HBase connection pool, we create an object

 object Util {
   val HBaseConf = HBaseConfiguration.create
   val Connection = HConnectionManager.createConnection(HBaseConf)
 }

 which would be shared among tasks on the same executor, e.g.

 val result = rdd.map(line => {
   val table = Util.Connection.getTable(user)
   ...
 })

 However, we don’t know how to close the Util.Connection.
 If we write Util.Connection.close() in the main function,
 it’ll only run on the driver, not the executor.

 So, how can we make sure every Connection is closed before exit?
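
The sharing described in the question, one instance per JVM used by many concurrent tasks, can be sketched in Java as follows. PooledConnection, ConnectionHolder and SharedPerJvmDemo are hypothetical names, and a local thread pool stands in for the tasks of one executor; the holder class is created once, on first use, in whichever JVM touches it, which is also why a close() call made in another JVM would not reach it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-in for the shared connection (hypothetical).
class PooledConnection { }

// Initialization-on-demand holder: the JVM creates INSTANCE once, on first use.
class ConnectionHolder {
    private static class Lazy {
        static final PooledConnection INSTANCE = new PooledConnection();
    }
    static PooledConnection get() { return Lazy.INSTANCE; }
}

class SharedPerJvmDemo {
    // Runs several "tasks" on a thread pool and counts the distinct connections they saw.
    static int distinctConnections(int tasks) {
        Set<PooledConnection> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> seen.add(ConnectionHolder.get()));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct connections: " + distinctConnections(16));
    }
}
```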