[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful

2014-09-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152856#comment-14152856
 ] 

stack commented on HBASE-12117:
---

Nice graph, [~apurtell]. How did you make the traces.client.getHTable.svg graph,
what load produced it, and how does it relate to traces.client.c.svg? I see in
the latter we spend most of the CPU reading and writing. Is the former a portion
of this latter graph, or from another load?

 Constructors that use Configuration may be harmful
 --

 Key: HBASE-12117
 URL: https://issues.apache.org/jira/browse/HBASE-12117
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
 Attachments: traces.client.c.svg, traces.client.getHTable.svg


 There's a common pattern in HBase code where in the constructor, or in an 
 initialization method also called once per instantiation, or both, we look up 
 values from Hadoop Configuration and store them into fields. This can be 
 expensive if the object is frequently created. Configuration is a heavyweight 
 registry that does a lot of string operations and regex matching. See 
 attached example. Method calls into Configuration account for 48.25% of CPU 
 time when creating the HTable object in 0.98. (The remainder is spent 
 instantiating the RPC controller via reflection, a separate issue that merits 
 follow-up elsewhere.) Creation of HTable instances is expected to be a 
 lightweight operation if a client is using unmanaged HConnections; however 
 creating HTable instances takes up about 18% of the client's total on-CPU 
 time. This is just one example where constructors that use Configuration may 
 be harmful.
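To make the cost model concrete, here is a minimal sketch of the pattern the description refers to. `SlowConf` is a hypothetical stand-in for Hadoop's Configuration (it is not the real class and only mimics per-lookup string and regex work), and `PerInstanceTable` and the keys and defaults it reads are illustrative. The point is that registry lookups scale with the number of instances created, not with the number of distinct settings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class ConstructorLookupDemo {
    // Hypothetical stand-in for Hadoop's Configuration: each lookup does some
    // string/regex work and is counted, so the cost of the pattern is visible.
    static final class SlowConf {
        private static final Pattern KEY_SYNTAX = Pattern.compile("[a-z0-9.]+");
        private final Map<String, String> props = new HashMap<>();
        private long lookups = 0;

        void set(String key, String value) {
            props.put(key, value);
        }

        int getInt(String key, int defaultValue) {
            lookups++;
            if (!KEY_SYNTAX.matcher(key).matches()) {
                throw new IllegalArgumentException("malformed key: " + key);
            }
            String value = props.get(key);
            return value == null ? defaultValue : Integer.parseInt(value.trim());
        }

        long lookups() {
            return lookups;
        }
    }

    // The pattern under discussion: every new instance re-reads its settings
    // from the registry in its constructor (keys and defaults are illustrative).
    static final class PerInstanceTable {
        final int writeBufferSize;
        final int scannerCaching;
        final int maxKeyValueSize;

        PerInstanceTable(SlowConf conf) {
            this.writeBufferSize = conf.getInt("hbase.client.write.buffer", 2097152);
            this.scannerCaching = conf.getInt("hbase.client.scanner.caching", 100);
            this.maxKeyValueSize = conf.getInt("hbase.client.keyvalue.maxsize", -1);
        }
    }

    public static void main(String[] args) {
        SlowConf conf = new SlowConf();
        for (int i = 0; i < 1000; i++) {
            new PerInstanceTable(conf); // e.g. one table handle per request
        }
        // Three settings, but 3 x 1000 registry lookups.
        System.out.println("lookups: " + conf.lookups());
    }
}
```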



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful

2014-09-30 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152868#comment-14152868
 ] 

Andrew Purtell commented on HBASE-12117:


This was YCSB, using the old YCSB client, running workload C against an
all-localhost 0.98 cluster. Workload C is an all-read workload.
traces.client.getHTable.svg is traces.client.c.svg with all sampled call arcs
filtered out except those that go through getHTable, in effect zooming in on a
portion of the full trace. Both are client-side traces.
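The filtering step described here can be sketched as follows. The profiler and its output format are not named in the comment, so this assumes each sample is a root-first list of frame names; `StackFilter` and the frame names in `main` are hypothetical. Keeping only the samples that pass through a frame, and trimming each one to start at that frame, is what zooming in on getHTable amounts to; the kept count over the total gives that frame's share of sampled CPU.

```java
import java.util.ArrayList;
import java.util.List;

class StackFilter {
    // Keep only samples whose stack passes through `frame`, trimmed so each
    // retained stack starts at that frame. Stacks are listed root-first.
    static List<List<String>> throughFrame(List<List<String>> samples, String frame) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> stack : samples) {
            int idx = stack.indexOf(frame);
            if (idx >= 0) {
                kept.add(new ArrayList<>(stack.subList(idx, stack.size())));
            }
        }
        return kept;
    }

    // Fraction of all samples that pass through `frame` (0.0 for no samples).
    static double share(List<List<String>> samples, String frame) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        return (double) throughFrame(samples, frame).size() / samples.size();
    }

    public static void main(String[] args) {
        // Hypothetical samples from an all-read client workload.
        List<List<String>> samples = List.of(
            List.of("main", "readRow", "getHTable", "Configuration.get"),
            List.of("main", "readRow", "HTable.get", "socketRead"),
            List.of("main", "readRow", "HTable.get", "socketWrite"),
            List.of("main", "readRow", "HTable.get", "decodeResult"));
        System.out.println(throughFrame(samples, "getHTable")); // [[getHTable, Configuration.get]]
        System.out.println(share(samples, "getHTable"));        // 0.25
    }
}
```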


[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful

2014-09-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152872#comment-14152872
 ] 

stack commented on HBASE-12117:
---

So the focus is on the 17% of overall CPU spent in HTable's finishSetup, and
most of that is just getting Configuration. Yuck. Is this from making an HTable
instance per session? Do we keep it around? So HTable isn't that lightweight,
then? Any chance of caching some config in the Connection, for instance?
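For a rough sense of scale, and assuming the 48.25% figure from the issue description and the 17 to 18% figure for HTable setup come from the same client trace and can simply be multiplied, the Configuration calls made while setting up HTables work out to roughly 8 to 9% of all client on-CPU time:

```latex
\[
\underbrace{0.4825}_{\text{Configuration share of HTable setup}}
\times
\underbrace{0.18}_{\text{HTable setup share of client CPU}}
\approx 0.087
\qquad \text{(with } 0.17\text{: } 0.4825 \times 0.17 \approx 0.082\text{)}
\]
```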


[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful

2014-09-30 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153312#comment-14153312
 ] 

Andrew Purtell commented on HBASE-12117:


Yeah, I don't think HTable is as lightweight as we want. If an app manages its
own HConnection and creates an HTable for each interaction, as we recommend, it
can pay this unexpected cost: as much as ~20% of CPU time. This is one example
where using Configuration to set up an object is expensive. I found this while
looking at something else.

Yes, I think we could cache configuration in the Connection. We are using it
like a factory for HTable. Object factories would be one way to address this
(anti?)pattern wherever it's costly.
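A minimal sketch of the Connection-as-factory idea, assuming the per-table settings can be read once when the connection is created and then shared read-only by every table handle. `CountingConf`, `TableSettings`, `Table`, and `Connection` are hypothetical simplified types for illustration, not HBase's HConnection or HTable API.

```java
import java.util.HashMap;
import java.util.Map;

class ConnectionFactoryDemo {
    // Hypothetical stand-in for Configuration that counts lookups.
    static final class CountingConf {
        private final Map<String, String> props = new HashMap<>();
        private int lookups = 0;

        void set(String key, String value) {
            props.put(key, value);
        }

        int getInt(String key, int defaultValue) {
            lookups++;
            String value = props.get(key);
            return value == null ? defaultValue : Integer.parseInt(value.trim());
        }

        int lookups() {
            return lookups;
        }
    }

    // Immutable snapshot of per-table settings, read from the config once.
    static final class TableSettings {
        final int writeBufferSize;
        final int scannerCaching;

        TableSettings(CountingConf conf) {
            this.writeBufferSize = conf.getInt("hbase.client.write.buffer", 2097152);
            this.scannerCaching = conf.getInt("hbase.client.scanner.caching", 100);
        }
    }

    // Lightweight table handle: its constructor only copies references.
    static final class Table {
        final String name;
        final TableSettings settings;

        Table(String name, TableSettings settings) {
            this.name = name;
            this.settings = settings;
        }
    }

    // The connection parses settings once and then acts as a factory for tables.
    static final class Connection {
        private final TableSettings settings;

        Connection(CountingConf conf) {
            this.settings = new TableSettings(conf);
        }

        Table getTable(String name) {
            return new Table(name, settings);
        }
    }

    public static void main(String[] args) {
        CountingConf conf = new CountingConf();
        conf.set("hbase.client.scanner.caching", "500");
        Connection connection = new Connection(conf);
        for (int i = 0; i < 1000; i++) {
            connection.getTable("usertable");
        }
        System.out.println("Configuration lookups: " + conf.lookups()); // 2, not 2000
    }
}
```

The trade-off is that the snapshot will not see Configuration changes made after the connection is created; for client settings that are effectively fixed for a connection's lifetime, that seems an acceptable price for making getTable a plain allocation.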

Relatedly, we should also create the desired RpcController object by reflection
once, cache it, and clone it for each new HTable the Connection creates.  
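The RpcController suggestion is essentially the prototype pattern: pay for the reflective class lookup and construction once, then copy. A sketch under that assumption follows; `Controller` and `ControllerFactory` are hypothetical placeholders for the configured controller class and its cache, and `Object.clone()` stands in for whatever copy mechanism the real type would support.

```java
class ControllerPrototypeDemo {
    // Hypothetical controller type standing in for the class named in the
    // configuration. It counts constructor calls so the savings are visible.
    static class Controller implements Cloneable {
        static int constructed = 0;
        private int priority;

        public Controller() {
            constructed++;
        }

        void setPriority(int priority) {
            this.priority = priority;
        }

        int getPriority() {
            return priority;
        }

        @Override
        public Controller clone() {
            try {
                // Object.clone() copies fields without running a constructor.
                return (Controller) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError("Controller implements Cloneable", e);
            }
        }
    }

    // Resolves the configured class and instantiates it by reflection once;
    // afterwards each new table gets a cheap copy of the cached prototype.
    static final class ControllerFactory {
        private final Controller prototype;

        ControllerFactory(String className) throws ReflectiveOperationException {
            Class<? extends Controller> cls =
                Class.forName(className).asSubclass(Controller.class);
            this.prototype = cls.getDeclaredConstructor().newInstance();
        }

        Controller newController() {
            return prototype.clone();
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        ControllerFactory factory = new ControllerFactory(Controller.class.getName());
        Controller a = factory.newController();
        Controller b = factory.newController();
        a.setPriority(5);
        System.out.println(Controller.constructed); // 1: only the prototype was constructed
        System.out.println(b.getPriority());        // 0: copies do not share state
    }
}
```

`Object.clone()` is a shallow copy, so a controller holding mutable references would need to reset or deep-copy them in `clone()`; otherwise per-request state could leak between tables.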


[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful

2014-09-30 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153960#comment-14153960
 ] 

Andrew Purtell commented on HBASE-12117:


I created HBASE-12128 for the specific case mentioned above and changed this
issue to an umbrella to catch more instances of the problem.

 Constructors that use Configuration may be harmful
 --

 Key: HBASE-12117
 URL: https://issues.apache.org/jira/browse/HBASE-12117
 Project: HBase
  Issue Type: Umbrella
Reporter: Andrew Purtell
 Attachments: traces.client.c.svg, traces.client.getHTable.svg

