[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful
[ https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152856#comment-14152856 ]

stack commented on HBASE-12117:
-------------------------------

Nice graph [~apurtell]. How did you make the traces.client.getHTable.svg graph, what loading produced it, and how does it relate to traces.client.c.svg? I see in the latter we spend most CPU reading and writing. Is the former a portion of this latter graph or another loading?

Constructors that use Configuration may be harmful
--------------------------------------------------

Key: HBASE-12117
URL: https://issues.apache.org/jira/browse/HBASE-12117
Project: HBase
Issue Type: Improvement
Reporter: Andrew Purtell
Attachments: traces.client.c.svg, traces.client.getHTable.svg

There's a common pattern in HBase code where in the constructor, or in an initialization method also called once per instantiation, or both, we look up values from Hadoop Configuration and store them into fields. This can be expensive if the object is frequently created. Configuration is a heavyweight registry that does a lot of string operations and regex matching. See attached example. Method calls into Configuration account for 48.25% of CPU time when creating the HTable object in 0.98. (The remainder is spent instantiating the RPC controller via reflection, a separate issue that merits followup elsewhere.) Creation of HTable instances is expected to be a lightweight operation if a client is using unmanaged HConnections; however, creating HTable instances takes up about 18% of the client's total on-CPU time. This is just one example where constructors that use Configuration may be harmful.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
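To make the pattern in the issue description concrete, here is a minimal sketch. It does not use the real Hadoop Configuration or HTable classes: CountingConfig, EagerTable, and the "example.*" keys are hypothetical stand-ins, and the counter only shows how the number of registry lookups grows with the number of objects created.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a heavyweight configuration registry.
// It only counts lookups; the real Configuration does much more work per call.
class CountingConfig {
    private final Map<String, String> props = new HashMap<>();
    int lookups = 0;

    void set(String key, String value) {
        props.put(key, value);
    }

    int getInt(String key, int defaultValue) {
        lookups++;
        String v = props.get(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }
}

// The pattern from the issue: every new instance reads the registry again.
class EagerTable {
    final int writeBufferSize;
    final int maxRetries;
    final int scannerCaching;

    EagerTable(CountingConfig conf) {
        // Key names are made up for illustration.
        writeBufferSize = conf.getInt("example.write.buffer", 2097152);
        maxRetries = conf.getInt("example.retries", 10);
        scannerCaching = conf.getInt("example.scanner.caching", 100);
    }
}

class ConstructorLookupDemo {
    // Creates the given number of instances and reports the total lookups.
    static int lookupsFor(int instances) {
        CountingConfig conf = new CountingConfig();
        for (int i = 0; i < instances; i++) {
            new EagerTable(conf);
        }
        return conf.lookups;
    }

    public static void main(String[] args) {
        // Three lookups per instance, so 1000 instances print 3000.
        System.out.println(lookupsFor(1000));
    }
}
```

The lookup count is linear in the number of instances, which is why a cost that looks negligible per object can show up in a profile when the object is created once per interaction.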
[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful
[ https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152868#comment-14152868 ]

Andrew Purtell commented on HBASE-12117:
----------------------------------------

This was YCSB running against an all-localhost 0.98 cluster. The old YCSB client. This is workload C, an all-read workload. This is a client trace.

traces.client.getHTable.svg is traces.client.c.svg with all sampled call arcs filtered out except those that go through getHTable, in effect zooming in on a portion of the full trace.
[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful
[ https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152872#comment-14152872 ]

stack commented on HBASE-12117:
-------------------------------

So the focus is on 18% of overall CPU -- the finishSetup of HTable -- and of this most is just getting Configuration. Yuck. Is this from our making an HTable instance per session? Do we keep it around? HTable ain't that lightweight then? Any chance of caching some config in Connection, for instance?
[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful
[ https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153312#comment-14153312 ]

Andrew Purtell commented on HBASE-12117:
----------------------------------------

Yeah, I don't think HTable is as lightweight as we want, because if an app manages its own HConnection and creates an HTable for each interaction - as we recommend - then it can pay this unexpected cost of as much as ~20% of CPU time. This is one example where using Configuration to set up an object is expensive. Found this when looking at something else.

Yes, I think we could cache configuration in Connection. We are using it like a factory for HTable. Object factories would be one way to address this (anti?)pattern wherever it's costly.

Related, we should also create the desired RpcController object by reflection once, cache it, and clone it for new HTables for the Connection.
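The factory idea in the comment above can be sketched as follows. This is only an illustration under stated assumptions, not HBase's actual API: LookupCountingConfig, TableSettings, CachedTable, TableFactory, and the "example.*" keys are hypothetical names. The factory plays the role the comment assigns to Connection: it reads the registry once, and every table it hands out shares the resulting immutable settings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a heavyweight configuration registry; counts lookups.
class LookupCountingConfig {
    private final Map<String, String> props = new HashMap<>();
    int lookups = 0;

    void set(String key, String value) {
        props.put(key, value);
    }

    int getInt(String key, int defaultValue) {
        lookups++;
        String v = props.get(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }
}

// Immutable snapshot of the values a table needs, read from the registry once.
final class TableSettings {
    final int writeBufferSize;
    final int maxRetries;

    TableSettings(LookupCountingConfig conf) {
        // Key names are made up for illustration.
        writeBufferSize = conf.getInt("example.write.buffer", 2097152);
        maxRetries = conf.getInt("example.retries", 10);
    }
}

// A lightweight per-interaction object: construction touches no registry.
final class CachedTable {
    final String name;
    final TableSettings settings;

    CachedTable(String name, TableSettings settings) {
        this.name = name;
        this.settings = settings;
    }
}

// Stands in for the long-lived connection that acts as the table factory.
final class TableFactory {
    private final TableSettings settings;

    TableFactory(LookupCountingConfig conf) {
        this.settings = new TableSettings(conf); // the only registry reads
    }

    CachedTable getTable(String name) {
        return new CachedTable(name, settings);
    }
}

class FactoryDemo {
    // Creates the given number of tables and reports the total lookups.
    static int lookupsFor(int tables) {
        LookupCountingConfig conf = new LookupCountingConfig();
        TableFactory factory = new TableFactory(conf);
        for (int i = 0; i < tables; i++) {
            factory.getTable("t" + i);
        }
        return conf.lookups;
    }

    public static void main(String[] args) {
        // Two lookups in total, regardless of how many tables are created.
        System.out.println(lookupsFor(1000));
    }
}
```

One trade-off of this design is that configuration changes made after the factory is built are not seen by new tables, which may or may not be acceptable depending on how the settings are expected to change at runtime.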
[jira] [Commented] (HBASE-12117) Constructors that use Configuration may be harmful
[ https://issues.apache.org/jira/browse/HBASE-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153960#comment-14153960 ]

Andrew Purtell commented on HBASE-12117:
----------------------------------------

I created HBASE-12128 for the specific case mentioned above and changed this to an umbrella to catch more instances of the issue.