[https://issues.apache.org/jira/browse/YARN-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Li Lu updated YARN-3595:
------------------------
Description:
The story behind the connection cache in Phoenix timeline storage is a little
long. In YARN-3033 we planned to have a shared writer layer for all
collectors in the same collector manager. This way we can reuse the same
storage layer connection across collectors, which suits conventional storage
layer connections, since they are typically heavy-weight.
Phoenix, on the other hand, implements its own connection interface layer,
and its connections are light-weight but not thread-safe. To make these
connections work with our "multiple collectors, single writer" model, we're
adding a thread-indexed connection cache. However, many performance-critical
factors have yet to be tested.
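A minimal sketch of the thread-indexed idea, assuming each writer thread should own exactly one connection. `LightConnection` and `ThreadIndexedConnectionCache` are hypothetical names: `LightConnection` stands in for a Phoenix connection (real code would obtain a `java.sql.Connection` through JDBC), and a plain `ConcurrentHashMap` keyed by thread replaces the Guava cache so the example runs on the JDK alone. Because nothing is evicted here, entries for threads that have exited stay in the map until `closeAll()` runs, which is why the eviction and retention questions in this issue matter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a Phoenix connection: cheap to create, not thread-safe.
class LightConnection implements AutoCloseable {
  volatile boolean closed = false;

  @Override
  public void close() {
    closed = true;
  }
}

// Thread-indexed cache: each writer thread gets its own connection, so a
// thread-unsafe connection is never shared across threads.
class ThreadIndexedConnectionCache {
  private final ConcurrentMap<Thread, LightConnection> byThread = new ConcurrentHashMap<>();
  private final Supplier<LightConnection> factory;

  ThreadIndexedConnectionCache(Supplier<LightConnection> factory) {
    this.factory = factory;
  }

  // Returns the calling thread's connection, creating it on first use.
  LightConnection get() {
    return byThread.computeIfAbsent(Thread.currentThread(), t -> factory.get());
  }

  int size() {
    return byThread.size();
  }

  // Closes and drops all cached connections, e.g. when the writer stops.
  // Entries of exited threads are only reclaimed here; nothing evicts them earlier.
  void closeAll() {
    byThread.values().forEach(LightConnection::close);
    byThread.clear();
  }
}
```

Two calls from the same thread return the same connection, while a second thread gets its own, so the cache grows with the number of distinct writer threads.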
In this JIRA we're tracking performance optimization efforts using this
connection cache. We previously had a draft, but it hit one implementation
challenge with cache evictions: there may be races between Guava cache's
removal listener calls (which close the connection) and normal references to
the connection. We need to carefully define how they synchronize.
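One possible way to define that synchronization is reference counting, sketched here under two assumptions: every writer brackets its use of a connection with `tryAcquire()`/`release()`, and the cache's removal listener (for Guava, a `RemovalListener`) calls `release()` exactly once per evicted entry. The class and method names are hypothetical, and setting a flag stands in for actually closing the Phoenix connection. Eviction then only drops the cache's own reference; the physical close happens when the last user lets go, and a writer that loses the race gets `false` from `tryAcquire()` and must fetch a fresh connection.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper that lets eviction and in-flight use interleave safely.
class RefCountedConnection {
  // Starts at 1: the cache itself holds one reference until eviction.
  private final AtomicInteger refs = new AtomicInteger(1);
  private volatile boolean closed = false;

  // Called by a writer before using the connection. Returns false if the
  // count already reached zero (connection closed); the caller must then
  // get a fresh connection instead of reviving this one.
  boolean tryAcquire() {
    while (true) {
      int n = refs.get();
      if (n == 0) {
        return false;
      }
      if (refs.compareAndSet(n, n + 1)) {
        return true;
      }
    }
  }

  // Called by a writer when it is done, and exactly once by the removal
  // listener to drop the cache's reference. Whoever brings the count to
  // zero performs the close.
  void release() {
    if (refs.decrementAndGet() == 0) {
      closed = true; // stand-in for closing the underlying Phoenix connection
    }
  }

  boolean isClosed() {
    return closed;
  }
}
```

The compare-and-set loop in `tryAcquire()` is what prevents a writer from resurrecting a connection whose count has already dropped to zero.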
Performance-wise, at this early stage we need to understand:
# whether the current thread-based indexing is an appropriate approach, or
whether there are better ways to index the connections;
# the best size of the cache, presumably to serve as the proposed default value
of a configuration;
# how long we need to keep a connection in the cache.
Please feel free to add to this list.
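The second and third questions map onto two tunables: a maximum number of cached connections and an idle timeout. In Guava these would correspond to `CacheBuilder.maximumSize` and `CacheBuilder.expireAfterAccess`. The JDK-only sketch below (hypothetical class name and parameters) shows the same semantics with an access-ordered `LinkedHashMap`, invoking a removal callback, for example a connection close, on every eviction, so different values could be tried in a test harness. It is not thread-safe and assumes callers synchronize externally.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical cache bounded by size and by idle time; not thread-safe.
class BoundedExpiringCache<K, V> {
  // A cached value plus the time it was last read or written.
  private static final class Timed<T> {
    final T value;
    long lastAccess;

    Timed(T value, long now) {
      this.value = value;
      this.lastAccess = now;
    }
  }

  private final int maxSize;                   // candidate default for a config key
  private final long expireAfterAccessMillis;  // how long an idle connection is kept
  private final LongSupplier clock;            // injectable so tests can fake time
  private final Consumer<V> onRemoval;         // e.g. close the evicted connection
  // accessOrder = true: iteration starts at the least recently used entry.
  private final LinkedHashMap<K, Timed<V>> map = new LinkedHashMap<>(16, 0.75f, true);

  BoundedExpiringCache(int maxSize, long expireAfterAccessMillis,
                       LongSupplier clock, Consumer<V> onRemoval) {
    this.maxSize = maxSize;
    this.expireAfterAccessMillis = expireAfterAccessMillis;
    this.clock = clock;
    this.onRemoval = onRemoval;
  }

  // Returns the cached value, or null if absent or expired.
  V get(K key) {
    long now = clock.getAsLong();
    expireIdle(now);
    Timed<V> t = map.get(key); // also marks the entry as most recently used
    if (t == null) {
      return null;
    }
    t.lastAccess = now;
    return t.value;
  }

  // Inserts a value, then evicts least recently used entries beyond maxSize.
  void put(K key, V value) {
    long now = clock.getAsLong();
    expireIdle(now);
    Timed<V> old = map.put(key, new Timed<>(value, now));
    if (old != null && old.value != value) {
      onRemoval.accept(old.value);
    }
    Iterator<Map.Entry<K, Timed<V>>> it = map.entrySet().iterator();
    while (map.size() > maxSize && it.hasNext()) {
      V evicted = it.next().getValue().value;
      it.remove();
      onRemoval.accept(evicted);
    }
  }

  int size() {
    return map.size();
  }

  // Removes every entry that has been idle longer than the timeout.
  private void expireIdle(long now) {
    Iterator<Map.Entry<K, Timed<V>>> it = map.entrySet().iterator();
    while (it.hasNext()) {
      Timed<V> t = it.next().getValue();
      if (now - t.lastAccess > expireAfterAccessMillis) {
        it.remove();
        onRemoval.accept(t.value);
      }
    }
  }
}
```

With `maxSize = 2`, reading key 1 and then inserting key 3 evicts key 2 (the least recently used); advancing the injected clock past the timeout then expires the rest.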
was:
The story behind the connection cache in Phoenix timeline storage is a little
long. In YARN-3033 we planned to have a shared writer layer for all
collectors in the same collector manager. This way we can reuse the same
storage layer connection across collectors, which suits conventional storage
layer connections, since they are typically heavy-weight.
Phoenix, on the other hand, implements its own connection interface layer,
and its connections are light-weight but not thread-safe. To make these
connections work with our "multiple collectors, single writer" model, we're
adding a thread-indexed connection cache. However, many performance-critical
factors have yet to be tested.
In this JIRA we're tracking performance optimization efforts using this
connection cache. Currently
At this early stage we need to understand:
# whether the current thread-based indexing is an appropriate approach, or
whether there are better ways to index the connections;
# the best size of the cache, presumably to serve as the proposed default value
of a configuration;
# how long we need to keep a connection in the cache.
Please feel free to add to this list.
> Performance optimization using connection cache of Phoenix timeline writer
> --------------------------------------------------------------------------
>
> Key: YARN-3595
> URL: https://issues.apache.org/jira/browse/YARN-3595
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)