[jira] [Commented] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145871#comment-16145871 ] ASF GitHub Bot commented on PIO-106: Github user asfgit closed the pull request at: https://github.com/apache/incubator-predictionio/pull/421 > Elasticsearch 5.x StorageClient should reuse RestClient > --- > > Key: PIO-106 > URL: https://issues.apache.org/jira/browse/PIO-106 > Project: PredictionIO > Issue Type: Improvement > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > When using the proposed [PIO-105 Batch > Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an > engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's > REST interface appears to become overloaded, ending with the Spark job being > killed from errors like: > {noformat} > [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search > [ERROR] [Utils] Aborting task > [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search > [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749) > [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737) > [ERROR] [Common$] Invalid app name ur > [ERROR] [Utils] Aborting task > [ERROR] [URAlgorithm] Error when read recent events: > java.lang.IllegalArgumentException: Invalid app name ur > [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751) > [ERROR] [Utils] Aborting task > [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750) > [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, > executor driver): java.net.BindException: Can't assign requested address > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:454) > at sun.nio.ch.Net.connect(Net.java:446) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139) > at > org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) > at > org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) > at > org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > at java.lang.Thread.run(Thread.java:745) > {noformat} > After these errors happen & the job is killed, Elasticsearch immediately > recovers. It responds to queries normally. I researched what could cause this > and found an [old issue in the main Elasticsearch > repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints > given therein about *using keep-alive in the ES client* to avoid these > performance issues, I investigated how PredictionIO's [Elasticsearch > StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] > manages its connections. > I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), > Elasticsearch creates a new underlying connection, an Elasticsearch > RestClient, for > [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] > > [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] > > [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] > & > [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] > with its API. As a result, *there is no way Elasticsearch TCP connections > can be reused via HTTP keep-alive*. > High-performance workloads with Elasticsearch 5.x will suffer from these > issues unless we refactor Elasticsearch StorageClient to share the underlying > RestClient instead of [building a new one everytime the client is > used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31]. > There are certainly different approaches we could take to sharing a > RestClient so that its keep-alive behavior may work as designed: > * maintain a
[jira] [Commented] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126322#comment-16126322 ] ASF GitHub Bot commented on PIO-106: Github user mars closed the pull request at: https://github.com/apache/incubator-predictionio/pull/420 > Elasticsearch 5.x StorageClient should reuse RestClient > --- > > Key: PIO-106 > URL: https://issues.apache.org/jira/browse/PIO-106 > Project: PredictionIO > Issue Type: Improvement > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > When using the proposed [PIO-105 Batch > Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an > engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's > REST interface appears to become overloaded, ending with the Spark job being > killed from errors like: > {noformat} > [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search > [ERROR] [Utils] Aborting task > [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search > [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749) > [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737) > [ERROR] [Common$] Invalid app name ur > [ERROR] [Utils] Aborting task > [ERROR] [URAlgorithm] Error when read recent events: > java.lang.IllegalArgumentException: Invalid app name ur > [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751) > [ERROR] [Utils] Aborting task > [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750) > [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, > executor driver): java.net.BindException: Can't assign requested address > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:454) > at sun.nio.ch.Net.connect(Net.java:446) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139) > at > org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) > at > org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) > at > org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > at java.lang.Thread.run(Thread.java:745) > {noformat} > After these errors happen & the job is killed, Elasticsearch immediately > recovers. It responds to queries normally. I researched what could cause this > and found an [old issue in the main Elasticsearch > repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints > given therein about *using keep-alive in the ES client* to avoid these > performance issues, I investigated how PredictionIO's [Elasticsearch > StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] > manages its connections. > I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), > Elasticsearch creates a new underlying connection, an Elasticsearch > RestClient, for > [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] > > [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] > > [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] > & > [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] > with its API. As a result, *there is no way Elasticsearch TCP connections > can be reused via HTTP keep-alive*. > High-performance workloads with Elasticsearch 5.x will suffer from these > issues unless we refactor Elasticsearch StorageClient to share the underlying > RestClient instead of [building a new one everytime the client is > used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31]. > There are certainly different approaches we could take to sharing a > RestClient so that its keep-alive behavior may work as designed: > * maintain a
[jira] [Commented] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122849#comment-16122849 ] ASF GitHub Bot commented on PIO-106: Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/420 Seem to solve this [long ago reported Elasticsearch connection issue](https://github.com/elastic/elasticsearch/issues/3647) > Elasticsearch 5.x StorageClient should reuse RestClient > --- > > Key: PIO-106 > URL: https://issues.apache.org/jira/browse/PIO-106 > Project: PredictionIO > Issue Type: Improvement > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > When using the proposed [PIO-105 Batch > Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an > engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's > REST interface appears to become overloaded, ending with the Spark job being > killed from errors like: > {noformat} > [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search > [ERROR] [Utils] Aborting task > [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search > [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749) > [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737) > [ERROR] [Common$] Invalid app name ur > [ERROR] [Utils] Aborting task > [ERROR] [URAlgorithm] Error when read recent events: > java.lang.IllegalArgumentException: Invalid app name ur > [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751) > [ERROR] [Utils] Aborting task > [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750) > [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, > executor driver): java.net.BindException: Can't assign requested address > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:454) > at sun.nio.ch.Net.connect(Net.java:446) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139) > at > org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) > at > org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) > at > org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > at java.lang.Thread.run(Thread.java:745) > {noformat} > After these errors happen & the job is killed, Elasticsearch immediately > recovers. It responds to queries normally. I researched what could cause this > and found an [old issue in the main Elasticsearch > repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints > given therein about *using keep-alive in the ES client* to avoid these > performance issues, I investigated how PredictionIO's [Elasticsearch > StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] > manages its connections. > I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), > Elasticsearch creates a new underlying connection, an Elasticsearch > RestClient, for > [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] > > [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] > > [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] > & > [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] > with its API. As a result, *there is no way Elasticsearch TCP connections > can be reused via HTTP keep-alive*. > High-performance workloads with Elasticsearch 5.x will suffer from these > issues unless we refactor Elasticsearch StorageClient to share the underlying > RestClient instead of [building a new one everytime the client is > used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31]. > There are certainly different
[jira] [Commented] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122618#comment-16122618 ] ASF GitHub Bot commented on PIO-106: Github user mars commented on a diff in the pull request: https://github.com/apache/incubator-predictionio/pull/420#discussion_r132601642 --- Diff: storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEvaluationInstances.scala --- @@ -110,28 +104,24 @@ class ESEvaluationInstances(client: ESClient, config: StorageClientConfig, index error(s"Failed to access to /$index/$estype/$id", e) None } finally { - restClient.close() + client.close() --- End diff -- This `close` should be removed. > Elasticsearch 5.x StorageClient should reuse RestClient > --- > > Key: PIO-106 > URL: https://issues.apache.org/jira/browse/PIO-106 > Project: PredictionIO > Issue Type: Improvement > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > When using the proposed [PIO-105 Batch > Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an > engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's > REST interface appears to become overloaded, ending with the Spark job being > killed from errors like: > {noformat} > [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search > [ERROR] [Utils] Aborting task > [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search > [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749) > [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737) > [ERROR] [Common$] Invalid app name ur > [ERROR] [Utils] Aborting task > [ERROR] [URAlgorithm] Error when read recent events: > java.lang.IllegalArgumentException: Invalid app name ur > [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751) > [ERROR] [Utils] Aborting task > [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750) > [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, > executor driver): java.net.BindException: Can't assign requested address > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:454) > at sun.nio.ch.Net.connect(Net.java:446) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139) > at > org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) > at > org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) > at > org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > at java.lang.Thread.run(Thread.java:745) > {noformat} > After these errors happen & the job is killed, Elasticsearch immediately > recovers. It responds to queries normally. I researched what could cause this > and found an [old issue in the main Elasticsearch > repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints > given therein about *using keep-alive in the ES client* to avoid these > performance issues, I investigated how PredictionIO's [Elasticsearch > StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] > manages its connections. > I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), > Elasticsearch creates a new underlying connection, an Elasticsearch > RestClient, for > [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] > > [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] > > [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] > & > [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] > with its API. As a result, *there is no way Elasticsearch TCP connections > can be reused via HTTP keep-alive*. > High-performance workloads with Elasticsearch 5.x will suffer
[jira] [Commented] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122574#comment-16122574 ] ASF GitHub Bot commented on PIO-106: GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/420 [PIO-106] Elasticsearch 5.x StorageClient should reuse RestClient Implements [PIO-106](https://issues.apache.org/jira/browse/PIO-106) This PR moves to a singleton Elasticsearch RestClient which has built-in HTTP keep-alive and TCP connection pooling. Running on this branch, we've seen a 2x speed-up in predictions from the Universal Recommender with ES5, and the feared "cannot bind" Elasticsearch connection errors have completely disappeared. Running `pio batchpredict` for 170K queries results in only 7 total TCP connections to Elasticsearch. Previously that would escalate to ~25,000 connections before denying further connections. **This fundamentally changes the interface for the new [Elasticsearch 5.x REST client](https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch)** introduced with PredictionIO 0.11.0-incubating. With this changeset, the `client` is a single instance of [`org.elasticsearch.client.RestClient`](https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java). **As a result of this change, any engine templates that directly use the Elasticsearch 5 StorageClient would require an update for compatibility.** The change is this: ### Original ```scala val client: StorageClient = … // code to instantiate client val restClient: RestClient = client.open() try { restClient.performRequest(…) } finally { restClient.close() } ``` ### With this PR ```scala val client: RestClient = … // code to instantiate client client.performRequest(…) ``` *No more balancing `open` & `close` as this is handled by using a new `CleanupFunctions` hook added to the framework in this PR.* [Universal Recommender](https://github.com/actionml/universal-recommender) is the only template that I know of which directly uses the ES StorageClient outside of PredictionIO core. See the [UR changes for compatibility with this PR](https://github.com/heroku/predictionio-engine-ur/compare/esclient-singleton). ### Elasticsearch StorageClient changes * reimplemented as singleton * installs a cleanup function See [StorageClient](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2926f4cfd93ccb02320e2a9503ccd223) ### Core changes A new [`CleanupFunctions`](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2a958821ac58f019fbce38540c775f19) hook has been added which enables developers of storage modules to register anonymous functions with `CleanupFunctions.add { … }` to be executed after Spark-related commands/workflows. The hook is called in a `finally { CleanupFunctions.run() }` from within: * `pio import` * `pio export` * `pio train` * `pio batchpredict` Apologies for the huge indentation shifts from the requisite try-finally blocks: ```scala try { // Freshly indented code. } finally { CleanupFunctions.run() } ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio esclient-singleton Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/420.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #420 commit f30f27bcc09a397efb42a7923938beceaeac37bf Author: Mars HallDate: 2017-08-08T23:29:15Z Migrate to singleton Elasticsearch client to use underlying connection pooling (PoolingNHttpClientConnectionManager) commit d99927089a41cb85f525cb74bdf394eed4686bf2 Author: Mars Hall Date: 2017-08-10T03:00:58Z Log stacktrace for Storage initialization errors. commit dc4c31cbcddbb3b281d52b8099e210adc546d1ed Author: Mars Hall Date: 2017-08-10T22:55:38Z Remove shade rule that breaks Elasticsearch 5 client commit 7634a7ab720239d5f8efda85f67b26bdaff797f8 Author: Mars Hall Date: 2017-08-10T22:59:01Z Collect & run cleanup functions to allow spark-submit processes to end gracefully. commit 5953451f40e554eafa887328122c794edbbd8f1d Author: Mars Hall Date: 2017-08-11T00:06:24Z Rename CleanupFunctions to match object name
[jira] [Commented] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094896#comment-16094896 ] Naoki Takezoe commented on PIO-106: --- I tried to add cleanup step to storage some time ago, but I couldn't do it easily because PIO uses initialization of storages as health checking as well. If we do it with keeping current functionality and end user APIs, I think some refactoring of the internal structure of storage is required. > Elasticsearch 5.x StorageClient should reuse RestClient > --- > > Key: PIO-106 > URL: https://issues.apache.org/jira/browse/PIO-106 > Project: PredictionIO > Issue Type: Improvement > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall > > When using the proposed [PIO-105 Batch > Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an > engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's > REST interface appears to become overloaded, ending with the Spark job being > killed from errors like: > {noformat} > [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search > [ERROR] [Utils] Aborting task > [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search > [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749) > [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737) > [ERROR] [Common$] Invalid app name ur > [ERROR] [Utils] Aborting task > [ERROR] [URAlgorithm] Error when read recent events: > java.lang.IllegalArgumentException: Invalid app name ur > [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751) > [ERROR] [Utils] Aborting task > [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750) > [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, > executor driver): java.net.BindException: Can't assign requested address > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:454) > at sun.nio.ch.Net.connect(Net.java:446) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139) > at > org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) > at > org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) > at > org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > at java.lang.Thread.run(Thread.java:745) > {noformat} > After these errors happen & the job is killed, Elasticsearch immediately > recovers. It responds to queries normally. I researched what could cause this > and found an [old issue in the main Elasticsearch > repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints > given therein about *using keep-alive in the ES client* to avoid these > performance issues, I investigated how PredictionIO's [Elasticsearch > StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] > manages its connections. > I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), > Elasticsearch creates a new underlying connection, an Elasticsearch > RestClient, for > [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] > > [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] > > [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] > & > [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] > with its API. As a result, *there is no way Elasticsearch TCP connections > can be reused via HTTP keep-alive*. > High-performance workloads with Elasticsearch 5.x will suffer from these > issues unless we refactor Elasticsearch StorageClient to share the underlying > RestClient instead of [building a new one everytime the client is > used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31]. >
[jira] [Commented] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094270#comment-16094270 ] Shinsuke Sugaya commented on PIO-106: - The background is [here|https://github.com/apache/incubator-predictionio/pull/336#discussion_r96759997]. I think it's better to add a cleanup step in the storage layer and then reuse elasticsearch connection. > Elasticsearch 5.x StorageClient should reuse RestClient > --- > > Key: PIO-106 > URL: https://issues.apache.org/jira/browse/PIO-106 > Project: PredictionIO > Issue Type: Improvement > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall > > When using the proposed [PIO-105 Batch > Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an > engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's > REST interface appears to become overloaded, ending with the Spark job being > killed from errors like: > {noformat} > [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search > [ERROR] [Utils] Aborting task > [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search > [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749) > [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737) > [ERROR] [Common$] Invalid app name ur > [ERROR] [Utils] Aborting task > [ERROR] [URAlgorithm] Error when read recent events: > java.lang.IllegalArgumentException: Invalid app name ur > [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751) > [ERROR] [Utils] Aborting task > [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750) > [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, > executor driver): java.net.BindException: Can't assign requested address > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:454) > at sun.nio.ch.Net.connect(Net.java:446) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139) > at > org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) > at > org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) > at > org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > at java.lang.Thread.run(Thread.java:745) > {noformat} > After these errors happen & the job is killed, Elasticsearch immediately > recovers. It responds to queries normally. I researched what could cause this > and found an [old issue in the main Elasticsearch > repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints > given therein about *using keep-alive in the ES client* to avoid these > performance issues, I investigated how PredictionIO's [Elasticsearch > StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] > manages its connections. > I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), > Elasticsearch creates a new underlying connection, an Elasticsearch > RestClient, for > [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] > > [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] > > [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] > & > [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] > with its API. As a result, *there is no way Elasticsearch TCP connections > can be reused via HTTP keep-alive*. > High-performance workloads with Elasticsearch 5.x will suffer from these > issues unless we refactor Elasticsearch StorageClient to share the underlying > RestClient instead of [building a new one everytime the client is > used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31]. > There are certainly different approaches we could take to sharing a > RestClient so