[jira] [Updated] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication
[ https://issues.apache.org/jira/browse/PIO-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars Hall updated PIO-114:
--------------------------
    External issue URL: https://github.com/apache/incubator-predictionio/pull/421

> Elasticsearch 5.x StorageClient basic HTTP authentication
> ---------------------------------------------------------
>
>                 Key: PIO-114
>                 URL: https://issues.apache.org/jira/browse/PIO-114
>             Project: PredictionIO
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.11.0-incubating
>            Reporter: Mars Hall
>            Assignee: Mars Hall
>
> Add optional username-password configuration for the new Elasticsearch 5 client; in {{conf/pio-env.sh}} config:
> {code}
> # Optional basic HTTP auth
> PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
> PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
> {code}
> These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header.
> Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai).

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
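For readers unfamiliar with the header format mentioned in the issue, the following is a minimal sketch of how an HTTP Basic `Authorization` header value is derived from a username and password (RFC 7617). The object and method names are hypothetical, and reading the two settings from a plain `Map` keyed by the full variable names is an assumption made for illustration; this is not PredictionIO's actual configuration handling.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Illustrative only: names here are hypothetical, not part of PredictionIO.
object BasicAuthSketch {
  // Builds the header value: "Basic " followed by base64("username:password").
  def headerValue(username: String, password: String): String = {
    val raw = s"$username:$password".getBytes(StandardCharsets.UTF_8)
    "Basic " + Base64.getEncoder.encodeToString(raw)
  }

  // Yields a header only when both settings are present, mirroring the
  // "optional" wording in the issue. The Map-based lookup is an assumption.
  def optionalHeader(env: Map[String, String]): Option[(String, String)] =
    for {
      user <- env.get("PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME")
      pass <- env.get("PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD")
    } yield "Authorization" -> headerValue(user, pass)
}
```

Because Base64 is an encoding rather than encryption, credentials sent this way are only protected when the connection to the hosted cluster uses HTTPS.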
[GitHub] incubator-predictionio pull request #421: Elasticsearch singleton client wit...
GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/421

Elasticsearch singleton client with authentication

Fixes both [PIO-106](https://issues.apache.org/jira/browse/PIO-106) & [PIO-114](https://issues.apache.org/jira/browse/PIO-114), replacing https://github.com/apache/incubator-predictionio/pull/372. These are combined because they each heavily revise the same class.

## Authentication

Add optional username-password configuration for the new Elasticsearch 5 client; in `pio-env.sh` config:

```bash
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
```

These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai).

## Singleton client

This PR moves to a singleton Elasticsearch RestClient which has built-in HTTP keep-alive and TCP connection pooling.

Running on this branch, we've seen a 2x speed-up in predictions from the Universal Recommender with ES5, and the feared "cannot assign requested address" 😱 Elasticsearch connection errors have completely disappeared. Running `pio batchpredict` for 160K queries results in only 7 total TCP connections to Elasticsearch. Previously that would escalate to ~25,000 connections before denying further connections.

**This fundamentally changes the interface for the new [Elasticsearch 5.x REST client](https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch)** introduced with PredictionIO 0.11.0-incubating. With this changeset, the `client` is a single instance of [`org.elasticsearch.client.RestClient`](https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java).

🚨 **As a result of this change, any engine templates that directly use the Elasticsearch 5 StorageClient would require an update for compatibility.** The change is this:

### Original

```scala
val client: StorageClient = … // code to instantiate client
val restClient: RestClient = client.open()
try {
  restClient.performRequest(…)
} finally {
  restClient.close()
}
```

### With this PR

```scala
val client: RestClient = … // code to instantiate client
client.performRequest(…)
```

*No more balancing `open` & `close` as this is handled by using a new `CleanupFunctions` hook added to the framework in this PR.*

[Universal Recommender](https://github.com/actionml/universal-recommender) is the only template that I know of which directly uses the ES StorageClient outside of PredictionIO core. See example [UR changes for compatibility with this PR](https://github.com/heroku/predictionio-engine-ur/compare/esclient-singleton).

### Elasticsearch StorageClient changes

* reimplemented as singleton
* installs a cleanup function

See [StorageClient](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2926f4cfd93ccb02320e2a9503ccd223)

### Core changes

A new [`CleanupFunctions`](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2a958821ac58f019fbce38540c775f19) hook has been added which enables developers of storage modules to register anonymous functions with `CleanupFunctions.add { … }` to be executed after Spark-related commands/workflows.

The hook is called in a `finally { CleanupFunctions.run() }` from within:

* `pio import`
* `pio export`
* `pio train`
* `pio batchpredict`

Apologies for the huge indentation shifts from the requisite try-finally blocks:

```scala
try {
  // Freshly indented code.
} finally {
  CleanupFunctions.run()
}
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mars/incubator-predictionio esclient-singleton-with-auth

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/421.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #421

commit f30f27bcc09a397efb42a7923938beceaeac37bf
Author: Mars Hall
Date:   2017-08-08T23:29:15Z

    Migrate to singleton Elasticsearch client to use underlying connection pooling (PoolingNHttpClientConnectionManager)

commit d99927089a41cb85f525cb74bdf394eed4686bf2
Author: Mars Hall
Date:   2017-08-10T03:00:58Z
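The pull request above combines two ideas: one shared client per JVM, and a cleanup hook that closes it once a workflow finishes. The following is a minimal, dependency-free sketch of how those two pieces might fit together. Every name in it (`CleanupHooksSketch`, `ClientSketch`, `SharedClientSketch`, `WorkflowSketch`) is hypothetical; `ClientSketch` merely stands in for `org.elasticsearch.client.RestClient`, and none of this is the code from the PR.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the `CleanupFunctions` hook; not the PredictionIO source.
object CleanupHooksSketch {
  private val hooks = ArrayBuffer.empty[() => Unit]

  // Register a block to run later; the by-name argument is not evaluated here.
  def add(hook: => Unit): Unit = synchronized {
    hooks += (() => hook)
    ()
  }

  // Run each registered hook once, in registration order, then forget them.
  def run(): Unit = synchronized {
    val pending = hooks.toList
    hooks.clear()
    pending.foreach(h => h())
  }
}

// Hypothetical stand-in for RestClient so the sketch needs no dependencies.
final class ClientSketch(val id: Int) {
  @volatile private var open = true
  def isOpen: Boolean = open
  def performRequest(path: String): String = {
    require(open, "client is closed")
    s"client-$id GET $path"
  }
  def close(): Unit = { open = false }
}

object SharedClientSketch {
  private val created = new AtomicInteger(0)

  // `lazy val` initialization is thread-safe, so at most one client is built per JVM.
  // Building the client also registers its cleanup, so callers never call close().
  lazy val client: ClientSketch = {
    val c = new ClientSketch(created.incrementAndGet())
    CleanupHooksSketch.add { c.close() }
    c
  }

  def instancesCreated: Int = created.get()
}

object WorkflowSketch {
  // Mimics a command such as `pio batchpredict`: work in `try`, hooks in `finally`.
  def run(paths: List[String]): List[String] =
    try {
      paths.map(p => SharedClientSketch.client.performRequest(p))
    } finally {
      CleanupHooksSketch.run()
    }
}
```

In this sketch the shared client stays closed after the hooks run, which is acceptable only on the assumption that the process exits once the workflow ends; a long-lived process would need a way to rebuild the client.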
[GitHub] incubator-predictionio pull request #420: [PIO-106] Elasticsearch 5.x Storag...
Github user mars closed the pull request at: https://github.com/apache/incubator-predictionio/pull/420 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #420: [PIO-106] Elasticsearch 5.x StorageClient...
Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/420

Closing in favor of https://github.com/apache/incubator-predictionio/pull/421
[GitHub] incubator-predictionio issue #372: Elasticsearch basic HTTP authentication
Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/372

Closing in favor of https://github.com/apache/incubator-predictionio/pull/421
[jira] [Commented] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126322#comment-16126322 ]

ASF GitHub Bot commented on PIO-106:
------------------------------------

Github user mars closed the pull request at:

    https://github.com/apache/incubator-predictionio/pull/420

> Elasticsearch 5.x StorageClient should reuse RestClient
> -------------------------------------------------------
>
>                 Key: PIO-106
>                 URL: https://issues.apache.org/jira/browse/PIO-106
>             Project: PredictionIO
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.11.0-incubating
>            Reporter: Mars Hall
>            Assignee: Mars Hall
>
> When using the proposed [PIO-105 Batch Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's REST interface appears to become overloaded, ending with the Spark job being killed from errors like:
> {noformat}
> [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search
> [ERROR] [Utils] Aborting task
> [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search
> [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749)
> [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737)
> [ERROR] [Common$] Invalid app name ur
> [ERROR] [Utils] Aborting task
> [ERROR] [URAlgorithm] Error when read recent events: java.lang.IllegalArgumentException: Invalid app name ur
> [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751)
> [ERROR] [Utils] Aborting task
> [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750)
> [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, executor driver): java.net.BindException: Can't assign requested address
>     at sun.nio.ch.Net.connect0(Native Method)
>     at sun.nio.ch.Net.connect(Net.java:454)
>     at sun.nio.ch.Net.connect(Net.java:446)
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
>     at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273)
>     at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139)
>     at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
>     at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
>     at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> After these errors happen & the job is killed, Elasticsearch immediately recovers. It responds to queries normally. I researched what could cause this and found an [old issue in the main Elasticsearch repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints given therein about *using keep-alive in the ES client* to avoid these performance issues, I investigated how PredictionIO's [Elasticsearch StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] manages its connections.
> I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), Elasticsearch creates a new underlying connection, an Elasticsearch RestClient, for [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] & [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] with its API. As a result, *there is no way Elasticsearch TCP connections can be reused via HTTP keep-alive*.
> High-performance workloads with Elasticsearch 5.x will suffer from these issues unless we refactor Elasticsearch StorageClient to share the underlying RestClient instead of [building a new one everytime the client is used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31].
> There are certainly different approaches we could take to sharing a RestClient so that its keep-alive behavior may work as designed:
> * maintain a
[jira] [Updated] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication
[ https://issues.apache.org/jira/browse/PIO-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars Hall updated PIO-114:
--------------------------
    Description:
Add optional username-password configuration for the new Elasticsearch 5 client; in {{conf/pio-env.sh}} config:

{code}
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
{code}

These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai).

  was:
Add optional username-password configuration for the new Elasticsearch 5 client; in {conf/pio-env.sh} config:

{code}
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
{code}

These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai).

> Elasticsearch 5.x StorageClient basic HTTP authentication
> ---------------------------------------------------------
>
>                 Key: PIO-114
>                 URL: https://issues.apache.org/jira/browse/PIO-114
>             Project: PredictionIO
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.11.0-incubating
>            Reporter: Mars Hall
>            Assignee: Mars Hall
>
> Add optional username-password configuration for the new Elasticsearch 5 client; in {{conf/pio-env.sh}} config:
> {code}
> # Optional basic HTTP auth
> PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
> PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
> {code}
> These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header.
> Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai).
[jira] [Created] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication
Mars Hall created PIO-114:
-----------------------------

             Summary: Elasticsearch 5.x StorageClient basic HTTP authentication
                 Key: PIO-114
                 URL: https://issues.apache.org/jira/browse/PIO-114
             Project: PredictionIO
          Issue Type: New Feature
          Components: Core
    Affects Versions: 0.11.0-incubating
            Reporter: Mars Hall
            Assignee: Mars Hall

Add optional username-password configuration for the new Elasticsearch 5 client; in {conf/pio-env.sh} config:

{code:shell}
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
{code}

These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai).