Yes, PIO uses the TransportClient, which ES is deprecating. PIO has a 
feature branch that adds support for ES5 using only the REST client. I'm not 
sure this will help, though, since I suspect AWS is not on ES5 yet.


On Mar 2, 2017, at 1:10 PM, Miller, Clifford 
<[email protected]> wrote:

I found some old references from folks having the same issue as me.  They 
indicated that the AWS Elasticsearch Service only supports HTTP and not TCP.  
If this is true, then AWS Elasticsearch has very limited usefulness here.  
Has anyone else run into this?


On Thu, Mar 2, 2017 at 1:26 PM, Miller, Clifford 
<[email protected]> wrote:
I'm able to run pio train, although pio train -- --master 
spark://your_master_url did not work.  I'm using Spark on YARN, so I was able 
to get pio train -- --master yarn://URL to work after I copied the 
Elasticsearch configuration from my CDH cluster.

I'm still struggling with integrating this with AWS Elasticsearch.  Does anyone 
have an example of how this should be configured?

FYI, the EC2 instance that I'm running PredictionIO on can access it from the 
command line: "curl -X GET <AWS Elasticsearch endpoint URL>". 
 

On Wed, Mar 1, 2017 at 11:44 AM, Donald Szeto <[email protected]> wrote:
Hi Clifford,

To use a remote Spark cluster, use passthrough command line arguments on the 
CLI, e.g.

pio train -- --master spark://your_master_url

Anything after a lone -- will be passed to spark-submit verbatim. For more 
information try "pio help".

To use a remote Elasticsearch cluster, please refer to the examples in 
"conf/pio-env.sh", where you can find a variable for setting the remote host 
name or IP of your ES cluster.
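
As a rough sketch of what that section of "conf/pio-env.sh" might look like 
for a remote cluster: the variable names below follow the stock pio-env.sh 
template as best I recall, so please check them against your own copy. The 
host name and cluster name are made-up placeholders, and the port assumes the 
usual transport (TCP) port rather than the HTTP one.

```shell
# Sketch of the Elasticsearch block in conf/pio-env.sh (all values are placeholders).
# Assumption: variable names match the stock pio-env.sh template; verify in your copy.
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# Hypothetical remote host; replace with the host name or IP of your ES node.
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es-node.example.internal
# 9300 is the usual transport (TCP) port, not the HTTP port 9200.
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
# Assumed cluster name; it should match cluster.name on the remote cluster.
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=my_es_cluster
```

Because this setup talks to the transport port, it would presumably not work 
against an endpoint that exposes only HTTP.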

Regards,
Donald

On Tue, Feb 28, 2017 at 12:57 PM Miller, Clifford 
<[email protected]> wrote:
I currently have a Cloudera cluster (Hadoop, Spark, HBase...) set up on AWS.  I 
have PredictionIO installed on a different EC2 instance.  I've been able to 
successfully configure it to use HDFS for model storage and to store events in 
HBase on the cluster.  Spark and Elasticsearch are installed locally on the 
PredictionIO EC2 instance.  I have the following questions:

How can I configure PredictionIO to utilize the Spark on the Cloudera cluster?  
How can I configure PredictionIO to utilize a remote Elasticsearch domain?  I'd 
like to use the AWS Elasticsearch service if possible.

Thanks


-- 
Clifford Miller
Mobile | 321.431.9089



