Re: PredictionIO with remote Spark and Elasticsearch

Pat Ferrel Thu, 02 Mar 2017 14:45:31 -0800

1) circumvent what?
2) transportclient port to what?

On Mar 2, 2017, at 2:04 PM, Paul-Armand Verhaegen 
<[email protected]> wrote:


We went to elastic.co to circumvent that. They are also on AWS but have the 
TransportClient port.


> On 2 Mar 2017, at 23:02, Pat Ferrel <[email protected]> wrote:
> 
> Yes, PIO uses the TransportClient and this is being deprecated by ES. PIO has 
> a feature branch that adds support for ES5 using only the REST client. Not 
> sure this will help though since I suspect AWS is not on ES5 yet.
> 
> 
> On Mar 2, 2017, at 1:10 PM, Miller, Clifford <[email protected]> wrote:
> 
> I found some old references from folks having the same issue as me. They 
> indicated that the AWS Elasticsearch Service only supports HTTP and not TCP. 
> If this is true, then AWS Elasticsearch has very limited usefulness. Has 
> anyone else run into this?
> 
> 
> On Thu, Mar 2, 2017 at 1:26 PM, Miller, Clifford <[email protected]> wrote:
> I'm able to run pio train, although pio train -- --master 
> spark://your_master_url did not work. I'm using Spark on YARN, so I was able 
> to get pio train -- --master yarn://URL to work after I copied the 
> Elasticsearch configuration from my CDH cluster.
> 
> I'm still struggling to integrate this with AWS Elasticsearch. Does anyone 
> have an example of how this should be configured?
> 
> FYI, the EC2 instance that I'm running PredictionIO on can access it from the 
> command line: "curl -X GET <AWS Elasticsearch endpoint URL>". 
>  
> 
> On Wed, Mar 1, 2017 at 11:44 AM, Donald Szeto <[email protected]> wrote:
> Hi Clifford,
> 
> To use a remote Spark cluster, use passthrough command line arguments on the 
> CLI, e.g.
> 
> pio train -- --master spark://your_master_url
> 
> Anything after a lone -- will be passed to spark-submit verbatim. For more 
> information try "pio help".
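
A minimal sketch of the passthrough rule described above, not PredictionIO's actual implementation: everything before the first lone -- is kept for pio, and everything after it is handed on to spark-submit. The function name split_passthrough and the variables pio_args and spark_args are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: split an argument list at the first lone "--".
# Results are stored as space-joined strings, which is enough for
# illustration but would not preserve arguments containing spaces.
split_passthrough() {
  pio_args=""
  spark_args=""
  seen=0
  for arg in "$@"; do
    # Only the first lone "--" acts as the separator; it is not forwarded.
    if [ "$seen" -eq 0 ] && [ "$arg" = "--" ]; then
      seen=1
      continue
    fi
    if [ "$seen" -eq 0 ]; then
      pio_args="${pio_args:+$pio_args }$arg"
    else
      spark_args="${spark_args:+$spark_args }$arg"
    fi
  done
}

split_passthrough train -- --master spark://your_master_url
printf 'pio args:          %s\n' "$pio_args"
printf 'spark-submit args: %s\n' "$spark_args"
```

With the command line from this message, the first group is "train" and the second is "--master spark://your_master_url".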
> 
> To use a remote Elasticsearch cluster, please refer to examples in 
> "conf/pio-env.sh" where you could find a variable to set the remote host name 
> or IP of your ES cluster.
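
For reference, a sketch of what the Elasticsearch entries in conf/pio-env.sh might look like when pointing at a remote cluster. The variable names follow the pio-env.sh.template shipped with PredictionIO releases of this period as best I recall, so check them against your own copy. The host name es.example.internal and the cluster name my_es_cluster are placeholders. Port 9300 is Elasticsearch's default transport (TCP) port, which is what the TransportClient connects to, as opposed to the HTTP port 9200.

```shell
# conf/pio-env.sh (fragment): use a remote Elasticsearch cluster for metadata.
# Host and cluster name below are placeholders, not values from this thread.
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=my_es_cluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es.example.internal
# Transport (TCP) port used by the TransportClient, not the HTTP port 9200.
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
```

A service that only exposes HTTP would not accept connections on the transport port, whatever host is set here.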
> 
> Regards,
> Donald
> 
> On Tue, Feb 28, 2017 at 12:57 PM Miller, Clifford <[email protected]> wrote:
> I currently have a Cloudera cluster (Hadoop, Spark, HBase...) set up on AWS. 
> I have PredictionIO installed on a different EC2 instance. I've been able to 
> successfully configure it to use HDFS for model storage and to store events 
> in HBase on the cluster. Spark and Elasticsearch are installed locally on 
> the PredictionIO EC2 instance. I have the following questions:
> 
> 1) How can I configure PredictionIO to use the Spark on the Cloudera cluster?
> 2) How can I configure PredictionIO to use a remote Elasticsearch domain? 
>    I'd like to use the AWS Elasticsearch service if possible.
> 
> Thanks
> 
> 
> -- 
> Clifford Miller
> Mobile | 321.431.9089 <tel:321.431.9089>
>
