Re: PredictionIO with remote Spark and Elasticsearch

Miller, Clifford Thu, 02 Mar 2017 14:15:50 -0800

I probably should have asked whether the Elasticsearch 5.x-compatible branch is
in a state where I could clone and build it. If it is, where can I find it?


On Thu, Mar 2, 2017 at 5:06 PM, Miller, Clifford <
[email protected]> wrote:

> Actually, AWS currently offers three versions: 1.5, 2.3, and 5.1, so a
> 5.x-compatible version should work. When will this 5.x-compatible version
> be available?
>
> On Thu, Mar 2, 2017 at 5:02 PM, Pat Ferrel <[email protected]> wrote:
>
>> Yes, PIO uses the TransportClient and this is being deprecated by ES. PIO
>> has a feature branch that adds support for ES5 using only the REST client.
>> Not sure this will help though since I suspect AWS is not on ES5 yet.
>>
>>
>> On Mar 2, 2017, at 1:10 PM, Miller, Clifford <
>> [email protected]> wrote:
>>
>> I found some old references to people having the same issue as me. They
>> indicated that the AWS Elasticsearch Service supports only HTTP, not the
>> native TCP transport protocol. If this is true, then AWS Elasticsearch has
>> very limited usefulness. Has anyone else run into this?
>>
>>
>> On Thu, Mar 2, 2017 at 1:26 PM, Miller, Clifford <
>> [email protected]> wrote:
>>
>>> I'm able to run pio train, although "pio train -- --master
>>> spark://your_master_url" did not work. I'm using Spark on YARN, so I was
>>> able to get "pio train -- --master yarn://URL" to work after I copied the
>>> Elasticsearch configuration from my CDH cluster.
>>>
>>> I'm still struggling with integrating this with AWS Elasticsearch. Does
>>> anyone have an example of how this should be configured?
>>>
>>> FYI, the EC2 instance that I'm running PredictionIO on can access it
>>> from the command line: "curl -X GET <AWS Elasticsearch endpoint URL>".
>>>
>>>
>>> On Wed, Mar 1, 2017 at 11:44 AM, Donald Szeto <[email protected]> wrote:
>>>
>>>> Hi Clifford,
>>>>
>>>> To use a remote Spark cluster, use passthrough command line arguments
>>>> on the CLI, e.g.
>>>>
>>>> pio train -- --master spark://your_master_url
>>>>
>>>> Anything after a lone -- will be passed to spark-submit verbatim. For
>>>> more information try "pio help".
>>>>
>>>> To use a remote Elasticsearch cluster, please refer to the examples in
>>>> "conf/pio-env.sh", where you can find a variable for setting the remote
>>>> hostname or IP of your ES cluster.
>>>>
>>>> Regards,
>>>> Donald
>>>>
>>>> On Tue, Feb 28, 2017 at 12:57 PM Miller, Clifford <
>>>> [email protected]> wrote:
>>>>
>>>>> I currently have a Cloudera cluster (Hadoop, Spark, HBase...) set up on
>>>>> AWS. I have PredictionIO installed on a different EC2 instance. I've been
>>>>> able to successfully configure it to use HDFS for model storage and to
>>>>> store events in HBase on the cluster. Spark and Elasticsearch are
>>>>> installed locally on the PredictionIO EC2 instance. I have the following
>>>>> questions:
>>>>>
>>>>> How can I configure PredictionIO to use Spark on the Cloudera
>>>>> cluster?
>>>>> How can I configure PredictionIO to use a remote Elasticsearch
>>>>> domain? I'd like to use the AWS Elasticsearch Service if possible.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> --
>>>>> Clifford Miller
>>>>> Mobile | 321.431.9089
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
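
Donald's passthrough tip also covers the Spark-on-YARN case Clifford describes. A minimal sketch, assuming the cluster's Hadoop/YARN client configuration has been copied to the PredictionIO host (the directory below is a hypothetical placeholder). Per the Spark documentation, the YARN master is given to spark-submit as plain "yarn", and the ResourceManager address is read from the configuration in HADOOP_CONF_DIR or YARN_CONF_DIR rather than from the master URL; the memory flags are illustrative values, not recommendations.

```shell
# Sketch: train on a remote Spark-on-YARN cluster from the PredictionIO host.
# ASSUMPTION: the cluster's client configs (core-site.xml, yarn-site.xml, ...)
# were copied into this hypothetical directory.
export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster
export YARN_CONF_DIR="$HADOOP_CONF_DIR"

# Everything after the lone "--" is passed to spark-submit verbatim.
# With YARN, the master is just "yarn"; the ResourceManager comes from the configs above.
pio train -- --master yarn --deploy-mode client \
  --driver-memory 4g --executor-memory 4g
```

If conf/pio-env.sh also sets HADOOP_CONF_DIR, that value may take precedence over the exported one, so it is worth keeping the two in agreement.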

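Donald's pointer to "conf/pio-env.sh" can be made concrete. A sketch of the Elasticsearch section, assuming the variable names from the pio-env.sh template that shipped with PredictionIO releases of this period (check the template in your own install); the hostname and cluster name are hypothetical placeholders. This configuration drives the TransportClient Pat mentions, so the port is the native transport port (9300 by default) rather than the 9200 HTTP port. That is consistent with the reports that an HTTP-only service such as AWS Elasticsearch cannot be used this way. The ES5 feature branch uses the REST client instead, so its own template should be consulted for the matching settings.

```shell
# conf/pio-env.sh (excerpt): sketch for a remote, self-managed Elasticsearch cluster.
# Hostname and cluster name below are hypothetical placeholders.
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=my-es-cluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es-node1.example.internal
# Native transport port used by the TransportClient, not the 9200 REST port.
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
```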

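Clifford's curl test confirms that the REST endpoint answers over HTTP; the open question in the thread is whether the native transport port is reachable at all. A small bash sketch for checking both from the PredictionIO host, assuming bash with /dev/tcp support and GNU coreutils "timeout"; the function names and the example endpoint are hypothetical. It only tests whether a TCP connection is accepted, not whether Elasticsearch speaks the expected protocol on that port.

```shell
#!/usr/bin/env bash
# Sketch: probe an Elasticsearch endpoint on REST ports (443 for HTTPS as used by AWS,
# 9200 as the default HTTP port of a self-managed node) and on the native transport
# port used by the TransportClient (9300 by default).
# ASSUMPTIONS: bash built with /dev/tcp support and GNU coreutils "timeout".
# port_open and check_es_endpoint are hypothetical helper names.

# Return 0 if a TCP connection to HOST:PORT is accepted within 3 seconds, else non-zero.
port_open() {
  local host="$1" port="$2"
  # The child bash opens fd 3 on /dev/tcp/HOST/PORT; the connection closes when it exits.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print one "HOST:PORT open|closed" line for each port of interest.
check_es_endpoint() {
  local host="$1" port
  for port in 443 9200 9300; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} closed"
    fi
  done
}

# Usage (hypothetical endpoint name):
# check_es_endpoint search-mydomain-abc123.us-east-1.es.amazonaws.com
```

For an AWS Elasticsearch domain, the reports in this thread suggest 443 would show as open while 9300 would not.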

