Re: UniversalRecommender performance bottleneck in SimilarityAnalysis.cooccurrencesIDSs

2016-11-19 Thread Pat Ferrel
The current head of the template repo repartitions input based on Spark's 
default parallelism, which I set on the `pio train` CLI to 4 x #-of-cores. This 
speeds up the math drastically. There are still some things that look like 
bottlenecks, but taking them out makes things slower. The labels you see in the 
Spark GUI should be considered approximations.

The parOpt is a Mahout-specific way to control partitioning, and I avoid it by 
using the Spark method.
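
As a rough illustration of the setting described above, the sketch below computes 4 x the core count and builds a `pio train` command line that passes the result through to spark-submit as `spark.default.parallelism`. The core count of 12 is taken from the cluster described in the quoted message, the variable names are made up for the example, and the command is only printed, not run. Passing the value with `--conf` is one way to set it, not necessarily the exact invocation used here.

```shell
#!/bin/sh
# Assumption: 12 cores in total, as in the cluster described in this thread.
CORES=12
# Rule of thumb from the message above: 4 x number of cores.
PARALLELISM=$((4 * CORES))
# Arguments after "--" are handed to spark-submit by pio.
TRAIN_CMD="pio train -- --conf spark.default.parallelism=${PARALLELISM}"
# Print the command instead of running it, so it can be reviewed first.
echo "$TRAIN_CMD"
```

Adjust `CORES` to the real cluster size before using the printed command.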


On Nov 16, 2016, at 5:56 AM, Igor Kasianov  wrote:

Hi,

I'm using the UR template and have some trouble with scalability.

Training takes 18 hours (each day), and for the last 12 hours it uses only one core.
As far as I can see, URAlgorithm.scala (line 144) calls 
SimilarityAnalysis.cooccurrencesIDSs
with data.actions (12 partitions).

Until reduceByKey in AtB.scala it executes in parallel,
but after that it executes in a single thread.

It is strange that when SimilarityAnalysis.scala (line 145) calls
indexedDatasets(0).create(drm, indexedDatasets(0).columnIDs, 
indexedDatasets(i).columnIDs)
it returns an IndexedDataset with only one partition.

As far as I can see, SimilarityAnalysis.scala (line 63) has
drmARaw.par(auto = true)
Maybe this is what decreases the number of partitions.
As far as I can see, the master branch of Mahout
has ParOpt:
https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/cf/SimilarityAnalysis.scala#L142
 

Maybe this can fix the problem.

So, am I right about the root of the problem, and how can I fix it?



I have a Spark cluster with 12 cores and 128 GB, but as the number of 
events increases I can't scale UR because of this bottleneck.

P.S. Please do not suggest using an event window (I already use one, but the 
daily number of events is increasing).



Re: Regarding Remote ES Cluster with Pio

2016-11-19 Thread Harsh Mathur
Hi,
Thanks man:)

After some trial and error, I changed https to http in the URL and used the
Java native client port; it worked without any auth.
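
A sketch of what the working pio-env.sh entries would look like after that change. `host` and `native_java_port` are the placeholders from the original message, not real values, and dropping `user:password@` from the URL is an assumption based on "without any auth".

```shell
# Elasticsearch storage source in pio-env.sh (placeholder values)
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch
# http instead of https; credentials omitted (assumption, see above)
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=http://host
# Port for native Java clients, not the HTTP port
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=native_java_port
```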

Thanks again:)

Regards
Harsh Mathur
harshmathur.1...@gmail.com

*“Perseverance is the hard work you do after you get tired of doing the
hard work you already did."*

On Sat, Nov 19, 2016 at 2:18 PM, Hasan Can Saral 
wrote:

> Hi!
>
> There might be an issue with basic auth. I have not tried to configure pio
> with an ES server with basic auth. And from the error you get, I understand
> that pio does not seem to be happy with (or even find) the hosts you
> provided. Also, what port is your ES cluster listening on? Can you try 9300
> and 9200 explicitly?
>
>
> On Nov 17, 2016, at 5:26 PM, Harsh Mathur 
> wrote:
>
> Hi PredictionIO developers,
> First of all, thank you for a great open source product.
>
> I am Harsh. I am deploying the system in production and have an ES
> instance as a managed service. I am not able to make pio use my managed ES
> instance instead of installing a local ES. Thanks a lot in advance for all
> the help.
>
> I have an ES config in the form: https://user:password@host
> ports available:
> 1. x: for http
> 2. y: for native java node clients
>
> I tried editing pio-env.sh as follows:
>
> # Elasticsearch Example
> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch
> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=https://user:password@host
> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=native_java_port
> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.5.2
>
>
> But pio is not able to find any nodes:
>
> [INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
>
> [WARN] [netty] [Aftershock] exception caught on transport layer [[id:
> 0x63808344]], closing connection
>
> [ERROR] [Console$] Unable to connect to all storage backends successfully.
> The following shows the error message from the storage backend.
>
> [ERROR] [Console$] None of the configured nodes are available: []
> (org.elasticsearch.client.transport.NoNodeAvailableException)
>
> [ERROR] [Console$] Dumping configuration of initialized storage backend
> sources. Please make sure they are correct.
>
> Regards
> Harsh Mathur
> harshmathur.1...@gmail.com
>
> *“Perseverance is the hard work you do after you get tired of doing the
> hard work you already did."*
>
>
>


Re: Regarding Remote ES Cluster with Pio

2016-11-19 Thread Hasan Can Saral
Hi!

There might be an issue with basic auth. I have not tried to configure pio with 
an ES server with basic auth. And from the error you get, I understand that pio 
does not seem to be happy with (or even find) the hosts you provided. Also, what 
port is your ES cluster listening on? Can you try 9300 and 9200 explicitly?


> On Nov 17, 2016, at 5:26 PM, Harsh Mathur  wrote:
> 
> Hi PredictionIO developers,
> First of all, thank you for a great open source product.
> 
> I am Harsh. I am deploying the system in production and have an ES 
> instance as a managed service. I am not able to make pio use my managed ES 
> instance instead of installing a local ES. Thanks a lot in advance for all 
> the help.
> 
> I have an ES config in the form: https://user:password@host
> ports available:
> 1. x: for http
> 2. y: for native java node clients
> 
> I tried editing pio-env.sh as follows:
> 
> # Elasticsearch Example
> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch
> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=https://user:password@host
> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=native_java_port
> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.5.2
> 
> 
> But pio is not able to find any nodes:
> 
> [INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> 
> [WARN] [netty] [Aftershock] exception caught on transport layer [[id: 
> 0x63808344]], closing connection
> 
> [ERROR] [Console$] Unable to connect to all storage backends successfully. 
> The following shows the error message from the storage backend.
> 
> [ERROR] [Console$] None of the configured nodes are available: [] 
> (org.elasticsearch.client.transport.NoNodeAvailableException)
> 
> [ERROR] [Console$] Dumping configuration of initialized storage backend 
> sources. Please make sure they are correct.
> 
> Regards
> Harsh Mathur
> harshmathur.1...@gmail.com 
> 
> “Perseverance is the hard work you do after you get tired of doing the hard 
> work you already did."
>