Those services are all running on the same (development) machine. I disabled 
the input event load and all other services, and it is still taking forever 
(while using only a single CPU). 

Could it be a partitioning issue? According to the logs, each of the 
correlatorRDDs uses only a single partition…?
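
To make the question concrete, here is roughly what I would check from a 
spark-shell. This is generic Spark, not UR code, and correlatorRDD is just a 
placeholder for the RDD the logs mention:

import org.apache.spark.rdd.RDD

// Generic helper: log the partition count and repartition if it is too low.
def ensureParallelism[T](rdd: RDD[T], minPartitions: Int): RDD[T] = {
  println(s"partitions: ${rdd.getNumPartitions}")
  if (rdd.getNumPartitions < minPartitions) rdd.repartition(minPartitions) else rdd
}

// e.g. ensureParallelism(correlatorRDD, 40)  // roughly one partition per core

If repartitioning before the heavy stage spreads the work across the cores, 
that would at least confirm the partitioning theory.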

> On May 10, 2017, at 6:09 PM, Pat Ferrel <[email protected]> wrote:
> 
> What is the physical architecture? Do you have HBase, Elasticsearch, and 
> Spark running on separate machines? If the CPU load is low then it must be IO 
> bound reading from HBase or writing to Elasticsearch. Do you have any input 
> event load yet, or are you making queries? These all change the equation and 
> are why running the services on separate machines makes the most sense.
> 
> 
> On May 10, 2017, at 1:47 PM, Bolmo Joosten <[email protected]> wrote:
> 
> Thanks for your suggestion. I forgot to mention in my last email that the 
> $plus$plus stage takes most of the time (95%+) and uses only 1-3 CPUs.
> 
> I will give it a try with lower driver memory and higher executor memory.
> 
> Maybe a hard question, but do you have any idea what kind of training time I 
> should expect with this data size on this cluster? 
> 
> We modified the default UR template to create the eventRDDs from CSV files 
> instead of HBase, since HBase was unable to process this amount of data on 
> the cluster. This means we can't provide any personalized recommendations, 
> but that is OK for now. 
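> 
> In case it is useful, the CSV load is essentially just a textFile plus a 
> split into a small case class. This is a simplified sketch with an 
> illustrative schema and field order, not the actual template change:
> 
> import org.apache.spark.SparkContext
> import org.apache.spark.rdd.RDD
> 
> // Illustrative schema and field order; not the actual file layout.
> case class RawEvent(user: String, item: String, eventName: String)
> 
> def eventsFromCsv(sc: SparkContext, path: String): RDD[RawEvent] =
>   sc.textFile(path)
>     .map(_.split(","))
>     .filter(_.length >= 3)
>     .map(f => RawEvent(f(0), f(1), f(2)))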
> 
> 2017-05-10 10:22 GMT-07:00 Pat Ferrel <[email protected]>:
> You can't bypass HBase, but you can import JSON into HBase directly, so I 
> assume that is what you mean.
> 
> Executor memory should be higher and driver memory lower. Spark loves memory, 
> and in this case the lower limit is all of your input events plus the BiMaps 
> for all user and item ids. If you don't hit an OOM you are above the minimum, 
> but increasing executor memory might help, as might more executor CPUs. The 
> lower limit for driver memory is roughly equal to the amount per executor.
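> 
> For example, an invocation along these lines, with the numbers purely 
> illustrative rather than a tested recommendation:
> 
> pio train -- --driver-memory 16G --executor-memory 24G --executor-cores 4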
> 
> One unfortunate thing about Spark is that you can scale it to do the job in 
> minutes, but when you go to read from or write to HBase or Elasticsearch, a 
> cluster that large will overload the DBs. So a long training time is not all 
> that bad a thing, since the cluster will probably not be overloading the IO.
> 
> 
> On May 10, 2017, at 8:45 AM, Bolmo Joosten <[email protected]> wrote:
> 
> Hi all,
> 
> I am having trouble scaling the Universal Recommender to a dataset with 250M 
> events (purchase, view, atb). It trains fine on a couple of million events, 
> but the training time becomes very long (>48h) on the large dataset.
> 
> Hardware specs:
> 
> - Standalone cluster
> - 20 cores (40 with hyperthreading)
> - 264 GB RAM
> 
> Input data size and format:
> 
> - We load directly from CSV files and bypass HBase. The CSV is 19 GB.
> - PIO JSON format equivalent size: 150 GB
> 
> Train command:
> 
> pio train -- --driver-memory 64G --executor-memory 8G --executor-cores 2
> 
> I have tried various combinations of driver memory, executor memory, and 
> number of cores, but the training time does not seem to be affected by them.
> 
> The Spark UI tells me that the save method (collect > $plus$plus) in 
> URModel.scala takes a very long time. See the attached Spark UI dumps for 
> details. 
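> 
> For what it is worth, $plus$plus is just the JVM name for Scala's ++ 
> operator, i.e. an RDD union, so as far as I can tell from the UI that stage 
> is essentially a union of the correlator RDDs followed by a collect() onto 
> the driver. A rough sketch of that shape (placeholder code, not the actual 
> URModel internals):
> 
> import org.apache.spark.rdd.RDD
> 
> // Placeholder sketch: the union ("++", shown as $plus$plus in the UI) is
> // lazy; collect() is what materializes the combined result on the driver.
> def combineAndCollect[T](rdds: Seq[RDD[T]]): Array[T] =
>   rdds.reduce(_ ++ _).collect()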
> 
> Any suggestions?
> 
> Thanks, Bolmo
> 
