Those services are all running on the same (development) machine. I disabled the input event load and any other services, and it is still taking forever (while using a single CPU).

Could it be a partitioning issue? According to the logs, each of the correlatorRDDs uses only a single partition.
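For reference, this is roughly the kind of check I am doing to see whether the single-partition suspicion holds. It is only a minimal, self-contained sketch: the file path, the parallelism value, and the RDD being inspected are placeholders, not the actual UR template code.

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch only: path, parallelism value, and data are placeholders,
    // not the modified UR template code.
    object PartitionCheck {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("partition-check")
          // applies to shuffles that don't get an explicit partition count
          .set("spark.default.parallelism", "40")
        val sc = new SparkContext(conf)

        // A single large CSV can still come in with very few partitions
        // (one per HDFS block, or fewer for a local file), so check and
        // widen before the heavy stages.
        val events = sc.textFile("events.csv") // placeholder path
        println(s"partitions as read: ${events.getNumPartitions}")

        val widened = events.repartition(40) // roughly 2x the physical cores here
        println(s"partitions after repartition: ${widened.getNumPartitions}")

        sc.stop()
      }
    }

If the correlatorRDDs really do end up with one partition, a repartition like this before the $plus$plus stage should at least spread the work across cores, at the cost of a shuffle.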
> On May 10, 2017, at 6:09 PM, Pat Ferrel <[email protected]> wrote:
>
> What is the physical architecture? Do you have HBase, Elasticsearch, and Spark running on separate machines? If the CPU load is low then it must be IO bound reading from HBase or writing to Elasticsearch. Do you have any input event load yet, or are you making queries? These will all change the equation and are why separating services to run on their own machines makes the most sense.
>
> On May 10, 2017, at 1:47 PM, Bolmo Joosten <[email protected] <mailto:[email protected]>> wrote:
>
> Thanks for your suggestion. I forgot to mention in my last email that the $plus$plus stage takes most of the time (95%+) and is using only 1-3 CPUs.
>
> I will give it a try with lower driver memory and higher executor memory.
>
> Maybe a hard question: any idea what kind of training time I should expect with this data size on this cluster?
>
> We modified the default UR template to create the eventRDDs from CSV files instead of HBase. HBase was unable to process this amount of data on the cluster. This means we can't provide any personalized recommendations, but that is ok for now.
>
> 2017-05-10 10:22 GMT-07:00 Pat Ferrel <[email protected] <mailto:[email protected]>>:
>
> You can’t bypass HBase; you can import JSON to HBase directly, so I assume this is what you are saying.
>
> Executor memory should be higher and driver memory lower. Spark loves memory, and in this case the lower limit is all your input events plus the BiMaps for all user and item ids. If you don’t have an OOM you are above the minimum, but increasing the executor memory might help, as might more executor CPUs. The lower limit for the driver memory is roughly equal to the amount per executor.
>
> One unfortunate thing about Spark is that you can scale it to do the job in minutes, but when you go to read or write to/from HBase or Elasticsearch, a cluster that large will overload the DBs. So training taking a long time is not all that bad a thing, since the cluster will probably not be overloading the IO.
>
> On May 10, 2017, at 8:45 AM, Bolmo Joosten <[email protected] <mailto:[email protected]>> wrote:
>
> Hi all,
>
> I have trouble scaling the Universal Recommender to a dataset with 250M events (purchase, view, atb). It trains ok on a couple of million events, but the training time becomes very long (>48h) on the large dataset.
>
> Hardware specs:
>
> Standalone cluster
> 20 cores (40 with hyper-threading)
> 264 GB RAM
>
> Input data size and format:
>
> We load directly from CSV files and bypass HBase. Size of the CSV is 19 GB.
> PIO JSON format equivalent size: 150 GB
>
> Train command:
>
> pio train -- --driver-memory 64G --executor-memory 8G --executor-cores 2
>
> I have tried various combinations of driver memory, executor memory, and number of cores, but the training time does not seem to be affected by this.
>
> The Spark UI tells me the save method (collect > $plus$plus) in URModel.scala takes a very long time. See the attached dumps of the Spark UI for details.
>
> Any suggestions?
>
> Thanks, Bolmo
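
PS: for anyone following along, below is roughly the shape of the CSV-loading change mentioned in the quoted thread (eventRDDs built from CSV instead of HBase). It is only a sketch under assumptions: the column layout (eventName,entityId,targetEntityId), the path, and the minPartitions value are illustrative, not the code we actually run.

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Sketch: build one (user, item) pair RDD per event type from a CSV dump
    // instead of reading events out of HBase. Column order is an assumption.
    def eventRDDsFromCsv(sc: SparkContext, path: String, minPartitions: Int)
        : Map[String, RDD[(String, String)]] = {
      // Assumed columns: eventName,entityId,targetEntityId
      val rows = sc.textFile(path, minPartitions).map(_.split(","))
      Seq("purchase", "view", "atb").map { name =>
        name -> rows.filter(_(0) == name).map(r => (r(1), r(2)))
      }.toMap
    }

Passing minPartitions here (or repartitioning right after the read) is what keeps the downstream stages from running on a single partition.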
