On Mar 13, 2017, at 9:05 PM, Lin Amy <[email protected]> wrote:
>
> Hello Pat,
>
> I am using two URs because one of them simply uses popularity for its
> recommendations while the other is the normal version. Both engines use the
> same data (app), about 10G.
The same model can serve both “recommendations” and “popular”. The simplest form of this is to train with all the data and then query with no user or item; that query will return popular results. The same model and engine can then be queried with a user and/or item for personalized or item-based recs. In the soon-to-be-released 0.6.0 you can also get item-set recommendations from the same model. There is no need at all for 2 URs to solve this case, and 2 URs will cause 2 separate processes to start up, which is moderately heavyweight compared to one process using the method above.

> I have two machines running pio; one only runs Elasticsearch as a slave,
> and the other runs everything (HBase, ES, Spark), so I am running
> `pio-start-all` on the second machine. Both machines have 8 cores and 16G
> of memory.

The problem with this is that Spark has 2 parts, the driver and the worker. Both need a lot of memory, and they are running on a single machine whose 16G must be shared among the Spark driver, the Spark worker, HBase, and HDFS. For a 10G dataset, Spark could need most of your available memory, split between driver and worker in roughly equal portions. For an overview of how Spark works see: http://actionml.com/docs/intro_to_spark

Since Spark is really only needed during training, you may want another machine for the Spark worker if you continue to have trouble running all of this on one machine.

> You can find the two engine.json files attached.
> Thank you for helping!
>
> Best regards,
> Amy
>
> Pat Ferrel <[email protected]> wrote on Tue, Mar 14, 2017 at 1:30 AM:
>
> If you are running pio-start-all you must be running everything on a single
> machine. This is called vertical scaling and is very prone to running out
> of resources, either compute cores or memory. If it has been running for
> some time, you may have finally hit the limit of what you can do on the
> machine.
>
> What are the machine's specs: cores, memory?
> What is the size of your data?
> Have you exported it with `pio export`?
>
> Also, do you have a different indexName in the 2 engine.json files? And why
> have 2 URs?
>
>
> On Mar 12, 2017, at 8:58 PM, Lin Amy <[email protected]> wrote:
>
> Hello everyone,
>
> I have two Universal Recommender engines using the same events. This
> morning I found the server busy running at 100% CPU, so I shut it down and
> tried to bring all the servers back up.
> However, after `pio-start-all` succeeded, I ran `pio train` on the two
> engines; one succeeded and the other failed. It returned the following
> error message:
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Task 1 in stage 47.0 failed 1 times, most recent failure:
> Lost task 1.0 in stage 47.0 (TID 156, localhost):
> org.apache.spark.util.TaskCompletionListenerException: Found unrecoverable
> error [10.1.3.100:9200] returned Bad Request(400) -
> [MapperParsingException[failed to parse [t]]; nested:
> ElasticsearchIllegalArgumentException[unknown property [obj]]; ]; Bailing
> out..
> at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:112)
> at org.apache.spark.scheduler.Task.run(Task.scala:102)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Any advice on how to solve this weird situation?
> Thank you!
>
> Best regards,
> Amy
>
> <normal_version_engine.json><popularity_engine.json>
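Pat's one-model approach comes down to what the query body contains: an empty query returns popularity-ranked results, while adding a user and/or item returns personalized or item-based recs from the same model. A minimal sketch of building those query bodies (the `build_query` helper is illustrative, not part of the UR; `user`, `item`, and `num` are standard UR query fields):

```python
import json

# Illustrative helper: build a Universal Recommender query body.
# The resulting dict would be sent as JSON to the engine's query endpoint.
def build_query(user=None, item=None, num=10):
    query = {"num": num}
    if user is not None:
        query["user"] = user    # personalized recs for this user
    if item is not None:
        query["item"] = item    # items similar to this item
    return query

# No user or item: the engine falls back to popularity ranking.
popular = build_query()
# With a user: personalized recommendations from the same model.
personal = build_query(user="u-123")

print(json.dumps(popular))   # {"num": 10}
print(json.dumps(personal))
```

One engine serving both query shapes avoids the second heavyweight process entirely.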

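The MapperParsingException above is the kind of error two engines can cause by writing conflicting field mappings into the same Elasticsearch index, which is why Pat asks whether the two engine.json files use different indexName values. A hedged sketch of that check (the config fragments below are illustrative, not Amy's attached files; in the UR, indexName sits in the algorithm's params):

```python
# Illustrative engine.json fragments (real files contain many more fields);
# these are NOT the attached files, just the shape of the relevant part.
normal_cfg = {"algorithms": [{"params": {"indexName": "ur_normal"}}]}
popular_cfg = {"algorithms": [{"params": {"indexName": "ur_popular"}}]}

def index_name(cfg):
    """Pull the Elasticsearch index name a UR engine writes to."""
    return cfg["algorithms"][0]["params"]["indexName"]

# Two engines sharing one index can write conflicting field mappings,
# producing errors like the MapperParsingException seen at train time.
assert index_name(normal_cfg) != index_name(popular_cfg)
```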