If you are running `pio-start-all` you must be running everything on a single machine. This is called vertical scaling and is very prone to running out of resources, either compute cores or memory. If it has been running for some time, you may have finally hit the limit of what you can do on that machine.
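If you suspect the box is maxed out, a quick way to check is (a minimal sketch, assuming a Linux host; commands may differ on your OS):

    nproc      # number of CPU cores available
    free -h    # total and used memory
    df -h      # disk space, in case the data stores are filling the volume
    top        # live view of what is consuming CPU and memory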
What are the machine's specs: cores, memory? What is the size of your data? Have you exported it with `pio export`? Also, do you have a different indexName in the two engine.json files? And why have 2 URs? (A sketch of these checks follows below the quoted message.)

On Mar 12, 2017, at 8:58 PM, Lin Amy <[email protected]> wrote:

Hello everyone,

I have two Universal Recommender engines using the same events. This morning I found the server busy running at 100% CPU, so I shut it down and tried to bring everything back up. After `pio-start-all` succeeded, I ran `pio train` on the two engines; one succeeded and the other failed with the following error message:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 47.0 failed 1 times, most recent failure: Lost task 1.0 in stage 47.0 (TID 156, localhost): org.apache.spark.util.TaskCompletionListenerException: Found unrecoverable error [10.1.3.100:9200] returned Bad Request(400) - [MapperParsingException[failed to parse [t]]; nested: ElasticsearchIllegalArgumentException[unknown property [obj]]; ]; Bailing out..
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:112)
    at org.apache.spark.scheduler.Task.run(Task.scala:102)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any advice on how to resolve this weird situation? Thank you!

Best regards,
Amy
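A sketch of the checks mentioned above (the app id and paths are assumptions; substitute your own):

    # back up the event data before changing anything
    pio export --appid <your-app-id> --output /tmp/ur-events-backup --format json

    # each UR engine should write to its own Elasticsearch index;
    # the indexName values in the two engine.json files should differ
    grep indexName /path/to/ur-one/engine.json /path/to/ur-two/engine.json

If both engines are writing models into the same index, that may be related to mapping errors like the MapperParsingException above, which is why the indexName question matters.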
