If you are running pio-start-all you must be running everything on a single 
machine. This is called vertical scaling and is very prone to running out of 
resources, either compute cores or memory. If it has been running for some 
time, you may have finally hit the limit of what you can do on the machine. 

What are the machine's specs, cores, and memory? What is the size of your data? 
Have you exported it with `pio export`? 
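
If not, a rough sketch of the export (the app id and output path here are 
placeholders, substitute your own):

    pio export --appid 1 --output /tmp/pio-event-export

That will give a sense of how many events you have and how large they are.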

Also, do you have a different indexName in the 2 engine.json files? And why do 
you have 2 URs?
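
If both engines train into the same Elasticsearch index, the mapping can 
collide and you can get parse errors like the one quoted below. Roughly, each 
engine.json should point at its own index; a sketch (names here are just 
placeholders, not your actual config):

    In the first engine's engine.json:
      "algorithms": [{
        "name": "ur",
        "params": {
          "appName": "your-app",
          "indexName": "urindex1",
          "typeName": "items"
        }
      }]

    In the second engine's engine.json, the same block but with
    "indexName": "urindex2".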


On Mar 12, 2017, at 8:58 PM, Lin Amy <[email protected]> wrote:

Hello everyone,

I have two Universal Recommender engines using the same events. This 
morning I found the server busy at 100% CPU, so I shut it down and tried 
to bring all the servers back up.
However, after `pio-start-all` succeeded, I ran `pio train` on the two engines; 
one succeeded and the other failed. It returned the following error message:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 47.0 failed 1 times, most recent failure: Lost task 1.0 in stage 47.0 (TID 156, localhost): org.apache.spark.util.TaskCompletionListenerException: Found unrecoverable error [10.1.3.100:9200] returned Bad Request(400) - [MapperParsingException[failed to parse [t]]; nested: ElasticsearchIllegalArgumentException[unknown property [obj]]; ]; Bailing out..
        at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:112)
        at org.apache.spark.scheduler.Task.run(Task.scala:102)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Any advice on how to resolve this weird situation?
Thank you!

Best regards,
Amy
