Hello,
I am running 3 PredictionIO applications on a single machine (4 vCPUs and 16 GB
RAM). I feed the engines with data on a daily basis, then train and deploy
them.
Two of the engines use the Universal Recommender template, and the third one
uses the similarProduct template.
The first two applications, running the Universal Recommender, were deployed 5
months ago and ran smoothly, with no errors and no bugs. 5 days ago, we deployed
the third application, running the similarProduct template. Right after that
deployment, the first two applications started to malfunction. For some inputs
the recommender responds with correct data, but for other inputs the request
times out with the following message: "The server was not able to produce a
timely response to your request."
The behavior is consistent per input: for specific inputs the recommender
always replies with data, while for other inputs it always generates a timeout
error. When the error occurs, I get the following error log on the machine:
Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at scala.collection.mutable.StringBuilder.toString(StringBuilder.scala:430)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:323)
at scala.collection.immutable.Stream.toString(Stream.scala:823)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at scala.StringContext.standardInterpolator(StringContext.scala:125)
at scala.StringContext.s(StringContext.scala:95)
at com.actionml.URAlgorithm$$anonfun$buildQuery$3.apply(URAlgorithm.scala:576)
at com.actionml.URAlgorithm$$anonfun$buildQuery$3.apply(URAlgorithm.scala:576)
at grizzled.slf4j.Logger.info(slf4j.scala:128)
at com.actionml.URAlgorithm.buildQuery(URAlgorithm.scala:576)
at com.actionml.URAlgorithm.predict(URAlgorithm.scala:488)
at com.actionml.URAlgorithm.predict(URAlgorithm.scala:180)
at org.apache.predictionio.controller.P2LAlgorithm.predictBase(P2LAlgorithm.scala:73)
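Reading the trace, the allocation fails while a log message is being built:
Logger.info in buildQuery (URAlgorithm.scala:576) interpolates a Stream into a
string, and the heap runs out while that string is assembled. I have not
looked at the UR source, so the following is only a sketch in Java of the
general idea of bounding what gets rendered into a log line. It is not the UR
code; the class LogPreview and the method preview are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LogPreview {
    // Render at most `limit` elements (limit >= 0 assumed) so that the log
    // string stays small no matter how large the collection is.
    static String preview(List<?> items, int limit) {
        String head = items.stream()
                .limit(limit)
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        int remaining = items.size() - limit;
        return remaining > 0 ? head + ", ... (" + remaining + " more)" : head;
    }

    public static void main(String[] args) {
        // Hypothetical item ids, for illustration only.
        List<String> ids = Arrays.asList("item-1", "item-2", "item-3", "item-4");
        System.out.println("query items: [" + preview(ids, 2) + "]");
    }
}
```

If the inputs that fail are the ones with a very large history or query, that
would fit the pattern above, but this is a guess on my side.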
Note that we have monitored the CPU, memory, and disk metrics of the machine.
Everything seems to be working well. We have a lot of free memory and disk
space, and the CPU utilization is at normal levels.
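One caveat on the memory metric: the error is "Java heap space", and as far as
I understand the JVM heap has its own ceiling (set with -Xmx or chosen by the
JVM at startup), so free memory on the machine does not rule out a full heap
inside one engine process. I have not confirmed which limit our deployed
engines run with. A minimal sketch for checking the ceiling, assuming it is
started with the same JVM options as the engine (the class name HeapCheck is
hypothetical):

```java
public class HeapCheck {
    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();              // most the JVM will try to use for the heap
        long committed = rt.totalMemory();      // heap currently reserved by the JVM
        long used = committed - rt.freeMemory(); // part of the reserved heap in use

        System.out.println("max heap (MiB):       " + toMiB(max));
        System.out.println("committed heap (MiB): " + toMiB(committed));
        System.out.println("used heap (MiB):      " + toMiB(used));
    }
}
```

If the reported maximum is far below the 16 GB on the machine, the system-level
graphs would look healthy even while a single engine runs out of heap.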
Is there any way I can fix or debug what is happening? Any help would be
greatly appreciated.
Regards,
Sami Serbey