Hi
So I managed to fix this … it took me a while to figure out, so in case anyone cares:
...
In my code I had something like this:
def predict(model: ECommModel, query: Query): PredictedResult = {
  val userFeatures = model.userFeatures
  val productModels = model.productModels
  …
}

val unavailableItems: Set[String] = try {
  val constr = LEventStore.findByEntity(
    appName = ap.sharedApp,
    entityType = "constraint",
    entityId = "unavailableItems"
  …
}
So the idea was that the unavailable items only get populated once during
deployment (and therefore, to my understanding, during instantiation of my
ECommAlgorithm class). Pulling the unavailable products in on every incoming
request turned out to be too slow …
This worked in 0.10, but in 0.11 I was getting the "env vars not set" errors.
Apparently something changed in 0.11 that alters the scoping of the env vars in
the engines during testing.
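For anyone hitting the same thing: one workaround that might make the storage
config visible to the test JVM (a sketch, not verified against 0.11) is to
export the variables from pio-env.sh into the shell before building, so any
forked process inherits them. The stand-in file below mimics a couple of
entries from conf/pio-env.sh; in practice you would source the real file under
your PredictionIO install.

```shell
# Sketch: auto-export env vars from a pio-env.sh-style file so child JVMs see them.
# ENV_FILE is a stand-in here; the real file would be
# /opt/PredictionIO-0.11.0-incubating/conf/pio-env.sh.
ENV_FILE=$(mktemp)
cat > "$ENV_FILE" <<'EOF'
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
EOF
set -a            # mark every variable assigned while sourcing for export
. "$ENV_FILE"
set +a
echo "$PIO_STORAGE_SOURCES_PGSQL_TYPE"
rm -f "$ENV_FILE"
# after this, `pio build` (with tests enabled) would run with the vars exported
```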
Bests
Florian
On 22 May 2017 at 13:58:05, Florian Krause ([email protected])
wrote:
Hi Chan
thanks a lot for reaching out to me ...
pio@predict-io:/opt/reco-engine$ /opt/PredictionIO-0.11.0-incubating/bin/pio
status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at
/opt/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at
/opt/PredictionIO-0.11.0-incubating/vendors/spark-2.1.1-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of
1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: PGSQL)...
[INFO] [Storage$] Verifying Model Data Backend (Source: PGSQL)...
[INFO] [Storage$] Verifying Event Data Backend (Source: PGSQL)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [Management$] Your system is all ready to go.
---
pio@predict-io:/opt/reco-engine/MatrixProduct2$
/opt/PredictionIO-0.11.0-incubating/bin/pio status --verbose
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at
/opt/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at
/opt/PredictionIO-0.11.0-incubating/vendors/spark-2.1.1-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of
1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: PGSQL)...
[DEBUG] [ConnectionPool$] Registered connection pool :
ConnectionPool(url:jdbc:postgresql://localhost/pio, user:pio) using factory :
<default>
[DEBUG] [ConnectionPool$] Registered singleton connection pool :
ConnectionPool(url:jdbc:postgresql://localhost/pio, user:pio)
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed
[SQL Execution]
create table if not exists pio_meta_engineinstances ( id varchar(100) not
null primary key, status text not null, startTime timestamp DEFAULT
CURRENT_TIMESTAMP, endTime timestamp DEFAULT CURRENT_TIMESTAMP, engineId text
not null, engineVersion text not null, engineVariant text not null,
engineFactory text not null, batch text not null, env text not null,
sparkConf text not null, datasourceParams text not null, preparatorParams
text not null, algorithmsParams text not null, servingParams text not null);
(3 ms)
[Stack Trace]
...
org.apache.predictionio.data.storage.jdbc.JDBCEngineInstances$$anonfun$1.apply(JDBCEngineInstances.scala:49)
org.apache.predictionio.data.storage.jdbc.JDBCEngineInstances$$anonfun$1.apply(JDBCEngineInstances.scala:32)
scalikejdbc.DBConnection$class.autoCommit(DBConnection.scala:222)
scalikejdbc.DB.autoCommit(DB.scala:60)
scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:215)
scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:214)
scalikejdbc.LoanPattern$class.using(LoanPattern.scala:18)
scalikejdbc.DB$.using(DB.scala:138)
--
So this works … building with tests enabled doesn't:
---
/opt/PredictionIO-0.11.0-incubating/bin/pio build --verbose
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Engine$] Using command '/opt/PredictionIO-0.11.0-incubating/sbt/sbt' at
/opt/reco-engine/MatrixProduct2 to build.
[INFO] [Engine$] If the path above is incorrect, this process will fail.
[INFO] [Engine$] Uber JAR disabled. Making sure
lib/pio-assembly-0.11.0-incubating.jar is absent.
[INFO] [Engine$] Going to run: /opt/PredictionIO-0.11.0-incubating/sbt/sbt
package assemblyPackageDependency in /opt/reco-engine/MatrixProduct2
[INFO] [Engine$] [info] Loading project definition from
/opt/reco-engine/MatrixProduct2/project
[INFO] [Engine$] [info] Set current project to MatrixProduct2 (in build
file:/opt/reco-engine/MatrixProduct2/)
[INFO] [Engine$] [success] Total time: 0 s, completed May 22, 2017 11:52:26 AM
[INFO] [Engine$] [info] Including from cache: shared_2.11.jar
[INFO] [Engine$] [info] Including from cache: snappy-java-1.1.1.7.jar
[INFO] [Engine$] [info] Including from cache: scala-library-2.11.8.jar
[ERROR] [Engine$] log4j:WARN No appenders could be found for logger
(org.apache.predictionio.data.storage.Storage$).
[ERROR] [Engine$] log4j:WARN Please initialize the log4j system properly.
[ERROR] [Engine$] log4j:WARN See
http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[INFO] [Engine$] org.apache.predictionio.data.storage.StorageClientException:
Data source PGSQL was not properly initialized.
[INFO] [Engine$] at
org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
[INFO] [Engine$] at
org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
[INFO] [Engine$] at scala.Option.getOrElse(Option.scala:121)
[INFO] [Engine$] at
org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284)
[INFO] [Engine$] at
org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:269)
[INFO] [Engine$] at
org.apache.predictionio.data.storage.Storage$.getMetaDataApps(Storage.scala:387)
[INFO] [Engine$] at
org.apache.predictionio.data.store.Common$.appsDb$lzycompute(Common.scala:27)
[INFO] [Engine$] at
org.apache.predictionio.data.store.Common$.appsDb(Common.scala:27)
[INFO] [Engine$] at
org.apache.predictionio.data.store.Common$.appNameToId(Common.scala:32)
[INFO] [Engine$] at
org.apache.predictionio.data.store.LEventStore$.findByEntity(LEventStore.scala:75)
[INFO] [Engine$] at
com.rebelle.MatrixProduct2.ECommAlgorithm.liftedTree1$1(ECommAlgorithm.scala:516)
[INFO] [Engine$] at
com.rebelle.MatrixProduct2.ECommAlgorithm.<init>(ECommAlgorithm.scala:515)
[INFO] [Engine$] at
com.rebelle.MatrixProduct2.ECommAlgorithmTest.<init>(ECommAlgorithmTest.scala:31)
[INFO] [Engine$] at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[INFO] [Engine$] at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[INFO] [Engine$] at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[INFO] [Engine$] at
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
[INFO] [Engine$] at java.lang.Class.newInstance(Class.java:442)
[INFO] [Engine$] at
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:641)
[INFO] [Engine$] at sbt.TestRunner.runTest$1(TestFramework.scala:76)
[INFO] [Engine$] at sbt.TestRunner.run(TestFramework.scala:85)
I am using the EventStore in my recommender (to pull in products that are no
longer available). The test runner seems to instantiate it, but then barfs
because it can't get the configuration from the environment.
Exactly the same engine compiles just fine under 0.10. When I disable the tests
with
test in assembly := {}
in the build.sbt file, compile, train and deploy all work fine.
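An alternative to skipping the tests entirely might be to fork the test JVM
and hand it the storage configuration explicitly in build.sbt. This is a
hypothetical sketch (not verified against 0.11); the keys are taken from the
`--env` list pio passes to spark-submit, and the values here are placeholders
for your actual pio-env.sh settings.

```scala
// Untested sketch for build.sbt (sbt 0.13 syntax): run tests in a forked JVM
// and pass the PredictionIO storage env vars to it directly.
fork in Test := true

envVars in Test := Map(
  "PIO_STORAGE_SOURCES_PGSQL_TYPE"     -> "jdbc",
  "PIO_STORAGE_SOURCES_PGSQL_URL"      -> "jdbc:postgresql://localhost/pio",
  "PIO_STORAGE_SOURCES_PGSQL_USERNAME" -> "pio"
)
```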
Bests
2017-05-22 12:49 GMT+02:00 Chan Lee <[email protected]>:
Hi Florian,
Can you tell me the output for `pio status`? Does the postgres driver match the
argument sent to spark-submit?
Best,
Chan
On Mon, May 22, 2017 at 1:53 AM, Florian Krause <[email protected]>
wrote:
Hi all
I have been unsuccessful at building my two engines with 0.11. I have described
my attempts here ->
https://stackoverflow.com/questions/43941915/predictionio-0-11-building-an-engine-fails-with-java-lang-classnotfoundexceptio
It appears that during the pio build phase, the env vars from pio-env.sh are
not set correctly.
I have managed to get around this by not running the tests; the compiled
versions of the engine work flawlessly, so the database connection is fine.
Now what confuses me a bit is the usage of the --env command line param in the
CreateWorkflow jar.
This is the command pio sends to spark
/opt/PredictionIO-0.11.0-incubating/vendors/spark-2.1.1-bin-hadoop2.7/bin/spark-submit
--driver-memory 80G --executor-memory 80G --class
org.apache.predictionio.workflow.CreateWorkflow --jars
file:/opt/PredictionIO-0.11.0-incubating/lib/postgresql-42.1.1.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/mysql-connector-java-5.1.40-bin.jar,file:/opt/reco-engine/MatrixProduct2/target/scala-2.11/matrixproduct2_2.11-0.1-SNAPSHOT.jar,file:/opt/reco-engine/MatrixProduct2/target/scala-2.11/MatrixProduct2-assembly-0.1-SNAPSHOT-deps.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-localfs-assembly-0.11.0-incubating.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-jdbc-assembly-0.11.0-incubating.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-elasticsearch-assembly-0.11.0-incubating.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hbase-assembly-0.11.0-incubating.jar
--files file:/opt/PredictionIO-0.11.0-incubating/conf/log4j.properties
--driver-class-path
/opt/PredictionIO-0.11.0-incubating/conf:/opt/PredictionIO-0.11.0-incubating/lib/postgresql-42.1.1.jar:/opt/PredictionIO-0.11.0-incubating/lib/mysql-connector-java-5.1.40-bin.jar
--driver-java-options -Dpio.log.dir=/home/pio
file:/opt/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar
--engine-id com.rebelle.MatrixProduct2.ECommerceRecommendationEngine
--engine-version 23bea44eff1a8e08bc80e290e52dc9dc565d9bb7 --engine-variant
file:/opt/reco-engine/MatrixProduct2/engine.json --verbosity 0 --json-extractor
Both --env
PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_HOME=/opt/PredictionIO-0.11.0-incubating,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=<password>,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/opt/PredictionIO-0.11.0-incubating/conf
When I try to run this manually from the command line, I get
[ERROR] [Storage$] Error initializing storage client for source
Exception in thread "main"
org.apache.predictionio.data.storage.StorageClientException: Data source was
not properly initialized.
at
org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
at
org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
at scala.Option.getOrElse(Option.scala:121)
at
org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284)
So even though all the needed params are set in --env, Spark cannot find them.
I have to set them manually via export to make this work. What exactly is
supposed to happen when these vars are passed through --env?
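For running the spark-submit command by hand, the comma-separated `--env` list
can be turned back into exported shell variables first. This is a hypothetical
helper (the `PIO_ENV` value below is an abbreviated stand-in for the real
list); note it would break on any value that itself contains a comma.

```shell
# Sketch: split a pio-style --env list on commas and export each KEY=VALUE pair.
PIO_ENV='PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio'
IFS=','
for pair in $PIO_ENV; do
  export "$pair"
done
unset IFS
echo "$PIO_STORAGE_SOURCES_PGSQL_USERNAME"   # prints: pio
```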
Perhaps someone can give me pointers on what might be worth trying.
Bests & thanks
Florian
--
Dr. Florian Krause
Chief Technical Officer
REBELLE - StyleRemains GmbH
Brooktorkai 4, D-20457 Hamburg
Tel.: +49 40 30 70 19 18
Fax: +49 40 30 70 19 29
E-Mail: [email protected]
Website: www.rebelle.com