Where does the classpath in spark-submit originate? Is compute-classpath.sh
not the source?

As noted previously, my stable-ordering fix in compute-classpath.sh
no longer seems to be effective either.

Looks like some tracing of classpath assembly through the Spark command
runner is required:
https://github.com/apache/predictionio/blob/develop/tools/src/main/scala/org/apache/predictionio/tools/Runner.scala#L185
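
In the meantime, a quick way to confirm whether compute-classpath.sh is even
consulted during training might be to drop a marker into it and watch for that
marker in the release log. A rough sketch (the script path is illustrative;
adjust it to wherever the script lives in the dist):

  # Print to stderr whenever the script runs, so the marker appears near the
  # [Runner$] submission command in the log.
  echo 'echo "compute-classpath.sh was invoked" >&2' \
    >> PredictionIO-dist/bin/compute-classpath.sh

If the marker never shows up during pio train, the --jars list is presumably
being assembled elsewhere (Runner.scala scanning lib/spark, per the link above)
and edits to compute-classpath.sh will never reach it.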

Unless someone with more knowledge of these internals could weigh in…
Donald? 😬😊

On Fri, Mar 9, 2018 at 15:44 Shane Johnson <[email protected]> wrote:

> One additional item that you mentioned earlier is that we would need to
> remove or skip the aws-java-sdk.jar that is already in the CLASSPATH. Do
> you think this has an impact? I did not write anything to skip or remove
> the existing aws-java-sdk.jar.
>
> aws-java-sdk.jar is already in the CLASSPATH, though, so the script will
>> need to skip or remove it first.
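>
> Would something along these lines be what you mean? A rough, untested sketch
> to slot in right before the final echo in compute-classpath.sh (the filtering
> pipeline is illustrative):
>
>   # Remove any existing aws-java-sdk.jar entry from CLASSPATH, then prepend
>   # it so it resolves ahead of the pio-data-s3 assembly.
>   AWS_SDK_JAR="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar"
>   CLASSPATH=$(echo "$CLASSPATH" | tr ':' '\n' | grep -v 'aws-java-sdk' | paste -sd ':' -)
>   CLASSPATH="$AWS_SDK_JAR:$CLASSPATH"
>   echo "$CLASSPATH"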
>
>
> *Shane Johnson | LIFT IQ*
> *Founder | CEO*
>
> *www.liftiq.com <http://www.liftiq.com/>* or *[email protected]
> <[email protected]>*
> mobile: (801) 360-3350
> LinkedIn <https://www.linkedin.com/in/shanewjohnson/>  |  Twitter
> <https://twitter.com/SWaldenJ> |  Facebook
> <https://www.facebook.com/shane.johnson.71653>
>
>
>
> On Fri, Mar 9, 2018 at 4:41 PM, Shane Johnson <[email protected]> wrote:
>
>> Now that I am able to deploy, I reset the buildpack to
>> ...#debug-custom-dist and redeployed. Here is the build log; the URL does
>> point to the correct distribution with the edited compute-classpath.sh file.
>>
>> -----> JVM Common app detected
>>
>> -----> Installing JDK 1.8... done
>>
>> -----> PredictionIO app detected
>>
>> -----> Install core components
>>
>>        + PredictionIO 
>> (https://s3-us-west-1.amazonaws.com/predictionio/0.12.0-incubating/apache-predictionio-0.12.0-incubating-bin.tar.gz)
>>
>>        + Spark (spark-2.1.1-bin-hadoop2.7)
>>
>> -----> Install supplemental components
>>
>>        + PostgreSQL (JDBC)
>>
>>        + S3 HDFS (AWS SDK)
>>
>>        + S3 HDFS (Hadoop-AWS)
>>
>>          Writing default 'core-site.xml.erb'
>>
>>        + local Maven repo from buildpack (contents)
>>
>> -----> Configure PredictionIO
>>
>>        Writing default 'pio-env.sh'
>>
>>        Writing default 'spark-defaults.conf.erb'
>>
>>        + Maven repo from buildpack (build.sbt entry)
>>
>>        Set-up environment via '.profile.d/' scripts
>>
>> -----> Install JVM (heroku/jvm-common)
>>
>> -----> PredictionIO engine
>>
>>        Quietly logging. (Set `PIO_VERBOSE=true` for detailed build log.)
>>
>>        [INFO] [Engine$] Using command 
>> '/tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0/PredictionIO-dist/sbt/sbt'
>>  at 
>> /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0
>>  to build.
>>
>>        [INFO] [Engine$] If the path above is incorrect, this process will 
>> fail.
>>
>>        [INFO] [Engine$] Uber JAR disabled. Making sure 
>> lib/pio-assembly-0.12.0-incubating.jar is absent.
>>
>>        [INFO] [Engine$] Going to run: 
>> /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0/PredictionIO-dist/sbt/sbt
>>   package assemblyPackageDependency in 
>> /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0
>>
>>        [INFO] [Engine$] Compilation finished successfully.
>>
>>        [INFO] [Engine$] Looking for an engine...
>>
>>        [INFO] [Engine$] Found 
>> template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar
>>
>>        [INFO] [Engine$] Found 
>> template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar
>>
>>        [INFO] [Engine$] Build finished successfully.
>>
>>        [INFO] [Pio$] Your engine is ready for training.
>>
>>        Using default Procfile for engine
>>
>> -----> Discovering process types
>>
>>        Procfile declares types -> release, train, web
>>
>> -----> Compressing...
>>
>>        Done: 376.7M
>>
>>
>> The release log is below. I am not seeing
>> */app/PredictionIO-dist/lib/spark/aws-java-sdk.jar* show up at the beginning
>> of the CLASSPATH; that is what we should see, correct? FYI, I was also
>> manipulating compute-classpath.sh locally, and I observed that adding a line
>> right before echo "$CLASSPATH" did not change what was in the logged
>> spark-submit command. This is what I was testing locally...
>>
>>
>> CLASSPATH="*/Users/shanejohnson/Desktop/Apps/liftiq_platform/lift-s*
>> *core/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar*
>> :$CLASSPATH"
>> echo "$CLASSPATH"
>>
>> I did not see any change in the spark-submit command by adding this when
>> building and deploying locally.
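>>
>> As a rough check of the relative order, splitting the logged submission
>> command on commas and grepping for the two jars might help (illustrative;
>> works against the local pio train output or a saved release log):
>>
>>   pio train 2>&1 | tr ',' '\n' | grep -n -e 'aws-java-sdk' -e 'pio-data-s3-assembly'
>>
>> The entry with the lower grep line number is the one that appears first in
>> --jars.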
>>
>> Release log with new buildpack ...#debug-custom-dist
>>
>> Running train on release…
>>
>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>
>> [INFO] [Runner$] Submission command: 
>> /app/PredictionIO-dist/vendors/spark-hadoop/bin/spark-submit --driver-memory 
>> 13g --class org.apache.predictionio.workflow.CreateWorkflow --jars 
>> file:/app/PredictionIO-dist/lib/postgresql_jdbc.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-hbase-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-s3-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-jdbc-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-hdfs-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar*,file:/app/PredictionIO-dist/lib/spark/pio-data-hbase-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-jdbc-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/hadoop-aws.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hdfs-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar*
>>  --files 
>> file:/app/PredictionIO-dist/conf/log4j.properties,file:/app/PredictionIO-dist/conf/core-site.xml
>>  --driver-class-path 
>> /app/PredictionIO-dist/conf:/app/PredictionIO-dist/conf:/app/PredictionIO-dist/lib/postgresql_jdbc.jar:/app/PredictionIO-dist/conf
>>  --driver-java-options -Dpio.log.dir=/app 
>> file:/app/PredictionIO-dist/lib/pio-assembly-0.12.0-incubating.jar 
>> --engine-id org.template.liftscoring.LiftScoringEngine --engine-version 
>> 0c35eebf403cf91fe77a64921d76aa1ca6411d20 --engine-variant 
>> file:/app/engine.json --verbosity 0 --json-extractor Both --env 
>> PIO_ENV_LOADED=1,PIO_EVENTSERVER_APP_NAME=classi,PIO_STORAGE_SOURCES_PGSQL_INDEX=enabled,PIO_S3_AWS_ACCESS_KEY_ID=AKIAJJX2S55QPCPZXGFQ,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/app/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_S3_BUCKET_NAME=lift-model-devmaster,PIO_EVENTSERVER_ACCESS_KEY=5954-20848-7512-17427-21660,PIO_HOME=/app/PredictionIO-dist,PIO_FS_ENGINESDIR=/app/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://ec2-52-70-46-243.compute-1.amazonaws.com:5432/dbvbo86hohutvb?sslmode=require,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_SPARK_OPTS=--driver-memory
>>  13g 
>> ,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=p5c404ac780ab517d4ab249d7000809b51b4b987fdfb5c26e1bace511130337ac,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/app/PredictionIO-dist/vendors/elasticsearch,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/app/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=ubefhv668b1s1m,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http,PIO_S3_AWS_SECRET_ACCESS_KEY=tQwL1PgYR0Y5MHG+qwVgEXNEcDcdlupaN2oO6JuR,PIO_TRAIN_SPARK_OPTS=--driver-memory
>>  13g 
>> ,PIO_STORAGE_SOURCES_PGSQL_CONNECTIONS=8,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/app/PredictionIO-dist/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_PGSQL_PARTITIONS=4,PIO_S3_AWS_REGION=us-east-1
>>
>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>
>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>
>> [INFO] [Engine] Extracting datasource params...
>>
>> [INFO] [Engine] Datasource params: (,DataSourceParams(Some(5)))
>>
>> [INFO] [Engine] Extracting preparator params...
>>
>> [WARN] [WorkflowUtils$] Non-empty parameters supplied to 
>> org.template.liftscoring.Preparator, but its constructor does not accept any 
>> arguments. Stubbing with empty parameters.
>>
>> [INFO] [Engine] Preparator params: (,Empty)
>>
>>
>>
>>
>>
>> *Shane Johnson | LIFT IQ*
>> *Founder | CEO*
>>
>> *www.liftiq.com <http://www.liftiq.com/>* or *[email protected]
>> <[email protected]>*
>> mobile: (801) 360-3350
>> LinkedIn <https://www.linkedin.com/in/shanewjohnson/>  |  Twitter
>> <https://twitter.com/SWaldenJ> |  Facebook
>> <https://www.facebook.com/shane.johnson.71653>
>>
>>
>>
>> On Fri, Mar 9, 2018 at 11:17 AM, Mars Hall <[email protected]>
>> wrote:
>>
>>> I'm lost as to how such direct manipulation of CLASSPATH is not
>>> appearing in the logged spark-submit command.
>>>
>>> What could cause this!?
>>>
>>> I just pushed a version of the buildpack which should help debug.
>>> Assuming only a single buildpack is assigned to the app, here's how to set
>>> it:
>>>
>>>   heroku buildpacks:set https://github.com/heroku/predictionio-buildpack#debug-custom-dist
>>>
>>> Then redeploy the engine and check the build log for the line:
>>>
>>>       + PredictionIO ($URL)
>>>
>>> Please confirm that it is the URL of your custom PredictionIO dist.
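>>>
>>> If nothing new needs to be pushed, an empty commit will still trigger the
>>> rebuild (standard git/Heroku usage):
>>>
>>>   git commit --allow-empty -m "rebuild with debug-custom-dist"
>>>   git push heroku master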
>>>
>>> On Fri, Mar 9, 2018 at 2:47 PM, Shane Johnson <[email protected]> wrote:
>>>
>>>> Thanks Donald and Mars,
>>>>
>>>> I created a new distribution
>>>> (https://s3-us-west-1.amazonaws.com/predictionio/0.12.0-incubating/apache-predictionio-0.12.0-incubating-bin.tar.gz)
>>>> with the added CLASSPATH code and pointed to the distribution with
>>>> the PREDICTIONIO_DIST_URL variable within the engine app in Heroku.
>>>>
>>>> CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar
>>>> :$CLASSPATH"
>>>> echo "$CLASSPATH"
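>>>>
>>>> For reference, I pointed the app at the custom dist roughly like this
>>>> (app name and URL are placeholders here):
>>>>
>>>>   heroku config:set PREDICTIONIO_DIST_URL="<custom dist URL>" -a <engine-app>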
>>>>
>>>> It didn't seem to force the aws-java-sdk.jar to load first when I reviewed
>>>> the release logs. Should the aws-java-sdk.jar show up as the first file
>>>> within the --jars section once
>>>> CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar:$CLASSPATH"
>>>> is added?
>>>>
>>>> I'm still getting the NoSuchMethodError when the *aws-java-sdk.jar* loads
>>>> after the *pio-data-s3-assembly-0.12.0-incubating.jar*. Do you have other
>>>> suggestions to try? I was also testing locally to change the order of the
>>>> --jars, but changes to compute-classpath.sh didn't seem to change the order
>>>> of the jars in the logs.
>>>>
>>>> Running train on release…
>>>>
>>>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>>>
>>>> [INFO] [Runner$] Submission command: 
>>>> /app/PredictionIO-dist/vendors/spark-hadoop/bin/spark-submit 
>>>> --driver-memory 13g --class 
>>>> org.apache.predictionio.workflow.CreateWorkflow --jars 
>>>> file:/app/PredictionIO-dist/lib/postgresql_jdbc.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hdfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/hadoop-aws.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hbase-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar*,file:/app/PredictionIO-dist/lib/spark/pio-data-jdbc-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar*
>>>>  --files 
>>>> file:/app/PredictionIO-dist/conf/log4j.properties,file:/app/PredictionIO-dist/conf/core-site.xml
>>>>  --driver-class-path 
>>>> /app/PredictionIO-dist/conf:/app/PredictionIO-dist/conf:/app/PredictionIO-dist/lib/postgresql_jdbc.jar:/app/PredictionIO-dist/conf
>>>>  --driver-java-options -Dpio.log.dir=/app 
>>>> file:/app/PredictionIO-dist/lib/pio-assembly-0.12.0-incubating.jar 
>>>> --engine-id org.template.liftscoring.LiftScoringEngine --engine-version 
>>>> 0c35eebf403cf91fe77a64921d76aa1ca6411d20 --engine-variant 
>>>> file:/app/engine.json --verbosity 0 --json-extractor Both --env
>>>>
>>>>
>>>> Error:
>>>>
>>>> Exception in thread "main" java.lang.NoSuchMethodError: 
>>>> com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
>>>>
>>>>    at 
>>>> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
>>>>
>>>>    at 
>>>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
>>>>
>>>>    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *Shane Johnson | LIFT IQ*
>>>> *Founder | CEO*
>>>>
>>>> *www.liftiq.com <http://www.liftiq.com/>* or *[email protected]
>>>> <[email protected]>*
>>>> mobile: (801) 360-3350
>>>> LinkedIn <https://www.linkedin.com/in/shanewjohnson/>  |  Twitter
>>>> <https://twitter.com/SWaldenJ> |  Facebook
>>>> <https://www.facebook.com/shane.johnson.71653>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 7, 2018 at 1:01 PM, Mars Hall <[email protected]>
>>>> wrote:
>>>>
>>>>> Shane,
>>>>>
>>>>> On Wed, Mar 7, 2018 at 4:49 AM, Shane Johnson <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Re: adding a line to ensure a jar is loaded first. Is this what you
>>>>>> are referring to...(line at the bottom in red)?
>>>>>>
>>>>>
>>>>>
>>>>> I believe the code would need to look like this to affect the output
>>>>> classpath as intended:
>>>>>
>>>>>
>>>>>> CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar
>>>>>> :$CLASSPATH"
>>>>>> echo "$CLASSPATH"
>>>>>>
>>>>>
>>>>>
>>>>> aws-java-sdk.jar is already in the CLASSPATH, though, so the script
>>>>> will need to skip or remove it first.
>>>>>
>>>>> --
>>>>> *Mars Hall
>>>>> 415-818-7039 <(415)%20818-7039>
>>>>> Customer Facing Architect
>>>>> Salesforce Platform / Heroku
>>>>> San Francisco, California
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> *Mars Hall
>>> 415-818-7039 <(415)%20818-7039>
>>> Customer Facing Architect
>>> Salesforce Platform / Heroku
>>> San Francisco, California
>>>
>>
>>
> --
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California
