Vinh Tran created ZEPPELIN-4971:
-----------------------------------

             Summary: XGBOOST4j Spark Fails String Indexer
                 Key: ZEPPELIN-4971
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4971
             Project: Zeppelin
          Issue Type: Bug
          Components: conf, Interpreters, spark, zeppelin-server
            Reporter: Vinh Tran


I'm trying to follow the [XGBoost4J-Spark tutorial|https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html]
on a Spark 3.0.0 cluster in Apache Zeppelin 0.8.2.

However, when I load the dependencies: 
{code:java}
export SPARK_SUBMIT_OPTIONS="--packages ml.dmlc:xgboost4j-spark_2.12:1.0.0"
{code}
I get the error below when I fit the following StringIndexer:
{code:java}
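import org.apache.spark.ml.feature.StringIndexer

// rawInput is the DataFrame loaded earlier in the tutorial
val stringIndexer = new StringIndexer().
  setInputCol("class").
  setOutputCol("classIndex").
  fit(rawInput)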
{code}
{code:java}
 
java.lang.NoSuchMethodError: com.esotericsoftware.kryo.Kryo.setInstantiatorStrategy(Lorg/objenesis/strategy/InstantiatorStrategy;)V
  at com.twitter.chill.KryoBase.setInstantiatorStrategy(KryoBase.scala:99)
  at com.twitter.chill.EmptyScalaKryoInstantiator.newKryo(ScalaKryoInstantiator.scala:62)
  at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:131)
  at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
  at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
  at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
  at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:336)
  at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:389)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:175)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:237)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3625)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2938)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2938)
  at org.apache.spark.ml.feature.StringIndexer.countByValue(StringIndexer.scala:204)
  at org.apache.spark.ml.feature.StringIndexer.sortByFreq(StringIndexer.scala:212)
  at org.apache.spark.ml.feature.StringIndexer.fit(StringIndexer.scala:241)
  ... 46 elided
{code}
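
A NoSuchMethodError generally means the class loaded at runtime is not the version the caller was compiled against, so this may point to conflicting Kryo or Objenesis jars on the interpreter classpath. As a rough diagnostic sketch (not a fix), the snippet below prints where each class named in the trace was loaded from. The helper name jarOf is hypothetical, and the snippet assumes it is run in the same Zeppelin Spark paragraph context as the failing code.

```scala
import scala.util.Try

// Hypothetical helper: report the jar (code source) a class was loaded from.
def jarOf(className: String): String =
  Try(Class.forName(className)).toOption match {
    case None => "<not on classpath>"
    case Some(cls) =>
      // The code source is null for classes loaded by the bootstrap loader.
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("<bootstrap or unknown>")
  }

// Class names taken from the stack trace and method descriptor above.
Seq(
  "com.esotericsoftware.kryo.Kryo",
  "com.twitter.chill.KryoBase",
  "org.objenesis.strategy.InstantiatorStrategy"
).foreach(name => println(s"$name -> ${jarOf(name)}"))
```

If the Kryo location is a jar pulled in by the added package rather than one shipped with Spark, that would be consistent with a version conflict, though it does not prove it.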
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
