[jira] [Commented] (TOREE-349) ClassCastException when reading Avro from another thread (Toree master / Spark 2.0.0)

2016-11-14 Thread Phil Berkland (JIRA)


[ 
https://issues.apache.org/jira/browse/TOREE-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664355#comment-15664355
 ] 

Phil Berkland commented on TOREE-349:
-

Please look at the solution to https://issues.apache.org/jira/browse/TOREE-351, 
which is probably the same underlying issue.


> ClassCastException when reading Avro from another thread (Toree master / 
> Spark 2.0.0)
> -
>
> Key: TOREE-349
> URL: https://issues.apache.org/jira/browse/TOREE-349
> Project: TOREE
>  Issue Type: Bug
>Reporter: Andrew Kerr
> Attachments: avro-csv-addDeps.scala.ipynb, 
> avro-csv-threading.scala.ipynb, avro-csv-threading.scala.ipynb, run.sh
>
>
> When using Toree (master branch commit 
> e8ecd0623c65ad104045b1797fb27f69b8dfc23f)
> with `--packages=com.databricks:spark-avro_2.11:3.0.1` in `SPARK_OPTS`
> and attempting to load an Avro file into a dataframe *in a separate thread*,
> an exception is thrown:
> `java.lang.ClassCastException: 
> com.databricks.spark.avro.DefaultSource$SerializableConfiguration cannot be 
> cast to com.databricks.spark.avro.DefaultSource$SerializableConfiguration`
> It is thrown here:
> https://github.com/databricks/spark-avro/blob/v3.0.1/src/main/scala/com/databricks/spark/avro/DefaultSource.scala#L156
> I will attach a Jupyter notebook that illustrates the problem and includes the
> full stack trace, along with a script showing the environment.
> The class that throws the exception, `DefaultSource`, broadcasts the Hadoop
> config and returns an anonymous function that accesses that config. The
> exception occurs when that function is executed and attempts to access the
> config.
> This looks like a class loader mismatch problem to me ("Class Identity 
> Crisis").
> With a bit of hacking of `spark-avro`, I've seen the class loader for 
> `DefaultSource` when the config is broadcast to be 
> `scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@31ac5411`
> and when the config is read to be
> `org.apache.spark.util.MutableURLClassLoader@3d3fcdb0`
> If a fat jar including `spark-avro` is built and included with `--jars=...`,
> the same problem occurs.
> Interestingly, Spark's built-in support for CSV uses the same pattern as
> Avro, broadcasting a config, but works as expected, as shown in the notebook.
> https://github.com/apache/spark/blob/v2.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala#L108
> Avro also works as expected when an application fat jar is built and passed 
> to 
> `spark-submit` without involving Toree.
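
The "X cannot be cast to X" symptom described above can be reproduced without Spark or Toree. The following is a minimal sketch, assuming it is compiled as a standalone file with `scalac` rather than pasted into a REPL; `Payload`, `IsolatingLoader`, and `ClassIdentityDemo` are hypothetical names used only for illustration and are not part of spark-avro. It loads the same class bytes through two independent class loaders and shows that the JVM treats the results as unrelated classes.

```scala
// Hypothetical stand-in for DefaultSource$SerializableConfiguration.
class Payload

// Illustrative child-first loader: defines `targetName` itself from `bytes`
// instead of delegating to the parent, so each instance gets its own Class.
class IsolatingLoader(targetName: String, bytes: Array[Byte], parent: ClassLoader)
    extends ClassLoader(parent) {
  override protected def loadClass(name: String, resolve: Boolean): Class[_] =
    if (name == targetName) {
      val already = findLoadedClass(name)
      if (already != null) already else defineClass(name, bytes, 0, bytes.length)
    } else super.loadClass(name, resolve)
}

object ClassIdentityDemo {
  // Read the compiled bytes of `cls` through its own class loader.
  def classBytes(cls: Class[_]): Array[Byte] = {
    val path = cls.getName.replace('.', '/') + ".class"
    val in = cls.getClassLoader.getResourceAsStream(path)
    try Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
    finally in.close()
  }

  // Load Payload through two independent loaders.
  def loadTwice(): (Class[_], Class[_]) = {
    val original = classOf[Payload]
    val bytes = classBytes(original)
    def fresh() = new IsolatingLoader(original.getName, bytes, original.getClassLoader)
    (fresh().loadClass(original.getName), fresh().loadClass(original.getName))
  }

  def main(args: Array[String]): Unit = {
    val (a, b) = loadTwice()
    println(a.getName == b.getName) // true: same fully qualified name
    println(a eq b)                 // false: two distinct Class objects
    val instance = a.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
    // An instance created from one loader's class is not an instance of the other's.
    try { b.cast(instance); println("cast succeeded") }
    catch { case e: ClassCastException => println(s"cast failed: ${e.getMessage}") }
  }
}
```

The failed cast names the same class on both sides, which matches the shape of the exception in the report. Whether Toree's loaders diverge in exactly this way is not established here; the sketch only shows the general mechanism.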



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TOREE-349) ClassCastException when reading Avro from another thread (Toree master / Spark 2.0.0)

2016-10-18 Thread Andrew Kerr (JIRA)


[ 
https://issues.apache.org/jira/browse/TOREE-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585210#comment-15585210
 ] 

Andrew Kerr commented on TOREE-349:
---

Apparently not, no. The code below works fine in spark-shell.

{code:language=scala}
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SaveMode
import com.databricks.spark.avro._

val session = SparkSession.builder().getOrCreate()
import session.implicits._

// Write the same ten-row dataframe as both CSV and Avro
// (sc is the SparkContext provided by spark-shell)
val dataframe = sc.parallelize(1 to 10).toDF
dataframe.show()

dataframe.write.mode(SaveMode.Overwrite).csv("csv")
dataframe.write.mode(SaveMode.Overwrite).avro("avro")

// Read the CSV data on another thread
val csvFuture = Future(session.read.csv("csv"))
val csvResult = Await.result(csvFuture, Duration.Inf)
csvResult.show()

// Read the Avro data on another thread
val avroFuture = Future(session.read.avro("avro"))
val avroResult = Await.result(avroFuture, Duration.Inf)
avroResult.show()
{code}

[jira] [Commented] (TOREE-349) ClassCastException when reading Avro from another thread (Toree master / Spark 2.0.0)

2016-10-17 Thread Marius Van Niekerk (JIRA)


[ 
https://issues.apache.org/jira/browse/TOREE-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583801#comment-15583801
 ] 

Marius Van Niekerk commented on TOREE-349:
--

So do you need to do the same thing for spark-shell?

[jira] [Commented] (TOREE-349) ClassCastException when reading Avro from another thread (Toree master / Spark 2.0.0)

2016-10-17 Thread Andrew Kerr (JIRA)


[ 
https://issues.apache.org/jira/browse/TOREE-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582769#comment-15582769
 ] 

Andrew Kerr commented on TOREE-349:
---

This code works as expected:

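```
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Capture the context class loader of the notebook thread
val classLoader = Thread.currentThread().getContextClassLoader
println(classLoader)
val future = Future {
  // Use the same loader on the worker thread before reading
  Thread.currentThread().setContextClassLoader(classLoader)
  session.read.avro("foo")
}
val result = Await.result(future, Duration.Inf)
result.show()
```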

The classloader is 
`scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@864ff30`

Obviously this isn't ideal. It also isn't necessary for loading CSV files, 
whose loader is implemented in a similar way to the Avro loader (the Avro code 
looks copy-pasted from the CSV code).
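
If the workaround has to be applied in several places, it could be wrapped in a small helper. The following is only a sketch: `ContextLoaderFutures` and `futureWithCallerLoader` are hypothetical names, not part of Toree, Spark, or spark-avro. The helper captures the caller's context class loader, sets it on the worker thread for the duration of the body, and restores the previous loader afterwards so pooled threads are not left modified.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ContextLoaderFutures {
  // Hypothetical helper: runs `body` on `ec` with the calling thread's
  // context class loader, then restores the worker's previous loader.
  def futureWithCallerLoader[A](body: => A)(implicit ec: ExecutionContext): Future[A] = {
    val callerLoader = Thread.currentThread().getContextClassLoader
    Future {
      val thread = Thread.currentThread()
      val previous = thread.getContextClassLoader
      thread.setContextClassLoader(callerLoader)
      try body
      finally thread.setContextClassLoader(previous)
    }
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    // Ask the worker thread which context class loader it sees.
    val seen = Await.result(
      futureWithCallerLoader(Thread.currentThread().getContextClassLoader),
      Duration.Inf)
    println(seen eq Thread.currentThread().getContextClassLoader)
  }
}
```

In the notebook, the call would then look like `futureWithCallerLoader(session.read.avro("foo"))`, assuming `session` and the spark-avro implicits are in scope as in the snippet above.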

[jira] [Commented] (TOREE-349) ClassCastException when reading Avro from another thread (Toree master / Spark 2.0.0)

2016-10-14 Thread Marius Van Niekerk (JIRA)


[ 
https://issues.apache.org/jira/browse/TOREE-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575398#comment-15575398
 ] 

Marius Van Niekerk commented on TOREE-349:
--

So, for your separate thread, do you set the contextClassLoader explicitly?

You might have to do that to ensure that you use the proper class loader (the 
org.apache.spark one).
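
One way to check which loaders a given thread can reach is to walk the parent chain of its context class loader. This is a rough sketch with hypothetical names (`LoaderChain`, `chain`, `hasSparkLoader`); it assumes conventional parent delegation, and custom loaders may delegate in ways that `getParent` does not reveal.

```scala
object LoaderChain {
  // Walk from `start` up through its parents. The bootstrap loader is
  // represented by null, which ends the walk.
  def chain(start: ClassLoader): List[ClassLoader] =
    Iterator.iterate(start)(_.getParent).takeWhile(_ != null).toList

  // True if any loader in the chain is implemented by a class under org.apache.spark.
  def hasSparkLoader(start: ClassLoader): Boolean =
    chain(start).exists(_.getClass.getName.startsWith("org.apache.spark."))

  def main(args: Array[String]): Unit = {
    val contextLoader = Thread.currentThread().getContextClassLoader
    chain(contextLoader).foreach(l => println(l.getClass.getName))
    println(hasSparkLoader(contextLoader))
  }
}
```

Running the same check on the notebook thread and inside the `Future` would show whether the two threads start from different loaders.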
