GitHub user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/4261#issuecomment-72037926
I don't think these test failures are my fault, unless I need to handle
SparkContext lifetimes differently. One thing I see in the test failure
log is this:
```
[info] Test org.apache.spark.sql.api.java.JavaAPISuite.udf1Test started
20:59:18.609 WARN org.apache.spark.SparkContext: Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:124)
org.apache.spark.sql.test.TestSQLContext$.<init>(TestSQLContext.scala:29)
org.apache.spark.sql.test.TestSQLContext$.<clinit>(TestSQLContext.scala)
[...]
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:159)
  at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:67)
  at org.apache.spark.sql.api.java.JavaAPISuite.setUp(JavaAPISuite.java:40)
```
`JavaAPISuite` spins up a new SparkContext:
```java
@Before
public void setUp() {
  sc = new JavaSparkContext("local", "JavaAPISuite");
  sqlContext = new SQLContext(sc);
}
```
and destroys it when it's done:
```java
@After
public void tearDown() {
  sc.stop();
  sc = null;
}
```
Should it? There's already a SQLContext out there; it's called
`TestSQLContext$.MODULE$`.
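As a rough sketch, `setUp`/`tearDown` could reuse that shared context instead of creating a second one. This assumes `TestSQLContext` is on the test classpath and guesses at the suite's field types; it is an illustration, not a patch:

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.test.TestSQLContext$;

// Hypothetical rework: borrow the JVM-wide test context rather than
// spinning up a second SparkContext.
@Before
public void setUp() {
  sqlContext = TestSQLContext$.MODULE$;
  sc = new JavaSparkContext(sqlContext.sparkContext());
}

@After
public void tearDown() {
  // Deliberately do NOT stop() the shared context; TestSQLContext and
  // any other suites in the same JVM still depend on it.
  sc = null;
}
```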
I can't reproduce the test failures on my side. All three failures (in
`JavaJDBCTest`) look like this:
```
[info] Test org.apache.spark.sql.jdbc.JavaJDBCTest.basicTest started
[error] Test org.apache.spark.sql.jdbc.JavaJDBCTest.basicTest failed: Task not serializable
[error]     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
[error]     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
[error]     at org.apache.spark.SparkContext.clean(SparkContext.scala:1488)
[error]     at org.apache.spark.rdd.RDD.map(RDD.scala:290)
[error]     at org.apache.spark.sql.DataFrame.rdd(DataFrame.scala:527)
[error]     at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:484)
[error]     at org.apache.spark.sql.jdbc.JavaJDBCTest.basicTest(JavaJDBCTest.java:62)
[error]     ...
[error] Caused by: java.lang.NullPointerException
[error]     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
[error]     ... 44 more
```
Line 164 of `ClosureCleaner.scala` is

```scala
SparkEnv.get.closureSerializer.newInstance().serialize(func)
```
I think `SparkEnv.get` is returning null there. When you spin up a
SparkContext, it creates a `SparkEnv` and calls `SparkEnv.set(env)`; when
you `stop()` it, it calls `SparkEnv.set(null)`. So once any suite in the
JVM stops its context, the shared env is gone for everyone else.
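A minimal sketch of the suspected sequence against the Spark API of this era (the class name is mine; the `SparkEnv` calls are the real ones):

```java
import org.apache.spark.SparkEnv;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical standalone illustration of the env lifecycle, not suite code.
public class SparkEnvLifecycleSketch {
  public static void main(String[] args) {
    JavaSparkContext first = new JavaSparkContext("local", "firstSuite");
    System.out.println(SparkEnv.get()); // a live SparkEnv

    first.stop(); // SparkContext.stop() ends up calling SparkEnv.set(null)
    System.out.println(SparkEnv.get()); // null

    // Anything that now dereferences SparkEnv.get -- like line 164 above --
    // throws a NullPointerException.
  }
}
```

If that's what's happening, any test that runs after another suite's `tearDown` and leans on the already-constructed `TestSQLContext` will hit exactly this NPE.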