Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/16330#discussion_r93351538
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -104,6 +104,12 @@ class SparkHadoopUtil extends Logging {
}
val bufferSize = conf.get("spark.buffer.size", "65536")
hadoopConf.set("io.file.buffer.size", bufferSize)
+
+ if (conf.contains("spark.sql.default.derby.dir")) {
--- End diff ---
@yhuai
Spark uses Derby for the metastore by default. Generally, metastore_db and
derby.log get created in the current working directory. This is a problem in
more restrictive environments, such as when running as an R package, where the
guideline is not to write anything to the user's space (except under tempdir).
Just checking now, this also seems to be the case when running the pyspark
shell.
It looks like this is new behavior since 2.0.0. Would it make sense to always
default derby/metastore to tempdir, unless it is running in an application
directory that is cleaned up when the job is done (e.g. a YARN cluster)?
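For illustration, here is a minimal sketch (not the change proposed in this PR)
of how Derby's files could be redirected away from the current working
directory. derby.system.home and derby.stream.error.file are standard Derby
system properties; the temp-directory path used here is only an example:

```scala
import java.io.File
import java.nio.file.Files

// Create a scratch directory under the system temp dir (example path only).
val derbyDir: File = Files.createTempDirectory("spark-derby").toFile

// derby.system.home controls where Derby resolves relative database names,
// so metastore_db would be created under this directory instead of the cwd.
System.setProperty("derby.system.home", derbyDir.getAbsolutePath)

// derby.stream.error.file controls where derby.log is written.
System.setProperty("derby.stream.error.file",
  new File(derbyDir, "derby.log").getAbsolutePath)
```

Note these properties would need to be set before the embedded Derby metastore
is first initialized, i.e. before the first Hive-backed SparkSession is created.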
---