Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/22881#discussion_r229172448
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -471,4 +473,42 @@ object SparkHadoopUtil {
hadoopConf.set(key.substring("spark.hadoop.".length), value)
}
}
+
+
+  lazy val builderReflection: Option[(Class[_], Method, Method)] = Try {
+    val cls = Utils.classForName(
+      "org.apache.hadoop.hdfs.DistributedFileSystem$HdfsDataOutputStreamBuilder")
+    (cls, cls.getMethod("replicate"), cls.getMethod("build"))
+  }.toOption
+
+  // scalastyle:off line.size.limit
+  /**
+   * Create a path that uses replication instead of erasure coding, regardless of the default
+   * configuration in hdfs for the given path. This can be helpful as hdfs ec doesn't support
+   * hflush(), hsync(), or append()
+   * https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html#Limitations
+   */
+  // scalastyle:on line.size.limit
+  def createNonECFile(fs: FileSystem, path: Path): FSDataOutputStream = {
+    try {
+      // Use reflection as this uses APIs only available in hadoop 3
+      val builderMethod = fs.getClass().getMethod("createFile", classOf[Path])
+      val builder = builderMethod.invoke(fs, path)
+      builderReflection match {
--- End diff --
Good point on the reflection; I was trying something else in earlier
experiments and didn't clean it up.
On poking into `DistributedFileSystem`: @xiao-chen had similar concerns,
but also said there didn't seem to be another option and that it looked like
an oversight in the HDFS API. @steveloughran, maybe you have thoughts here as well?
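
For context, here is a minimal, self-contained sketch of how the reflection-based approach in the hunk above could be wired end to end. It is not the PR's actual code: the enclosing object name, the `isInstance` guard, and the `fs.create(path)` fallback are assumptions; only the reflected names (`createFile`, `replicate`, `build`) and the Hadoop builder class come from the diff.

```scala
import java.lang.reflect.Method

import scala.util.Try

import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, Path}

// Hedged sketch, not the PR's code: shows one way the truncated
// `builderReflection match` could resolve, under the assumptions noted above.
object NonECFileSketch {

  // Resolve the Hadoop-3-only builder class and its replicate()/build() methods, if present.
  private lazy val builderReflection: Option[(Class[_], Method, Method)] = Try {
    val cls = Class.forName(
      "org.apache.hadoop.hdfs.DistributedFileSystem$HdfsDataOutputStreamBuilder")
    (cls, cls.getMethod("replicate"), cls.getMethod("build"))
  }.toOption

  def createNonECFile(fs: FileSystem, path: Path): FSDataOutputStream = {
    try {
      // FileSystem.createFile(Path) only exists in Hadoop 3, hence reflection.
      val builderMethod = fs.getClass().getMethod("createFile", classOf[Path])
      val builder = builderMethod.invoke(fs, path)
      builderReflection match {
        case Some((cls, replicate, build)) if cls.isInstance(builder) =>
          // Ask for plain replication instead of erasure coding, then build the stream.
          build.invoke(replicate.invoke(builder)).asInstanceOf[FSDataOutputStream]
        case _ =>
          // Not an HDFS builder (or the builder class is missing): fall back to the plain API.
          fs.create(path)
      }
    } catch {
      case _: NoSuchMethodException =>
        // Pre-Hadoop-3 client: no createFile() builder, and no erasure coding either.
        fs.create(path)
    }
  }
}
```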
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]