Terry Kim created SPARK-32621:
---------------------------------

             Summary: "path" option is added again to input paths during infer()
                 Key: SPARK-32621
                 URL: https://issues.apache.org/jira/browse/SPARK-32621
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0, 2.4.6, 3.0.1, 3.1.0
            Reporter: Terry Kim


When the "path" option is used to create a DataFrame, it can cause issues during
inference, because the "path" option is added to the input paths again in infer().
{code:java}
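import org.apache.hadoop.fs.{Path, PathFilter}

// Rejects every file whose parent directory is "p=2".
class TestFileFilter extends PathFilter {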
  override def accept(path: Path): Boolean = path.getParent.getName != "p=2"
}

val path = "/tmp"
val df = spark.range(2)
df.write.json(path + "/p=1")
df.write.json(path + "/p=2")

val extraOptions = Map(
  "mapred.input.pathFilter.class" -> classOf[TestFileFilter].getName,
  "mapreduce.input.pathFilter.class" -> classOf[TestFileFilter].getName
)

// This works: the filter skips p=2, so only the 2 rows under p=1 are read.
assert(spark.read.options(extraOptions).json(path).count == 2)

// The same read using the "path" option fails with:
// assertion failed: Conflicting directory structures detected. Suspicious paths
//      file:/tmp
//      file:/tmp/p=1
assert(spark.read.options(extraOptions).format("json").option("path", path).load.count() == 2)
{code}
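The summary suggests that the root from the "path" option re-enters the input paths alongside files that were already listed under it. The following is a simplified, hypothetical model of that failure mode in plain Scala without Spark. It assumes that inference re-resolves the source from already-listed leaf files while forwarding the original options map unchanged. `allPaths`, `PathOptionDemo` and the file name `part-00000.json` are illustrative only, not Spark's internal API, and Spark's actual code path may differ.

{code:scala}
// Hypothetical model of the bug pattern; this is NOT Spark's actual code.
object PathOptionDemo {
  // Hypothetical helper: gathers input paths from the "path" option
  // plus any explicitly supplied paths.
  def allPaths(paths: Seq[String], options: Map[String, String]): List[String] =
    options.get("path").toList ++ paths

  def main(args: Array[String]): Unit = {
    val options = Map("path" -> "/tmp")

    // Initial resolution: the "path" option is the only root.
    val roots = allPaths(Nil, options)
    println(roots)

    // Assume listing (with the PathFilter applied) expanded the root
    // into leaf files (hypothetical listing result).
    val leafFiles = Seq("/tmp/p=1/part-00000.json")

    // If inference forwards the same options, "/tmp" comes back next to
    // the leaf files, mixing a root directory with its own contents.
    val inferPaths = allPaths(leafFiles, options)
    println(inferPaths)

    // Dropping "path" from the forwarded options avoids the duplication.
    val fixed = allPaths(leafFiles, options - "path")
    println(fixed)
  }
}
{code}

In this model, `inferPaths` contains both `/tmp` and a file under `/tmp/p=1`, which resembles the pair of suspicious paths reported in the assertion message above.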



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
