[jira] [Commented] (SPARK-20590) Map default input data source formats to inlined classes
    [ https://issues.apache.org/jira/browse/SPARK-20590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303237#comment-17303237 ]

Yu Xiang commented on SPARK-20590:
----------------------------------

[~cloud_fan], I tried to use the full name as shown below, but it does not work. Any idea?
(A more detailed explanation of the problem is here: https://stackoverflow.com/questions/4181/spark-multiple-sources-found-for-text)

{code:java}
DataFrameReader read = spark.read();
JavaRDD<String> stringJavaRDD = read
    .format("org.apache.spark.sql.execution.datasources.text.TextFileFormat")
    .textFile(inputPath)
    .javaRDD();
{code}

> Map default input data source formats to inlined classes
> --------------------------------------------------------
>
>                 Key: SPARK-20590
>                 URL: https://issues.apache.org/jira/browse/SPARK-20590
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Sameer Agarwal
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 2.2.0
>
>
> One of the common usability problems around reading data in Spark
> (particularly CSV) is that there can often be a conflict between different
> readers in the classpath.
> As an example, if someone launches a 2.x Spark shell with the spark-csv
> package in the classpath, Spark currently fails in an extremely unfriendly way:
> {code}
> ./bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
> scala> val df = spark.read.csv("/foo/bar.csv")
> java.lang.RuntimeException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat, com.databricks.spark.csv.DefaultSource15), please specify the fully qualified class name.
>   at scala.sys.package$.error(package.scala:27)
>   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:574)
>   at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:85)
>   at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:85)
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:295)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:533)
>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:412)
>   ... 48 elided
> {code}
> This JIRA proposes a simple way of fixing this error by always mapping
> default input data source formats to inlined classes (that exist in Spark).
> {code}
> ./bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
> scala> val df = spark.read.csv("/foo/bar.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: string]
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20590) Map default input data source formats to inlined classes
    [ https://issues.apache.org/jira/browse/SPARK-20590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005869#comment-16005869 ]

Wenchen Fan commented on SPARK-20590:
-------------------------------------

We only prefer the internal data source if the given name is a short name like "csv" or "json". Using the full name still works.

> Map default input data source formats to inlined classes
> --------------------------------------------------------
>
>                 Key: SPARK-20590
>                 URL: https://issues.apache.org/jira/browse/SPARK-20590
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Sameer Agarwal
>            Assignee: Hyukjin Kwon
>             Fix For: 2.2.1, 2.3.0
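The rule described in this comment can be modelled without Spark. The following is only a minimal sketch in plain Java: `Provider`, `resolve`, and the package-prefix check for "internal" sources are hypothetical names and assumptions made for illustration, not Spark's actual lookup code. The two class names are the ones quoted in the error message of this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DataSourceLookupSketch {

    // Hypothetical stand-in for a registered data source: a short alias plus its class name.
    static final class Provider {
        final String shortName;
        final String className;

        Provider(String shortName, String className) {
            this.shortName = shortName;
            this.className = className;
        }
    }

    // Assumption for this sketch: built-in sources are recognised by their package prefix.
    static final String INTERNAL_PREFIX = "org.apache.spark.sql";

    static String resolve(String name, List<Provider> registered) {
        // A fully qualified class name is matched directly; no preference is applied.
        for (Provider p : registered) {
            if (p.className.equals(name)) {
                return p.className;
            }
        }
        // Otherwise treat the name as a short alias and collect every match.
        List<Provider> matches = new ArrayList<>();
        for (Provider p : registered) {
            if (p.shortName.equalsIgnoreCase(name)) {
                matches.add(p);
            }
        }
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("Failed to find data source: " + name);
        }
        if (matches.size() == 1) {
            return matches.get(0).className;
        }
        // Several sources share the alias: prefer the internal one if it is unique.
        List<Provider> internal = new ArrayList<>();
        for (Provider p : matches) {
            if (p.className.startsWith(INTERNAL_PREFIX)) {
                internal.add(p);
            }
        }
        if (internal.size() == 1) {
            return internal.get(0).className;
        }
        throw new IllegalStateException("Multiple sources found for " + name);
    }

    public static void main(String[] args) {
        List<Provider> registered = Arrays.asList(
            new Provider("csv", "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat"),
            new Provider("csv", "com.databricks.spark.csv.DefaultSource15"));

        // Short name: the internal class is chosen instead of raising an error.
        System.out.println(resolve("csv", registered));
        // Full name: the external package is still reachable.
        System.out.println(resolve("com.databricks.spark.csv.DefaultSource15", registered));
    }
}
```

In this model the ambiguity error is raised only when the alias matches several sources and no single internal one can be picked, while a fully qualified name bypasses the preference entirely.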
[jira] [Commented] (SPARK-20590) Map default input data source formats to inlined classes
    [ https://issues.apache.org/jira/browse/SPARK-20590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005864#comment-16005864 ]

Felix Cheung commented on SPARK-20590:
--------------------------------------

When the user explicitly specifies the package to use, shouldn't that take priority over the internal one?
Say a better CSV implementation exists as a Spark package: right now there is no way to use it.

> Map default input data source formats to inlined classes
> --------------------------------------------------------
>
>                 Key: SPARK-20590
>                 URL: https://issues.apache.org/jira/browse/SPARK-20590
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Sameer Agarwal
>            Assignee: Hyukjin Kwon
>             Fix For: 2.2.1, 2.3.0
[jira] [Commented] (SPARK-20590) Map default input data source formats to inlined classes
    [ https://issues.apache.org/jira/browse/SPARK-20590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002225#comment-16002225 ]

Apache Spark commented on SPARK-20590:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/17916

> Map default input data source formats to inlined classes
> --------------------------------------------------------
>
>                 Key: SPARK-20590
>                 URL: https://issues.apache.org/jira/browse/SPARK-20590
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Sameer Agarwal
[jira] [Commented] (SPARK-20590) Map default input data source formats to inlined classes
    [ https://issues.apache.org/jira/browse/SPARK-20590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995799#comment-15995799 ]

Apache Spark commented on SPARK-20590:
--------------------------------------

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/17847

> Map default input data source formats to inlined classes
> --------------------------------------------------------
>
>                 Key: SPARK-20590
>                 URL: https://issues.apache.org/jira/browse/SPARK-20590
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Sameer Agarwal