[ https://issues.apache.org/jira/browse/SPARK-42193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-42193.
----------------------------------
    Resolution: Cannot Reproduce

> dataframe API filter criteria throwing ParseException when reading a JDBC
> column name with special characters
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42193
>                 URL: https://issues.apache.org/jira/browse/SPARK-42193
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Shanmugavel Kuttiyandi Chandrakasu
>            Priority: Minor
>
> *On Spark 3.3.0,* when reading from a JDBC table (SQLite was used to
> reproduce) via spark.read.jdbc with sqlite-jdbc:3.34.0.jar, where the table
> and column names contain special characters, the DataFrame API filter
> criteria fails with a ParseException.
>
> *Script:*
> {code:java}
> from pyspark.sql import SparkSession
>
> spark = SparkSession \
>     .builder \
>     .appName("Databricks Support") \
>     .config("spark.jars.packages", "org.xerial:sqlite-jdbc:3.34.0") \
>     .getOrCreate()
>
> columns = ["id", "/abc/column", "value"]
> data = [(1, 'A', 100), (2, 'B', 200), (3, 'B', 300)]
> rdd = spark.sparkContext.parallelize(data)
> df = spark.createDataFrame(rdd).toDF(*columns)
>
> options = {
>     "url": "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db",
>     "dbtable": '"/abc/table"',
>     "driver": "org.sqlite.JDBC",
> }
> df.coalesce(1).write.format("jdbc").options(**options).mode("append").save()
>
> df_1 = spark.read.format("jdbc") \
>     .option("url", "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db") \
>     .option("dbtable", '"/abc/table"') \
>     .option("driver", "org.sqlite.JDBC") \
>     .load()
>
> df_2 = df_1.filter("`/abc/column` = 'B'")
> df_2.show()
> {code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/pyspark/sql/dataframe.py", line 606, in show
>     print(self._jdf.showString(n, 20, vertical))
>   File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
>   File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/pyspark/sql/utils.py", line 196, in deco
>     raise converted from None
> pyspark.sql.utils.ParseException:
> Syntax error at or near '/': extra input '/'(line 1, pos 0)
>
> == SQL ==
> /abc/column
> ^^^
> {code}
> However, on Spark 3.2.1 the same dataframe.filter criteria succeeds:
> {code:java}
> >>> df_2.show()
> +---+-----------+-----+
> | id|/abc/column|value|
> +---+-----------+-----+
> |  2|          B|  200|
> |  3|          B|  300|
> +---+-----------+-----+
> {code}
> *Repro steps:*
> # Download [Spark 3.2.1 locally|https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz]
> # Download sqlite-jdbc:3.34.0.jar and copy it into the jars folder of the local Spark download
> # Run the script above, providing the jar path
> # This creates */abc/table* with column */abc/column* and returns results when the filter criteria is applied
> # Download [Spark 3.3.0 locally|https://www.apache.org/dyn/closer.lua/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz]
> # Repeat steps 2 and 3
> # The filter now fails with a ParseException.
> Could you please let us know how we can filter on the column with special
> characters, or escape them, on Spark 3.3.0?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)