[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942090#comment-16942090 ] Paul Wu edited comment on SPARK-20427 at 10/1/19 4:13 PM:
--

Someone asked me about this problem months ago and I found a solution for him, but I had forgotten it by the time another person on my team asked me again yesterday. I had to spend several hours on it because her query was quite complex. For the record and for my own reference, I am putting the solution here (inspired by [~sobusiak] and [~yumwang]): add a customSchema option to the read() that maps every potential troublemaker to Double. This should resolve most cases in real applications, provided the application does not depend on the exact significant digits.

{code:java}
.read()
.option("customSchema", "col1 Double, col2 Double") // where col1, col2, ... are the columns that could cause the trouble
{code}

Also, some may think their issue comes from the .write() operation, but it is in fact from the .read() operation. The columns col1, col2, ... are not necessarily columns of the original tables; they can also be calculated fields produced by the query. One could easily bark up the wrong tree trying to fix the issue.

> Issue with Spark interpreting Oracle datatype NUMBER
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Alexander Andrushenko
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 2.3.0
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 44 exceeds max precision 38...".
> The result was that a data frame could be read from a table on one schema but could not be inserted into the identical table on another schema.
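For reference, a minimal self-contained sketch of the workaround above, assuming Spark 2.3+ (where the JDBC source's customSchema option exists); the connection URL, credentials, and table name are placeholders, not anything from the report:

{code:java}
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OracleNumberReadDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("OracleNumberReadDemo").getOrCreate();

        Properties props = new Properties();
        props.put("user", "scott");       // placeholder credentials
        props.put("password", "tiger");

        // Mapping the troublesome NUMBER columns to Double keeps Spark from
        // inferring a Decimal whose precision would exceed the max of 38.
        Dataset<Row> df = spark.read()
                .option("customSchema", "col1 DOUBLE, col2 DOUBLE")
                .jdbc("jdbc:oracle:thin:@//dbhost:1521/orcl", "MY_TABLE", props);

        df.printSchema();   // col1 and col2 now show as double
        spark.close();
    }
}
{code}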
[jira] [Updated] (SPARK-27077) DataFrameReader and Number of Connection Limitation
[ https://issues.apache.org/jira/browse/SPARK-27077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-27077:
-
Description:

I am not sure whether this is a Spark core issue or a Vertica issue, but I am inclined to think it is Spark's. The problem is that when we read with sparkSession.read().load() from a data source, in my case a Vertica DB, the DataFrameReader makes a "large" number of initial JDBC connection requests. My account is limited to 16 connections (and I can see at least 6 of them are available for my load), and when the large burst of requests is issued, I get the exception below. In fact, the reader eventually settles down to far fewer connections (in my case 2 simultaneous DataFrameReaders). So I think there should be a parameter that prevents the reader from sending out more initial connection requests than the user's limit. Without such an option, my application can fail randomly because of my Vertica account's connection limit.

{code}
java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 16 on database already met for M21176
 at com.vertica.util.ServerErrorData.buildException(Unknown Source)
 at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
 at com.vertica.io.ProtocolStream.initSession(Unknown Source)
 at com.vertica.core.VConnection.tryConnect(Unknown Source)
 at com.vertica.core.VConnection.connect(Unknown Source)
 at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
 at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
 at java.sql.DriverManager.getConnection(DriverManager.java:664)
 at java.sql.DriverManager.getConnection(DriverManager.java:208)
 at com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
 at com.vertica.spark.datasource.VerticaRelation.<init>(VerticaRelation.scala:34)
 at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
 at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
 at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
 at com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156)
Caused by: com.vertica.support.exceptions.NonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 16 on database already met for
{code}
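With the stock JDBC source, the number of simultaneous connections can already be capped through the explicit numPartitions overload of DataFrameReader.jdbc; whether the Vertica-specific data source honors anything equivalent is not confirmed here. A sketch with placeholder URL, credentials, table, and partition column:

{code:java}
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CappedJdbcRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("CappedJdbcRead").getOrCreate();

        Properties props = new Properties();
        props.put("user", "M21176");      // placeholder credentials
        props.put("password", "secret");

        // The numPartitions argument bounds the number of partitions, and
        // therefore the number of simultaneous JDBC connections, that Spark
        // opens for this read.
        Dataset<Row> df = spark.read().jdbc(
                "jdbc:vertica://somehost:5433/mydb",  // placeholder URL
                "my_table",                            // placeholder table
                "id",                                  // assumed numeric partition column
                0L, 1000000L,                          // assumed bounds of "id"
                4,                                     // at most 4 connections
                props);

        System.out.println(df.count());
        spark.close();
    }
}
{code}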
[jira] [Commented] (SPARK-22371) dag-scheduler-event-loop thread stopped with error Attempted to access garbage collected accumulator 5605982
[ https://issues.apache.org/jira/browse/SPARK-22371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466099#comment-16466099 ] Paul Wu commented on SPARK-22371:
-

Got the same problem with 2.3, and the program also stalled:

{code}
Uncaught exception in thread heartbeat-receiver-event-loop-thread
java.lang.IllegalStateException: Attempted to access garbage collected accumulator 8825
 at org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:265)
 at org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:261)
 at scala.Option.map(Option.scala:146)
 at org.apache.spark.util.AccumulatorContext$.get(AccumulatorV2.scala:261)
 at org.apache.spark.util.AccumulatorV2$$anonfun$name$1.apply(AccumulatorV2.scala:87)
 at org.apache.spark.util.AccumulatorV2$$anonfun$name$1.apply(AccumulatorV2.scala:87)
 at scala.Option.orElse(Option.scala:289)
 at org.apache.spark.util.AccumulatorV2.name(AccumulatorV2.scala:87)
 at org.apache.spark.util.AccumulatorV2.toInfo(AccumulatorV2.scala:108)
{code}

> dag-scheduler-event-loop thread stopped with error Attempted to access
> garbage collected accumulator 5605982
> -
>
> Key: SPARK-22371
> URL: https://issues.apache.org/jira/browse/SPARK-22371
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Mayank Agarwal
> Priority: Major
> Attachments: Helper.scala, ShuffleIssue.java, driver-thread-dump-spark2.1.txt, sampledata
>
> Our Spark jobs are getting stuck in DAGScheduler.runJob because the dag-scheduler thread stopped with *Attempted to access garbage collected accumulator 5605982*.
> From our investigation it looks like the accumulator is cleaned by GC first, and the same accumulator is then used for merging the results from an executor on a task-completion event.
> Since java.lang.IllegalAccessError is a LinkageError, which is treated as a fatal error, the dag-scheduler loop finished with the exception below.
> ---ERROR stack trace --
> Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: Attempted to access garbage collected accumulator 5605982
> at org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:253)
> at org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:249)
> at scala.Option.map(Option.scala:146)
> at org.apache.spark.util.AccumulatorContext$.get(AccumulatorV2.scala:249)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1083)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1080)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1080)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1183)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1647)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> I am attaching the thread dump of the driver as well.
[jira] [Resolved] (SPARK-23617) Register a Function without params with Spark SQL Java API
[ https://issues.apache.org/jira/browse/SPARK-23617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu resolved SPARK-23617.
-
Resolution: Duplicate
Fix Version/s: 2.3.0

As commented by Hyukjin Kwon, this issue is a duplicate and has been fixed in 2.3.0.

> Register a Function without params with Spark SQL Java API
> --
>
> Key: SPARK-23617
> URL: https://issues.apache.org/jira/browse/SPARK-23617
> Project: Spark
> Issue Type: Improvement
> Components: Java API, SQL
> Affects Versions: 2.2.1
> Reporter: Paul Wu
> Priority: Major
> Fix For: 2.3.0
>
> One can register a function using Scala:
> {{spark.udf.register("uuid", () => java.util.UUID.randomUUID.toString)}}
> Now, if I use the Java API:
> {{spark.udf().register("uuid", () -> java.util.UUID.randomUUID().toString());}}
> The code does not compile. Define UDF0 for the Java API?
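For reference, a sketch of what the 2.3.0 fix enables: org.apache.spark.sql.api.java.UDF0 lets a no-argument function be registered from Java by giving the lambda an explicit UDF0 type.

{code:java}
import java.util.UUID;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF0;
import org.apache.spark.sql.types.DataTypes;

public class Udf0Demo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("Udf0Demo").getOrCreate();

        // The cast to UDF0<String> selects the zero-argument register
        // overload added in Spark 2.3.0.
        spark.udf().register("uuid",
                (UDF0<String>) () -> UUID.randomUUID().toString(),
                DataTypes.StringType);

        spark.sql("select uuid()").show(false);
        spark.close();
    }
}
{code}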
[jira] [Updated] (SPARK-23193) Insert into Spark Table statement cannot specify column names
[ https://issues.apache.org/jira/browse/SPARK-23193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-23193:
-
Summary: Insert into Spark Table statement cannot specify column names (was: Insert into Spark Table cannot specify column names)

> Insert into Spark Table statement cannot specify column names
> -
>
> Key: SPARK-23193
> URL: https://issues.apache.org/jira/browse/SPARK-23193
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1
> Reporter: Paul Wu
> Priority: Major
>
> The following code shows that the insert statement cannot specify column names. The error is:
> {code}
> insert into aaa (age, name) values (12, 'nn')
> -^^^
>  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
>  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
>  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
>  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
>  at com.---.iqi.spk.TestInsert.main(TestInsert.java:75)
> {code}
> {code:java}
> public class TestInsert {
>     public static final SparkSession spark
>             = SparkSession.builder()
>                     .config("spark.sql.warehouse.dir", "file:///temp")
>                     .config("spark.driver.memory", "5g")
>                     .enableHiveSupport()
>                     .master("local[*]").appName("Spark2JdbcDs").getOrCreate();
>
>     public static class Person implements Serializable {
>         private String name;
>         private int age;
>
>         public String getName() { return name; }
>         public void setName(String name) { this.name = name; }
>         public int getAge() { return age; }
>         public void setAge(int age) { this.age = age; }
>     }
>
>     public static void main(String[] args) throws Exception {
>         // Create an instance of a Bean class
>         Person person = new Person();
>         person.setName("Andy");
>         person.setAge(32);
>         // Encoders are created for Java beans
>         Encoder<Person> personEncoder = Encoders.bean(Person.class);
>         Dataset<Person> javaBeanDS = spark.createDataset(
>                 Collections.singletonList(person),
>                 personEncoder
>         );
>         javaBeanDS.show();
>         javaBeanDS.createTempView("Abc");
>         spark.sql("drop table aaa");
>         spark.sql("create table aaa as select * from abc");
>         spark.sql("insert into aaa values (12, 'nn')");
>         spark.sql("select * from aaa").show();
>         spark.sql("insert into aaa (age, name) values (12, 'nn')");
>         spark.sql("select * from aaa").show();
>
>         spark.close();
>     }
> }
> {code}
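Until a column list is supported, a sketch of the usual workaround: supply the values in the table's declared column order (for the table above that is age, then name, since the bean encoder orders fields alphabetically), or spell out the mapping with a SELECT.

{code:java}
// Values in declared column order (age, name) work as-is:
spark.sql("insert into aaa values (12, 'nn')");

// Or make the column mapping explicit via SELECT:
spark.sql("insert into aaa select 12 as age, 'nn' as name");
{code}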
[jira] [Created] (SPARK-21740) DataFrame.write does not work with Phoenix JDBC Driver
Paul Wu created SPARK-21740:
---
Summary: DataFrame.write does not work with Phoenix JDBC Driver
Key: SPARK-21740
URL: https://issues.apache.org/jira/browse/SPARK-21740
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.2.0, 2.0.0
Reporter: Paul Wu

The reason is that the Phoenix JDBC driver does not support "INSERT", only "UPSERT". Exception for the following program:

{code}
17/08/15 12:18:53 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.phoenix.exception.PhoenixParserException: ERROR 601 (42P00): Syntax error. Encountered "INSERT" at line 1, column 1.
 at org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
{code}

{code:java}
public class HbaseJDBCSpark {

    private static final SparkSession sparkSession = SparkSession.builder()
            .config("spark.sql.warehouse.dir", "file:///temp")
            .config("spark.driver.memory", "5g")
            .master("local[*]").appName("Spark2JdbcDs").getOrCreate();

    static final String JDBC_URL = "jdbc:phoenix:somehost:2181:/hbase-unsecure";

    public static void main(String[] args) {
        final Properties connectionProperties = new Properties();
        Dataset<Row> jdbcDF = sparkSession.read()
                .jdbc(JDBC_URL, "javatest", connectionProperties);
        jdbcDF.show();

        String url = JDBC_URL;
        Properties p = new Properties();
        p.put("driver", "org.apache.phoenix.jdbc.PhoenixDriver");
        //p.put("batchsize", "10");
        jdbcDF.write().mode(SaveMode.Append).jdbc(url, "javatest", p);
        sparkSession.close();
    }
}
{code}
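One way around the INSERT/UPSERT mismatch is to skip plain JDBC for the write and use Phoenix's own Spark integration, which issues UPSERTs. A sketch, assuming the phoenix-spark artifact is on the classpath (the table name and ZooKeeper quorum below reuse the placeholders from the report):

{code:java}
import org.apache.spark.sql.SaveMode;

// Write through the phoenix-spark connector instead of the JDBC sink.
// The connector expects SaveMode.Overwrite but performs upserts rather
// than truncating the table.
jdbcDF.write()
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "javatest")
      .option("zkUrl", "somehost:2181:/hbase-unsecure")
      .save();
{code}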
[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105439#comment-16105439 ] Paul Wu commented on SPARK-17614:
-

Oh, sorry. I thought I could use a query here as I do with other RDBMSs. Things become complicated for this Cassandra case the more I think about it. I'll accept your comment.

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Environment: Any Spark Runtime
> Reporter: Paul Wu
> Assignee: Sean Owen
> Priority: Minor
> Labels: cassandra-jdbc, sql
> Fix For: 2.1.0
>
> I have code like the following with the Cassandra JDBC wrapper (https://github.com/adejanovski/cassandra-jdbc-wrapper):
> {code:java}
> final String dbTable = "sql_demo";
> Dataset<Row> jdbcDF = sparkSession.read()
>         .jdbc(CASSANDRA_CONNECTION_URL, dbTable, connectionProperties);
> List<Row> rows = jdbcDF.collectAsList();
> {code}
> It threw the error:
> {code}
> Exception in thread "main" java.sql.SQLTransientException: com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>  at com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.<init>(CassandraPreparedStatement.java:108)
>  at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>  at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>  at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> {code}
> The reason is that the Spark JDBC code uses the SQL syntax "where 1=0" somewhere (to get the schema?), but Cassandra does not support this syntax. Not sure how this issue can be resolved; CQL is not standard SQL.
> The following log shows more information:
> {code}
> 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting
> {code}
[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105425#comment-16105425 ] Paul Wu commented on SPARK-17614:
-

So should I create a new issue? Or is this not an issue to you?
[jira] [Reopened] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu reopened SPARK-17614:
-

The fix does not support syntax like this:

{{.jdbc(JDBC_URL, "(select * from emp where empid>2)", connectionProperties);}}

Here is the stack trace:

{code}
Exception in thread "main" java.sql.SQLTransientException: com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable alternative at input 'select' (SELECT * from [select]...)
 at com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.<init>(CassandraPreparedStatement.java:108)
 at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
 at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
 at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
 at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
 at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:113)
 at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
 at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
 at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
 at com.att.cass.proto.CassJDBCWithSpark.main(CassJDBCWithSpark.java:44)
{code}
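Since the 2.1.0 fix made the schema-probe query dialect-specific, one conceivable workaround (a sketch, not a confirmed fix for this ticket) is a custom JdbcDialect whose getSchemaQuery avoids WHERE 1=0 in favor of CQL's LIMIT; the JDBC URL prefix below is an assumption about the wrapper's URLs:

{code:java}
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;

// Sketch, Spark 2.1+: replace the "WHERE 1=0" schema probe with a
// CQL-friendly query. One row is enough for Spark to read the
// result-set metadata.
public class CassandraJdbcDialect extends JdbcDialect {
    @Override
    public boolean canHandle(String url) {
        return url.startsWith("jdbc:cassandra");   // assumed URL prefix
    }

    @Override
    public String getSchemaQuery(String table) {
        return "SELECT * FROM " + table + " LIMIT 1";
    }
}
{code}

Registered once before the first read, e.g. {{JdbcDialects.registerDialect(new CassandraJdbcDialect());}}, the dialect applies to every JDBC read whose URL it can handle.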
[jira] [Comment Edited] (SPARK-19296) Awkward changes for JdbcUtils.saveTable in Spark 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-19296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831033#comment-15831033 ] Paul Wu edited comment on SPARK-19296 at 1/20/17 9:52 PM: -- We have found this utility very useful in general (much, much better than primitive JDBC) and have been using it since 1.5.x... we didn't realize it is internal. It will be a big regret for us not to be able to use it, but it seems it is a pain for us now. At the least, for code quality purposes, the code should be refactored to eliminate the duplicated args. was (Author: zwu@gmail.com): We have found this utility very useful in general (much, much better than primitive JDBC) and have been using it since 1.3.x... we didn't realize it is internal. It will be a big regret for us not to be able to use it, but it seems it is a pain for us now. At the least, for code quality purposes, the code should be refactored to eliminate the duplicated args. > Awkward changes for JdbcUtils.saveTable in Spark 2.1.0 > -- > > Key: SPARK-19296 > URL: https://issues.apache.org/jira/browse/SPARK-19296 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Paul Wu >Priority: Minor > > The change from JdbcUtils.saveTable(DataFrame, String, String, Property) to > JdbcUtils.saveTable(DataFrame, String, String, JDBCOptions) is not only > incompatible with previous versions (so previous Java code won't > compile), but also introduces a silly redundancy: one has to specify url and > table twice, like this: > JDBCOptions jdbcOptions = new JDBCOptions(url, table, map); > JdbcUtils.saveTable(ds, url, table, jdbcOptions); > Why does one have to supply the same url and table twice? (If they are not > specified in both places, an exception is thrown.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
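For anyone hit by the compile break, a sketch of a route that avoids the internal API altogether (the helper class and method names here are illustrative, not from the report): the public DataFrameWriter.jdbc method covers the common save case and needs no JDBCOptions object.

{code:java}
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public final class JdbcSave {
    // Illustrative helper: roughly the same effect as the internal
    // JdbcUtils.saveTable for the common case, via the stable public API.
    static void save(Dataset<Row> ds, String url, String table, Properties props) {
        ds.write().mode(SaveMode.Append).jdbc(url, table, props);
    }
}
{code}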
[jira] [Commented] (SPARK-19296) Awkward changes for JdbcUtils.saveTable in Spark 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-19296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831033#comment-15831033 ] Paul Wu commented on SPARK-19296: - We have found this utility very useful in general (much, much better than primitive JDBC) and have been using it since 1.3.x... we didn't realize it is internal. It will be a big regret for us not to be able to use it, but it seems it is a pain for us now. At the least, for code quality purposes, the code should be refactored to eliminate the duplicated args. > Awkward changes for JdbcUtils.saveTable in Spark 2.1.0 > -- > > Key: SPARK-19296 > URL: https://issues.apache.org/jira/browse/SPARK-19296 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Paul Wu >Priority: Minor > > The change from JdbcUtils.saveTable(DataFrame, String, String, Property) to > JdbcUtils.saveTable(DataFrame, String, String, JDBCOptions) is not only > incompatible with previous versions (so previous Java code won't > compile), but also introduces a silly redundancy: one has to specify url and > table twice, like this: > JDBCOptions jdbcOptions = new JDBCOptions(url, table, map); > JdbcUtils.saveTable(ds, url, table, jdbcOptions); > Why does one have to supply the same url and table twice? (If they are not > specified in both places, an exception is thrown.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19296) Awkward changes for JdbcUtils.saveTable in Spark 2.1.0
Paul Wu created SPARK-19296: --- Summary: Awkward changes for JdbcUtils.saveTable in Spark 2.1.0 Key: SPARK-19296 URL: https://issues.apache.org/jira/browse/SPARK-19296 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: Paul Wu Priority: Minor The change from JdbcUtils.saveTable(DataFrame, String, String, Property) to JdbcUtils.saveTable(DataFrame, String, String, JDBCOptions) is not only incompatible with previous versions (so previous Java code won't compile), but also introduces a silly redundancy: one has to specify url and table twice, like this: JDBCOptions jdbcOptions = new JDBCOptions(url, table, map); JdbcUtils.saveTable(ds, url, table, jdbcOptions); Why does one have to supply the same url and table twice? (If they are not specified in both places, an exception is thrown.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18123) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue
[ https://issues.apache.org/jira/browse/SPARK-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-18123: Description: Blindly quoting every field name for inserting is the issue (Line 110-119, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala). /** * Returns a PreparedStatement that inserts a row into table via conn. */ def insertStatement(conn: Connection, table: String, rddSchema: StructType, dialect: JdbcDialect) : PreparedStatement = { val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",") val placeholders = rddSchema.fields.map(_ => "?").mkString(",") val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)" conn.prepareStatement(sql) } This code causes the following issue (it does not happen on 1.6.x): I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a dataset to an Oracle database, but the fields must be uppercase to succeed. This is not expected behavior: if only the table names were quoted, this utility would handle case sensitivity correctly. The code below throws the exception: Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": invalid identifier. String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) DATETIME_gmt, '1' NODEB"; hc.sql("set spark.sql.caseSensitive=false"); Dataset ds = hc.sql(detailSQL); ds.show(); org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, detailTable, p); was: Blindly quoting every field name for inserting is the issue (Line 110-119, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala). /** * Returns a PreparedStatement that inserts a row into table via conn. */ def insertStatement(conn: Connection, table: String, rddSchema: StructType, dialect: JdbcDialect) : PreparedStatement = { val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",") val placeholders = rddSchema.fields.map(_ => "?").mkString(",") val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)" conn.prepareStatement(sql) } This code causes the following issue (it does not happen to 1.6.x): I have issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a dataset to Oracle database, but the fields must be uppercase to succeed. This is not a expect behavior: If only the table names were quoted, this utility should concern the case sensitivity. The code below throws the exception: Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": invalid identifier. String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) DATETIME_gmt, '1' NODEB"; hc.sql("set spark.sql.caseSensitive=false"); Dataset ds = hc.sql(detailSQL); ds.show(); org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, detailTable, p); > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case > sensitivity issue > -- > > Key: SPARK-18123 > URL: https://issues.apache.org/jira/browse/SPARK-18123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Paul Wu > > Blindly quoting every field name for inserting is the issue (Line 110-119, > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala). > /** >* Returns a PreparedStatement that inserts a row into table via conn. 
>*/ > def insertStatement(conn: Connection, table: String, rddSchema: StructType, > dialect: JdbcDialect) > : PreparedStatement = { > val columns = rddSchema.fields.map(x => > dialect.quoteIdentifier(x.name)).mkString(",") > val placeholders = rddSchema.fields.map(_ => "?").mkString(",") > val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)" > conn.prepareStatement(sql) > } > This code causes the following issue (it does not happen on 1.6.x): > I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a > dataset to an Oracle database, but the fields must be uppercase to succeed. This > is not expected behavior: if only the table names were quoted, this > utility would handle case sensitivity correctly. The code below throws the > exception: Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: > "DATETIME_gmt": invalid identifier. > String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) > DATETIME_gmt, '1' NODEB"; > hc.sql("set spark.sql.caseSensitive=false"); > Dataset
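A possible workaround sketch until the quoting behavior changes (untested; the class name is hypothetical, and it assumes an Oracle JDBC URL): register a custom JdbcDialect whose quoteIdentifier upper-cases column names before quoting, so the quoted identifiers match what Oracle created for unquoted DDL.

{code:java}
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;

public class UpperCaseOracleDialect extends JdbcDialect {
    @Override
    public boolean canHandle(String url) {
        return url.startsWith("jdbc:oracle");
    }

    @Override
    public String quoteIdentifier(String colName) {
        // Oracle folds unquoted identifiers to upper case, so quoting the
        // upper-cased name matches columns that were created without quotes.
        return "\"" + colName.toUpperCase() + "\"";
    }
}

// Somewhere before writing:
// JdbcDialects.registerDialect(new UpperCaseOracleDialect());
{code}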
[jira] [Commented] (SPARK-18123) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue
[ https://issues.apache.org/jira/browse/SPARK-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610739#comment-15610739 ] Paul Wu commented on SPARK-18123: - I just tried it to see whether it would work around the issue. But after I found the code, I knew this is a bug. The line (setting spark.sql.caseSensitive=false) does nothing useful here, but it does not hurt either. > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case > sensitivity issue > -- > > Key: SPARK-18123 > URL: https://issues.apache.org/jira/browse/SPARK-18123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Paul Wu > > Blindly quoting every field name for inserting is the issue (Line 110-119, > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala). > /** >* Returns a PreparedStatement that inserts a row into table via conn. >*/ > def insertStatement(conn: Connection, table: String, rddSchema: StructType, > dialect: JdbcDialect) > : PreparedStatement = { > val columns = rddSchema.fields.map(x => > dialect.quoteIdentifier(x.name)).mkString(",") > val placeholders = rddSchema.fields.map(_ => "?").mkString(",") > val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)" > conn.prepareStatement(sql) > } > This code causes the following issue (it does not happen on 1.6.x): > I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a > dataset to an Oracle database, but the fields must be uppercase to succeed. This > is not expected behavior: if only the table names were quoted, this utility > would handle case sensitivity correctly. The code below throws the exception: > Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": > invalid identifier. > String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) > DATETIME_gmt, '1' NODEB"; > hc.sql("set spark.sql.caseSensitive=false"); > Dataset ds = hc.sql(detailSQL); > ds.show(); > > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, > detailTable, p); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18123) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue
[ https://issues.apache.org/jira/browse/SPARK-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-18123: Description: Blindly quoting every field name for inserting is the issue (Line 110-119, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala). /** * Returns a PreparedStatement that inserts a row into table via conn. */ def insertStatement(conn: Connection, table: String, rddSchema: StructType, dialect: JdbcDialect) : PreparedStatement = { val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",") val placeholders = rddSchema.fields.map(_ => "?").mkString(",") val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)" conn.prepareStatement(sql) } This code causes the following issue (it does not happen on 1.6.x): I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a dataset to an Oracle database, but the fields must be uppercase to succeed. This is not expected behavior: if only the table names were quoted, this utility would handle case sensitivity correctly. The code below throws the exception: Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": invalid identifier. String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) DATETIME_gmt, '1' NODEB"; hc.sql("set spark.sql.caseSensitive=false"); Dataset ds = hc.sql(detailSQL); ds.show(); org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, detailTable, p); was: I have issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a dataset to Oracle database, but the fields must be uppercase to succeed. This is not a expect behavior: If only the table names were quoted, this utility should concern the case sensitivity. The code below throws the exception: Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": invalid identifier. String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) DATETIME_gmt, '1' NODEB"; hc.sql("set spark.sql.caseSensitive=false"); Dataset ds = hc.sql(detailSQL); ds.show(); org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, detailTable, p); > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case > sensitivity issue > -- > > Key: SPARK-18123 > URL: https://issues.apache.org/jira/browse/SPARK-18123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Paul Wu > > Blindly quoting every field name for inserting is the issue (Line 110-119, > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala). > /** >* Returns a PreparedStatement that inserts a row into table via conn. >*/ > def insertStatement(conn: Connection, table: String, rddSchema: StructType, > dialect: JdbcDialect) > : PreparedStatement = { > val columns = rddSchema.fields.map(x => > dialect.quoteIdentifier(x.name)).mkString(",") > val placeholders = rddSchema.fields.map(_ => "?").mkString(",") > val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)" > conn.prepareStatement(sql) > } > This code causes the following issue (it does not happen on 1.6.x): > I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a > dataset to an Oracle database, but the fields must be uppercase to succeed. This > is not expected behavior: if only the table names were quoted, this utility > would handle case sensitivity correctly. 
The code below throws the exception: > Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": > invalid identifier. > String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) > DATETIME_gmt, '1' NODEB"; > hc.sql("set spark.sql.caseSensitive=false"); > Dataset ds = hc.sql(detailSQL); > ds.show(); > > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, > detailTable, p); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18123) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue
Paul Wu created SPARK-18123: --- Summary: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue Key: SPARK-18123 URL: https://issues.apache.org/jira/browse/SPARK-18123 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.1 Reporter: Paul Wu I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a dataset to an Oracle database, but the fields must be uppercase to succeed. This is not expected behavior: if only the table names were quoted, this utility would handle case sensitivity correctly. The code below throws the exception: Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": invalid identifier. String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) DATETIME_gmt, '1' NODEB"; hc.sql("set spark.sql.caseSensitive=false"); Dataset ds = hc.sql(detailSQL); ds.show(); org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, detailTable, p); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
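To make the failure mode concrete, a small plain-JDBC sketch (the table, connection details, and class name are hypothetical, not from the report): Oracle folds unquoted identifiers to upper case at creation time, so an INSERT that quotes the original mixed-case name, as the generated insertStatement does, no longer matches the column.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class QuotingDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//host:1521/svc", "user", "pwd"); // hypothetical
             Statement stmt = conn.createStatement()) {
            // Unquoted DDL: Oracle stores the column as DATETIME_GMT.
            stmt.executeUpdate("CREATE TABLE demo (datetime_gmt TIMESTAMP)");
            // An unquoted reference is folded to upper case and matches:
            stmt.executeUpdate(
                "INSERT INTO demo (DATETIME_GMT) VALUES (CURRENT_TIMESTAMP)");
            // A quoted mixed-case reference, as Spark generates, fails with
            // ORA-00904 because quoted names are compared case-sensitively:
            stmt.executeUpdate(
                "INSERT INTO demo (\"DATETIME_gmt\") VALUES (CURRENT_TIMESTAMP)");
        }
    }
}
{code}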
[jira] [Issue Comment Deleted] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-17614: Comment: was deleted (was: Create pull request: https://github.com/apache/spark/pull/15183) > sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra > does not support > - > > Key: SPARK-17614 > URL: https://issues.apache.org/jira/browse/SPARK-17614 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Environment: Any Spark Runtime >Reporter: Paul Wu > Labels: cassandra-jdbc, sql > > I have the code like the following with Cassandra JDBC > (https://github.com/adejanovski/cassandra-jdbc-wrapper): > final String dbTable= "sql_demo"; > Dataset jdbcDF > = sparkSession.read() > .jdbc(CASSANDRA_CONNECTION_URL, dbTable, > connectionProperties); > List rows = jdbcDF.collectAsList(); > It threw the error: > Exception in thread "main" java.sql.SQLTransientException: > com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable > alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) > at > com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) > The reason is that the Spark jdbc code uses the sql syntax "where 1=0" > somewhere (to get the schema?), but Cassandra does not support this syntax. > Not sure how this issue can be resolved...this is because CQL is not standard > sql. > The following log shows more information: > 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; > Rack: %s > 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM > sql_demo WHERE 1=0 > 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] > com.datastax.driver.core.Statement$1@41ccb3b9 > 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510709#comment-15510709 ] Paul Wu commented on SPARK-17614: - Create pull request: https://github.com/apache/spark/pull/15183 > sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra > does not support > - > > Key: SPARK-17614 > URL: https://issues.apache.org/jira/browse/SPARK-17614 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Environment: Any Spark Runtime >Reporter: Paul Wu > Labels: cassandra-jdbc, sql > > I have the code like the following with Cassandra JDBC > (https://github.com/adejanovski/cassandra-jdbc-wrapper): > final String dbTable= "sql_demo"; > Dataset jdbcDF > = sparkSession.read() > .jdbc(CASSANDRA_CONNECTION_URL, dbTable, > connectionProperties); > List rows = jdbcDF.collectAsList(); > It threw the error: > Exception in thread "main" java.sql.SQLTransientException: > com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable > alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) > at > com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) > The reason is that the Spark jdbc code uses the sql syntax "where 1=0" > somewhere (to get the schema?), but Cassandra does not support this syntax. > Not sure how this issue can be resolved...this is because CQL is not standard > sql. > The following log shows more information: > 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; > Rack: %s > 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM > sql_demo WHERE 1=0 > 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] > com.datastax.driver.core.Statement$1@41ccb3b9 > 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510626#comment-15510626 ] Paul Wu edited comment on SPARK-17614 at 9/21/16 5:42 PM: -- No, a custom JdbcDialect won't resolve the problem, since DataFrameReader uses JDBCRDD and the latter has a hard-coded line val statement = conn.prepareStatement(s"SELECT * FROM $table WHERE 1=0") for checking table existence. See line 61 at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala Line 61 needs to use the dialect's "table existence" query rather than hard-coding the query there. was (Author: zwu@gmail.com): No, a custom JdbcDialect won't resolve the problem, since DataFrameReader uses JDBCRDD and the latter has a hard-coded line val statement = conn.prepareStatement(s"SELECT * FROM $table WHERE 1=0") for checking table existence. See line 61 at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala > sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra > does not support > - > > Key: SPARK-17614 > URL: https://issues.apache.org/jira/browse/SPARK-17614 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Environment: Any Spark Runtime >Reporter: Paul Wu > Labels: cassandra-jdbc, sql > > I have the code like the following with Cassandra JDBC > (https://github.com/adejanovski/cassandra-jdbc-wrapper): > final String dbTable= "sql_demo"; > Dataset jdbcDF > = sparkSession.read() > .jdbc(CASSANDRA_CONNECTION_URL, dbTable, > connectionProperties); > List rows = jdbcDF.collectAsList(); > It threw the error: > Exception in thread "main" java.sql.SQLTransientException: > com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable > alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) > at > com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) > The reason is that the Spark jdbc code uses the sql syntax "where 1=0" > somewhere (to get the schema?), but Cassandra does not support this syntax. > Not sure how this issue can be resolved...this is because CQL is not standard > sql. > The following log shows more information: > 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; > Rack: %s > 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM > sql_demo WHERE 1=0 > 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] > com.datastax.driver.core.Statement$1@41ccb3b9 > 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
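For reference, once the schema probe became dialect-driven (the fix that shipped in Spark 2.1.0 added a getSchemaQuery hook to JdbcDialect), a CQL-friendly override looks roughly like this (class name hypothetical, untested sketch):

{code:java}
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;

public class CassandraSchemaDialect extends JdbcDialect {
    @Override
    public boolean canHandle(String url) {
        return url.startsWith("jdbc:cassandra");
    }

    @Override
    public String getSchemaQuery(String table) {
        // CQL has no "WHERE 1=0"; a LIMIT probe serves the same purpose.
        return "SELECT * FROM " + table + " LIMIT 1";
    }
}

// JdbcDialects.registerDialect(new CassandraSchemaDialect());
{code}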
[jira] [Updated] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-17614: Priority: Major (was: Minor) > sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra > does not support > - > > Key: SPARK-17614 > URL: https://issues.apache.org/jira/browse/SPARK-17614 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Environment: Any Spark Runtime >Reporter: Paul Wu > Labels: cassandra-jdbc, sql > > I have the code like the following with Cassandra JDBC > (https://github.com/adejanovski/cassandra-jdbc-wrapper): > final String dbTable= "sql_demo"; > Dataset jdbcDF > = sparkSession.read() > .jdbc(CASSANDRA_CONNECTION_URL, dbTable, > connectionProperties); > List rows = jdbcDF.collectAsList(); > It threw the error: > Exception in thread "main" java.sql.SQLTransientException: > com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable > alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) > at > com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) > The reason is that the Spark jdbc code uses the sql syntax "where 1=0" > somewhere (to get the schema?), but Cassandra does not support this syntax. > Not sure how this issue can be resolved...this is because CQL is not standard > sql. > The following log shows more information: > 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; > Rack: %s > 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM > sql_demo WHERE 1=0 > 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] > com.datastax.driver.core.Statement$1@41ccb3b9 > 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510626#comment-15510626 ] Paul Wu commented on SPARK-17614: - No, a custom JdbcDialect won't resolve the problem, since DataFrameReader uses JDBCRDD and the latter has a hard-coded line val statement = conn.prepareStatement(s"SELECT * FROM $table WHERE 1=0") for checking table existence. See line 61 at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala > sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra > does not support > - > > Key: SPARK-17614 > URL: https://issues.apache.org/jira/browse/SPARK-17614 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Environment: Any Spark Runtime >Reporter: Paul Wu >Priority: Minor > Labels: cassandra-jdbc, sql > > I have the code like the following with Cassandra JDBC > (https://github.com/adejanovski/cassandra-jdbc-wrapper): > final String dbTable= "sql_demo"; > Dataset jdbcDF > = sparkSession.read() > .jdbc(CASSANDRA_CONNECTION_URL, dbTable, > connectionProperties); > List rows = jdbcDF.collectAsList(); > It threw the error: > Exception in thread "main" java.sql.SQLTransientException: > com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable > alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) > at > com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) > The reason is that the Spark jdbc code uses the sql syntax "where 1=0" > somewhere (to get the schema?), but Cassandra does not support this syntax. > Not sure how this issue can be resolved...this is because CQL is not standard > sql. > The following log shows more information: > 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; > Rack: %s > 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM > sql_demo WHERE 1=0 > 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] > com.datastax.driver.core.Statement$1@41ccb3b9 > 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510525#comment-15510525 ] Paul Wu commented on SPARK-17614: - Thanks. I tried to register my custom dialect as follows, but it does not reach the getTableExistsQuery() method. Could anyone help?

import org.apache.spark.sql.jdbc.JdbcDialect;

public class NRSCassandraDialect extends JdbcDialect {
    @Override
    public boolean canHandle(String url) {
        System.out.println("came here.." + url.startsWith("jdbc:cassandra"));
        return url.startsWith("jdbc:cassandra");
    }

    @Override
    public String getTableExistsQuery(String table) {
        System.out.println("query?");
        return "SELECT * from " + table + " LIMIT 1";
    }
}

--

public class CassJDBC implements Serializable {
    private static final org.apache.log4j.Logger LOGGER = org.apache.log4j.Logger.getLogger(CassJDBC.class);
    private static final String _CONNECTION_URL = "jdbc:cassandra://ulpd326..com/test?loadbalancing=DCAwareRoundRobinPolicy(%22datacenter1%22)";
    private static final String _USERNAME = "";
    private static final String _PWD = "";
    private static final SparkSession sparkSession = SparkSession.builder()
            .config("spark.sql.warehouse.dir", "file:///home/zw251y/tmp")
            .master("local[*]").appName("Spark2JdbcDs").getOrCreate();

    public static void main(String[] args) {
        JdbcDialects.registerDialect(new NRSCassandraDialect());
        final Properties connectionProperties = new Properties();
        final String dbTable = "sql_demo";
        Dataset jdbcDF = sparkSession.read()
                .jdbc(_CONNECTION_URL, dbTable, connectionProperties);
        jdbcDF.show();
    }
}

Error message: came here..true parameters = "datacenter1" Exception in thread "main" java.sql.SQLTransientException: com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) at com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) > sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra > does not support > - > > Key: SPARK-17614 > URL: https://issues.apache.org/jira/browse/SPARK-17614 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Environment: Any Spark Runtime >Reporter: Paul Wu >Priority: Minor > Labels: cassandra-jdbc, sql > > I have the code like the following with Cassandra JDBC > (https://github.com/adejanovski/cassandra-jdbc-wrapper): > final String dbTable= "sql_demo"; > Dataset jdbcDF > = sparkSession.read() > .jdbc(CASSANDRA_CONNECTION_URL, dbTable, > connectionProperties); > List rows = jdbcDF.collectAsList(); > It threw the error: > Exception in thread "main" java.sql.SQLTransientException: > com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable > alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) 
> at > com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) > The reason is that the Spark jdbc code uses the sql syntax "where 1=0" > somewhere (to get the schema?), but Cassandra does not support this syntax. > Not sure how this issue can be resolved...this is because CQL is not standard > sql. > The following log shows more information: > 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; > Rack: %s > 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM > sql_demo WHERE 1=0 > 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] > com.datastax.driver.core.Statement$1@41ccb3b9 > 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
[ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507950#comment-15507950 ] Paul Wu commented on SPARK-17614: - Workaround: rebuild the Cassandra JDBC wrapper after modifying CassandraPreparedStatement.java (https://github.com/adejanovski/cassandra-jdbc-wrapper/blob/master/src/main/java/com/github/adejanovski/cassandra/jdbc/CassandraPreparedStatement.java, pulled 09/20/2016). Add the following two lines before line 87:

this.cql = cql.replace("WHERE 1=0", "limit 1");
cql = this.cql;

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra > does not support > - > > Key: SPARK-17614 > URL: https://issues.apache.org/jira/browse/SPARK-17614 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: Any Spark Runtime >Reporter: Paul Wu > Labels: cassandra-jdbc, sql > > I have the code like the following with Cassandra JDBC > (https://github.com/adejanovski/cassandra-jdbc-wrapper): > final String dbTable= "sql_demo"; > Dataset jdbcDF > = sparkSession.read() > .jdbc(CASSANDRA_CONNECTION_URL, dbTable, > connectionProperties); > List rows = jdbcDF.collectAsList(); > It threw the error: > Exception in thread "main" java.sql.SQLTransientException: > com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable > alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) > at > com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) > at > com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) > The reason is that the Spark jdbc code uses the sql syntax "where 1=0" > somewhere (to get the schema?), but Cassandra does not support this syntax. > Not sure how this issue can be resolved...this is because CQL is not standard > sql. > The following log shows more information: > 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; > Rack: %s > 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM > sql_demo WHERE 1=0 > 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] > com.datastax.driver.core.Statement$1@41ccb3b9 > 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
Paul Wu created SPARK-17614: --- Summary: sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support Key: SPARK-17614 URL: https://issues.apache.org/jira/browse/SPARK-17614 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Environment: Any Spark Runtime Reporter: Paul Wu I have the code like the following with Cassandra JDBC (https://github.com/adejanovski/cassandra-jdbc-wrapper): final String dbTable= "sql_demo"; Dataset jdbcDF = sparkSession.read() .jdbc(CASSANDRA_CONNECTION_URL, dbTable, connectionProperties); List rows = jdbcDF.collectAsList(); It threw the error: Exception in thread "main" java.sql.SQLTransientException: com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...) at com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108) at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371) at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348) at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48) The reason is that the Spark jdbc code uses the sql syntax "where 1=0" somewhere (to get the schema?), but Cassandra does not support this syntax. Not sure how this issue can be resolved...this is because CQL is not standard sql. The following log shows more information: 16/09/20 13:16:35 INFO CassandraConnection 138: Datacenter: %s; Host: %s; Rack: %s 16/09/20 13:16:35 TRACE CassandraPreparedStatement 98: CQL: SELECT * FROM sql_demo WHERE 1=0 16/09/20 13:16:35 TRACE RequestHandler 71: [19400322] com.datastax.driver.core.Statement$1@41ccb3b9 16/09/20 13:16:35 TRACE RequestHandler 272: [19400322-1] Starting -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux
[ https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-9255: --- Attachment: (was: timestamp_bug.zip) Timestamp handling incorrect for Spark 1.4.1 on Linux - Key: SPARK-9255 URL: https://issues.apache.org/jira/browse/SPARK-9255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release. Reporter: Paul Wu Attachments: timestamp_bug2.zip, tstest This is a very strange case involving timestamp I can run the program on Windows using dev pom.xml (1.4.1) or 1.4.1 or 1.3.0 release downloaded from Apache without issues , but when I ran it on Spark 1.4.1 release either downloaded from Apache or the version built with scala 2.11 on redhat linux, it has the following error (the code I used is after this stack trace): 15/07/22 12:02:50 ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0) java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) at org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) at org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102) at org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422) at
[jira] [Commented] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux
[ https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14641175#comment-14641175 ] Paul Wu commented on SPARK-9255: Related https://issues.apache.org/jira/browse/SPARK-9058, but I doubt it is the same. If you look at the sample code, it has almost no aggregations at all. However, the fix for https://issues.apache.org/jira/browse/SPARK-9058 may also fix this issue. I guess you can test it. Timestamp handling incorrect for Spark 1.4.1 on Linux - Key: SPARK-9255 URL: https://issues.apache.org/jira/browse/SPARK-9255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0, 1.4.1 Environment: Red Hat Linux, Java 8.0 and Spark 1.4.1 release. Reporter: Paul Wu Attachments: timestamp_bug2.zip, tstest Updates: This issue is due to the following config: spark.sql.codegen true If this param is set to false, the problem does not happen. The bug was introduced in 1.4.0. Releases 1.3.0 and 1.3.1 do not have this issue. === This is a very strange case involving timestamps. I can run the program on Windows using the dev pom.xml (1.4.1) or the 1.4.1 or 1.3.0 release downloaded from Apache without issues, but when I ran it on the Spark 1.4.1 release, either downloaded from Apache or the version built with Scala 2.11, on Red Hat Linux, it produced the following error (the code I used is after this stack trace): 15/07/22 12:02:50 ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0) java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) at org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) at org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102) at org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316)
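Based on the note above that the problem disappears when spark.sql.codegen is false, a minimal 1.4-era workaround sketch (the surrounding setup and class name are illustrative):

{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

public class CodegenOff {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("tstest");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);
        // Disable expression code generation to avoid the reflective
        // ToolBox compilation path that fails on TimestampType here:
        sqlContext.setConf("spark.sql.codegen", "false");
        // ... run the timestamp queries as usual ...
    }
}
{code}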
[jira] [Commented] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux
[ https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14639949#comment-14639949 ] Paul Wu commented on SPARK-9255: [~srowen] I don't think it is due to a version difference: the same code runs correctly on the 1.3.0 release on Red Hat Linux. This bug was introduced after 1.3.0. Timestamp handling incorrect for Spark 1.4.1 on Linux - Key: SPARK-9255 URL: https://issues.apache.org/jira/browse/SPARK-9255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Environment: Red Hat Linux, Java 8.0 and Spark 1.4.1 release. Reporter: Paul Wu Attachments: timestamp_bug.zip This is a very strange case involving timestamps. I can run the program on Windows using the dev pom.xml (1.4.1) or the 1.4.1 or 1.3.0 release downloaded from Apache without issues, but when I ran it on the Spark 1.4.1 release, either downloaded from Apache or the version built with Scala 2.11, on Red Hat Linux, it produced the following error (the code I used is after this stack trace): 15/07/22 12:02:50 ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0) java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) at org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) at org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102) at org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429) at
[jira] [Updated] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux
[ https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-9255: --- Attachment: timestamp_bug.zip the project can run without issues. But when it is deployed to the Timestamp handling incorrect for Spark 1.4.1 on Linux - Key: SPARK-9255 URL: https://issues.apache.org/jira/browse/SPARK-9255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release. Reporter: Paul Wu Attachments: timestamp_bug.zip This is a very strange case involving timestamp I can run the program on Windows using dev pom.xml (1.4.1) or 1.3.1 release downloaded from Apache without issues , but when I ran it on Spark 1.4.1 release either downloaded from Apache or the version built with scala 2.11 on redhat linux, it has the following error (the code I used is after this stack trace): 15/07/22 12:02:50 ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0) java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) at org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) at org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102) at org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422) at
[jira] [Created] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux
Paul Wu created SPARK-9255: -- Summary: Timestamp handling incorrect for Spark 1.4.1 on Linux Key: SPARK-9255 URL: https://issues.apache.org/jira/browse/SPARK-9255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release. Reporter: Paul Wu This is a very strange case involving timestamp I can run the program on Windows using dev pom.xml (1.4.1) or 1.3.1 release downloaded from Apache without issues , but when I ran it on Spark 1.4.1 release either downloaded from Apache or the version built with scala 2.11 on redhat linux, it has the following error (the code I used is after this stack trace): 15/07/22 12:02:50 ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0) java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) at org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) at org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102) at org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of 
TimestampType.this.InternalType at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.liftedTree2$1(ToolBoxFactory.scala:355) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.apply(ToolBoxFactory.scala:355) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.compile(ToolBoxFactory.scala:422) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.eval(ToolBoxFactory.scala:444) at
[jira] [Updated] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux
[ https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-9255: --- Description: This is a very strange case involving timestamp I can run the program on Windows using dev pom.xml (1.4.1) or 1.4.1 or 1.3.0 release downloaded from Apache without issues , but when I ran it on Spark 1.4.1 release either downloaded from Apache or the version built with scala 2.11 on redhat linux, it has the following error (the code I used is after this stack trace): 15/07/22 12:02:50 ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0) java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) at org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) at org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102) at org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316) at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.liftedTree2$1(ToolBoxFactory.scala:355) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.apply(ToolBoxFactory.scala:355) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.compile(ToolBoxFactory.scala:422) at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.eval(ToolBoxFactory.scala:444) at org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:74) at org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:26) at
[jira] [Comment Edited] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux
[ https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637336#comment-14637336 ] Paul Wu edited comment on SPARK-9255 at 7/22/15 6:52 PM: - The project can run on windows without issues. But when it is deployed to the 1.4.1 release on Linux, it has the error as described the issue. However, it works on Windows 7 for the same release 1.4.1: 15/07/22 11:43:33 INFO DAGScheduler: Job 1 finished: show at TimestampSample.java:50, took 0.784688 s ++ | p| ++ |2015-07-22 11:00:...| ++ was (Author: zwu@gmail.com): the project can run without issues. But when it is deployed to the Timestamp handling incorrect for Spark 1.4.1 on Linux - Key: SPARK-9255 URL: https://issues.apache.org/jira/browse/SPARK-9255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release. Reporter: Paul Wu Attachments: timestamp_bug.zip This is a very strange case involving timestamp I can run the program on Windows using dev pom.xml (1.4.1) or 1.4.1 or 1.3.0 release downloaded from Apache without issues , but when I ran it on Spark 1.4.1 release either downloaded from Apache or the version built with scala 2.11 on redhat linux, it has the following error (the code I used is after this stack trace): 15/07/22 12:02:50 ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0) java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) at org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) at org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102) at org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261) at org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed: value is not a member of TimestampType.this.InternalType at
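The reporter's program itself is not reproduced in this digest (the attachment timestamp_bug.zip carries it), but the GeneratedAggregate frames in the trace point at code generation for an aggregate over a TimestampType column, and the comment above shows an output column named p from TimestampSample.java. A minimal sketch of a program with that shape against the Spark 1.3/1.4 Java API -- the class and field names here are hypothetical, not the reporter's actual code:
{code:java}
import java.io.Serializable;
import java.sql.Timestamp;
import java.util.Arrays;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class TimestampSample {
    // A Java bean with a java.sql.Timestamp field; Spark SQL maps it to TimestampType.
    public static class Event implements Serializable {
        private Timestamp greenwich;
        public Timestamp getGreenwich() { return greenwich; }
        public void setGreenwich(Timestamp t) { this.greenwich = t; }
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "timestamp-sample");
        SQLContext sqlContext = new SQLContext(sc);
        Event e = new Event();
        e.setGreenwich(new Timestamp(System.currentTimeMillis()));
        DataFrame df = sqlContext.createDataFrame(sc.parallelize(Arrays.asList(e)), Event.class);
        df.registerTempTable("events");
        // An aggregate over the timestamp column exercises the codegen path
        // (GeneratedAggregate) where the ToolBoxError above is thrown on Linux.
        sqlContext.sql("SELECT max(greenwich) p FROM events").show();
    }
}
{code}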
[jira] [Closed] (SPARK-9087) Broken SQL on where condition involving timestamp and time string.
[ https://issues.apache.org/jira/browse/SPARK-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu closed SPARK-9087. -- Resolution: Fixed Fix Version/s: 1.4.1 1.4.1 fixed the issue. Broken SQL on where condition involving timestamp and time string. --- Key: SPARK-9087 URL: https://issues.apache.org/jira/browse/SPARK-9087 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Paul Wu Priority: Critical Labels: sparksql Fix For: 1.4.1 Suppose mytable has a field called greenwich, which is of timestamp type. The table is registered through a Java bean. The following query used to work in 1.3 and 1.3.1, but it is now broken: there are records with times newer than 01/14/2015, yet it now returns nothing. This is a blocker issue for us. Is there any workaround? SELECT * FROM mytable WHERE greenwich > '2015-01-14' -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9087) Broken SQL on where condition involving timestamp and time string.
Paul Wu created SPARK-9087: -- Summary: Broken SQL on where condition involving timestamp and time string. Key: SPARK-9087 URL: https://issues.apache.org/jira/browse/SPARK-9087 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Paul Wu Priority: Critical Suppose mytable has a field called greenwich, which is of timestamp type. The table is registered through a Java bean. The following query used to work in 1.3 and 1.3.1, but it is now broken: there are records with times newer than 01/14/2015, yet it now returns nothing. This is a blocker issue for us. Is there any workaround? SELECT * FROM mytable WHERE greenwich > '2015-01-14' -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
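While waiting for the 1.4.1 fix, a workaround that often sidesteps this kind of regression is to make the comparison type explicit instead of relying on implicit string-to-timestamp coercion. A hedged sketch, using the table and column names from the report and assuming an existing SQLContext named sqlContext:
{code:java}
// Compare timestamp to timestamp explicitly rather than timestamp to string.
DataFrame rows = sqlContext.sql(
    "SELECT * FROM mytable WHERE greenwich > CAST('2015-01-14' AS timestamp)");
rows.show();
{code}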
[jira] [Commented] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatedly
[ https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555680#comment-14555680 ] Paul Wu commented on SPARK-7804: Unfortunately, JdbcRDD was poorly designed since the lowerbound and upperbound are long types, which is too limited. One of my team members implemented a general one based on the idea. Some of my team are worried about the home-made solution. When we saw JDBCRDD, it looked like what we wanted. In fact, I hope JDBCRDD can be made public, or JdbcRDD can be re-designed to take care of the general situation just like JDBCRDD does. Incorrect results from JDBCRDD -- one record repeatedly - Key: SPARK-7804 URL: https://issues.apache.org/jira/browse/SPARK-7804 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.3.1 Reporter: Paul Wu Getting only one record repeated in the RDD and a repeated field value: I have a table like:
{code}
attuid name email
12 john j...@appp.com
23 tom t...@appp.com
34 tony t...@appp.com
{code}
My code:
{code}
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0]));
System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
{code}
=== result is:
{code}
34 tony t...@appp.com
34 tony t...@appp.com
34 tony t...@appp.com
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
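For reference: the predicate-partitioned read the comment asks for did eventually reach the public API. In Spark 1.4+, DataFrameReader.jdbc() accepts an array of WHERE-clause predicates, one per partition, which gives the "multiple queries per DB partition" behavior without touching the internal JDBCRDD class. A sketch, reusing url and the USERS table from the report; the predicate strings are made up for illustration:
{code:java}
// Each predicate string becomes one partition's WHERE clause, so the
// partitions can query the database in parallel.
String[] predicates = new String[]{ "attuid < '20'", "attuid >= '20'" };
java.util.Properties props = new java.util.Properties();
props.setProperty("driver", "oracle.jdbc.OracleDriver");
DataFrame users = sqlContext.read().jdbc(url, "USERS", predicates, props);
System.out.println("count=" + users.count());
{code}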
[jira] [Comment Edited] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatedly
[ https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555680#comment-14555680 ] Paul Wu edited comment on SPARK-7804 at 5/22/15 12:00 PM: -- Unfortunately, JdbcRDD was poorly designed since the lowerbound and upperbound are long types, which is too limited. One of my team members implemented a general one based on the idea. Some of my team are worried about the home-made solution. When we saw JDBCRDD, it looked like what we wanted. In fact, I hope JDBCRDD can be made public, or JdbcRDD can be re-designed to take care of the general situation just like JDBCRDD does. was (Author: zwu@gmail.com): Unfortunately, JdbcRDD was poorly designed since the lowerbound and upperbound are long types which are too limited. One of my team member implemented a general one based on the idea. Some of my team are worried about the home-made solution. When we saw JDBCRDD, it looks like what we wanted. In fact, I hope JDBCRDD can be public or JdbcRDD can be re-designed to take care general situation just like what JDBCRDD does. Incorrect results from JDBCRDD -- one record repeatedly - Key: SPARK-7804 URL: https://issues.apache.org/jira/browse/SPARK-7804 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.3.1 Reporter: Paul Wu Getting only one record repeated in the RDD and a repeated field value: I have a table like:
{code}
attuid name email
12 john j...@appp.com
23 tom t...@appp.com
34 tony t...@appp.com
{code}
My code:
{code}
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0]));
System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
{code}
=== result is:
{code}
34 tony t...@appp.com
34 tony t...@appp.com
34 tony t...@appp.com
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatedly and incorrect field value
Paul Wu created SPARK-7804: -- Summary: Incorrect results from JDBCRDD -- one record repeatedly and incorrect field value Key: SPARK-7804 URL: https://issues.apache.org/jira/browse/SPARK-7804 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1, 1.3.0 Reporter: Paul Wu Getting only one record repeated in the RDD and a repeated field value: I have a table like:
attuid name email
12 john j...@appp.com
23 tom t...@appp.com
34 tony t...@appp.com
My code:
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0]));
System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
=== result is:
34 34 t...@appp.com
34 34 t...@appp.com
34 34 t...@appp.com
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatedly
[ https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-7804: --- Description: Getting only one record repeated in the RDD and a repeated field value: I have a table like:
attuid name email
12 john j...@appp.com
23 tom t...@appp.com
34 tony t...@appp.com
My code:
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0]));
System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
=== result is:
34 tony t...@appp.com
34 tony t...@appp.com
34 tony t...@appp.com
was: Getting only one record repeated in the RDD and a repeated field value: I have a table like:
attuid name email
12 john j...@appp.com
23 tom t...@appp.com
34 tony t...@appp.com
My code:
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0]));
System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
=== result is:
34 34 t...@appp.com
34 34 t...@appp.com
34 34 t...@appp.com
Summary: Incorrect results from JDBCRDD -- one record repeatedly (was: Incorrect results from JDBCRDD -- one record repeatedly and incorrect field value) Incorrect results from JDBCRDD -- one record repeatedly - Key: SPARK-7804 URL: https://issues.apache.org/jira/browse/SPARK-7804 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.3.1 Reporter: Paul Wu Labels: JDBCRDD, sql Getting only one record repeated in the RDD and a repeated field value: I have a table like:
attuid name email
12 john j...@appp.com
23 tom t...@appp.com
34 tony t...@appp.com
My code:
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType,
[jira] [Commented] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatedly
[ https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555602#comment-14555602 ] Paul Wu commented on SPARK-7804: Thanks -- you are right. The cache() was a problem, and I also cannot use List<Row> lr = jrdd.collect();. But jrdd.foreach((Row r) -> { System.out.println(r.get(0) + "." + r.get(1) + " " + r.get(2)); }); or foreachPartition will work. We really wanted to use DataFrame; however, it does not have the partition options that we really need to improve the performance. Using this class, we can take advantage of sending multiple queries to each DB partition at the same time. But as you said, this is internal code (from the Javadoc, I cannot see it), so I'm not sure what I can do now. I guess you guys can close this ticket. Thanks again! Incorrect results from JDBCRDD -- one record repeatedly - Key: SPARK-7804 URL: https://issues.apache.org/jira/browse/SPARK-7804 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.3.1 Reporter: Paul Wu Labels: JDBCRDD, sql Getting only one record repeated in the RDD and a repeated field value: I have a table like:
{code}
attuid name email
12 john j...@appp.com
23 tom t...@appp.com
34 tony t...@appp.com
{code}
My code:
{code}
JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0]));
System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
{code}
=== result is:
{code}
34 tony t...@appp.com
34 tony t...@appp.com
34 tony t...@appp.com
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
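The repeated-row symptom above is the classic signature of an RDD that reuses one mutable row object internally: foreach() sees each value in turn, but cache()/collect() end up holding N references to the same buffer, all showing the last record. If collect() is still needed, copying each row into a fresh immutable Row first avoids the problem. A hedged sketch against the public Java API, building on the jdbcRDD from the report:
{code:java}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Materialize an immutable copy of every row before collecting, so the
// driver does not receive N references to one reused internal row.
JavaRDD<Row> copied = jdbcRDD.toJavaRDD().map(
    (Row r) -> RowFactory.create(r.getString(0), r.getString(1), r.getString(2)));
for (Row r : copied.collect()) {
    System.out.println(r.getString(0) + " " + r.getString(1) + " " + r.getString(2));
}
{code}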
[jira] [Created] (SPARK-7746) SetFetchSize for JDBCRDD's prepareStatement
Paul Wu created SPARK-7746: -- Summary: SetFetchSize for JDBCRDD's prepareStatement Key: SPARK-7746 URL: https://issues.apache.org/jira/browse/SPARK-7746 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.3.1, 1.3.0 Reporter: Paul Wu The PreparedStatement created inside JDBCRDD's compute() method should offer an option so that setFetchSize() can be applied. We found that this parameter can affect Oracle DB performance tremendously. With the current implementation, we have no way to set the size. One of my team members did his own implementation, and we saw a 10X improvement by setting the size to a proper number. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
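In plain JDBC terms, the request amounts to the snippet below: Oracle's driver defaults to fetching only 10 rows per network round trip, so large scans crawl until the fetch size is raised. A sketch, assuming an open java.sql.Connection named conn:
{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

PreparedStatement stmt = conn.prepareStatement("SELECT * FROM USERS");
stmt.setFetchSize(1000); // roughly one round trip per 1000 rows instead of per 10
ResultSet rs = stmt.executeQuery();
{code}
(Later Spark releases exposed this as a fetchsize option on the JDBC data source, so the knob no longer requires touching the internal class.)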
[jira] [Commented] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env
[ https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498305#comment-14498305 ] Paul Wu commented on SPARK-6936: You are right: I used spark-hive_2.10 instead of spark-hive_2.11 in my build. Sorry, I deleted my comment before reading yours. Thanks. SQLContext.sql() caused deadlock in multi-thread env Key: SPARK-6936 URL: https://issues.apache.org/jira/browse/SPARK-6936 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Environment: JDK 1.8.x, RedHat Linux version 2.6.32-431.23.3.el6.x86_64 (mockbu...@x86-027.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014 Reporter: Paul Wu Labels: deadlock, sql, threading Doing (the same query) in more than one thread with SQLContext.sql may lead to a deadlock. Here is a way to reproduce it (since this is a multi-thread issue, the reproduction may or may not be easy). 1. Register a relatively big table. 2. Create two different classes; in each class, do the same query in a method, put the results in a set, and print out the set size. 3. Create two threads, each using an object of one of the classes in its run method. Start the threads. For my tests, it can hit a deadlock in just a few runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env
[ https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498247#comment-14498247 ] Paul Wu commented on SPARK-6936: Not sure about HiveContext. I tried to do the following program and I got exception (env: JDK 1.8/ Spark 1.3). Why did I get the error on HiveContext? -- exec-maven-plugin:1.2.1:exec (default-cli) @ Spark-Sample --- Exception in thread main java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object; at org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:574) at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:245) at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41) at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40) at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137) at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881) at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110) at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38) at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138) at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138) at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:96) at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:95) at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137) at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881) at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110) at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38) at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:234) at org.apache.spark.sql.hive.HiveContext$$anonfun$sql$1.apply(HiveContext.scala:92) at org.apache.spark.sql.hive.HiveContext$$anonfun$sql$1.apply(HiveContext.scala:92) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:92) at
[jira] [Issue Comment Deleted] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env
[ https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-6936: --- Comment: was deleted (was: Not sure about HiveContext. I tried to do the following program and I got exception (env: JDK 1.8/ Spark 1.3). Why did I get the error on HiveContext? -- exec-maven-plugin:1.2.1:exec (default-cli) @ Spark-Sample --- Exception in thread main java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object; at org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:574) at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:245) at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41) at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40) at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137) at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881) at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110) at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38) at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138) at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138) at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:96) at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:95) at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137) at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881) at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110) at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38) at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:234) at org.apache.spark.sql.hive.HiveContext$$anonfun$sql$1.apply(HiveContext.scala:92) at org.apache.spark.sql.hive.HiveContext$$anonfun$sql$1.apply(HiveContext.scala:92) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:92) at
[jira] [Created] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env
Paul Wu created SPARK-6936: -- Summary: SQLContext.sql() caused deadlock in multi-thread env Key: SPARK-6936 URL: https://issues.apache.org/jira/browse/SPARK-6936 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Environment: JDK 1.8.x, RedHat Linux version 2.6.32-431.23.3.el6.x86_64 (mockbu...@x86-027.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014 Reporter: Paul Wu Doing (the same query) in more than one thread with SQLContext.sql may lead to a deadlock. Here is a way to reproduce it (since this is a multi-thread issue, the reproduction may or may not be easy). 1. Register a relatively big table. 2. Create two different classes; in each class, do the same query in a method, put the results in a set, and print out the set size. 3. Create two threads, each using an object of one of the classes in its run method. Start the threads. For my tests, it can hit a deadlock in just a few runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env
[ https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496933#comment-14496933 ] Paul Wu commented on SPARK-6936: 1. The query is something like this (sorry, since the data is proprietary I cannot release it here):
{code}
SELECT distinct timezoneind, css.rnc, css.field1, localtime, greenwich
FROM cs, css
WHERE cs.r = css.r and cs.name = css.name and greenwich > '2015-03-04'
{code}
2. Maven dependencies:
{code:xml}
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.3.0</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>1.3.0</version>
    <scope>compile</scope>
</dependency>
{code}
3. Here is the trace I reproduced on Windows 7:
{code}
C:\> java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
{code}
===== 2015-04-15 13:46:13 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode): "DestroyJavaVM" #98 prio=5 os_prio=0 tid=0x5dc8c800 nid=0x2048 waiting on condition [0x] java.lang.Thread.State: RUNNABLE "Thread-34" #97 prio=5 os_prio=0 tid=0x61df6000 nid=0x14b0 runnable [0x646ea000] java.lang.Thread.State: RUNNABLE at scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:358) at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:40) at scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:136) at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:132) at scala.collection.mutable.HashMap.findEntry(HashMap.scala:40) at scala.collection.mutable.HashMap.get(HashMap.scala:70) at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:186) at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80) at scala.util.parsing.combinator.syntactical.StdTokenParsers$class.keyword(StdTokenParsers.scala:37) at scala.util.parsing.combinator.syntactical.StandardTokenParsers.keyword(StandardTokenParsers.scala:29) at org.apache.spark.sql.catalyst.SqlParser$$anonfun$primary$1$$anonfun$apply$287$$anonfun$apply$288.apply(SqlParser.scala:373) at org.apache.spark.sql.catalyst.SqlParser$$anonfun$primary$1$$anonfun$apply$287$$anonfun$apply$288.apply(SqlParser.scala:373) at scala.util.parsing.combinator.Parsers$Parser.p$lzycompute$2(Parsers.scala:267) - locked <0xdc33a648> (a scala.util.parsing.combinator.Parsers$$anon$3) at scala.util.parsing.combinator.Parsers$Parser.scala$util$parsing$combinator$Parsers$Parser$$p$3(Parsers.scala:267) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$$tilde$1.apply(Parsers.scala:268) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$$tilde$1.apply(Parsers.scala:268) at scala.util.parsing.combinator.Parsers$Success.flatMapWithNext(Parsers.scala:143) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:234) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:234) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) at
[jira] [Commented] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env
[ https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497267#comment-14497267 ] Paul Wu commented on SPARK-6936: Currently, I have synchronized that part, which seems to be working so far. SQLContext.sql() caused deadlock in multi-thread env Key: SPARK-6936 URL: https://issues.apache.org/jira/browse/SPARK-6936 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Environment: JDK 1.8.x, RedHat Linux version 2.6.32-431.23.3.el6.x86_64 (mockbu...@x86-027.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014 Reporter: Paul Wu Labels: deadlock, sql, threading Doing (the same query) in more than one thread with SQLContext.sql may lead to a deadlock. Here is a way to reproduce it (since this is a multi-thread issue, the reproduction may or may not be easy). 1. Register a relatively big table. 2. Create two different classes; in each class, do the same query in a method, put the results in a set, and print out the set size. 3. Create two threads, each using an object of one of the classes in its run method. Start the threads. For my tests, it can hit a deadlock in just a few runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
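The "synchronized that part" workaround mentioned above, sketched out -- the class and method names here are hypothetical, with the lock made global so every thread contends on the same monitor:
{code:java}
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public final class SerializedSql {
    // One global lock: the 1.3-era SQL parser path is not safe to enter from
    // two threads at once, so all sql() calls are funneled through it.
    private static final Object SQL_LOCK = new Object();

    public static DataFrame sql(SQLContext sqlContext, String query) {
        synchronized (SQL_LOCK) {
            return sqlContext.sql(query);
        }
    }
}
{code}
Note this serializes only the parsing/planning inside sql(); the returned DataFrame can still be consumed from the calling thread as usual, so most of the parallelism is preserved.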