[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2019-10-01 Thread Paul Wu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942090#comment-16942090
 ] 

Paul Wu edited comment on SPARK-20427 at 10/1/19 4:13 PM:
--

Someone asked me about this problem months ago and I found a solution for him, 
but I had forgotten the solution when another member of my team asked me again 
yesterday. I had to spend several hours on this since her query was quite 
complex. For the record and my own reference, I would like to put the solution 
here (inspired by [~sobusiak] and [~yumwang]): add the customSchema option 
after the read(), specifying all potential troublemakers as Double types. This 
can probably resolve most cases in real applications. Of course, it assumes 
that one is not particularly concerned about the exact significant digits in 
his/her application.
{code:java}
.read()
// col1, col2, ... are the columns that could cause the trouble
.option("customSchema", "col1 Double, col2 Double")
{code}
Also, some may think their issues come from the .write() operation, but the 
issues in fact come from the .read() operation. The col1, col2, ... column 
names are not necessarily columns of the original tables; they can be 
calculated fields produced for output by the query. One could easily look in 
the wrong place when trying to fix the issue.

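For reference, a more complete sketch of such a read is below. The JDBC URL, credentials, 
table, and column names are hypothetical; the customSchema option is the only point being 
illustrated, and it relies on the customSchema support added in 2.3.0.
{code:java}
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OracleNumberRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("OracleNumberRead").getOrCreate();

        Properties props = new Properties();
        props.put("user", "scott");
        props.put("password", "tiger");

        // Map the troublesome NUMBER columns to Double so Spark does not try to
        // build a DecimalType whose precision would exceed the maximum of 38.
        Dataset<Row> df = spark.read()
                .option("customSchema", "COL1 Double, COL2 Double")
                .jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "SOME_TABLE", props);

        df.printSchema();
        df.show();
        spark.close();
    }
}
{code}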
 


was (Author: zwu@gmail.com):
Some one asked me this problem months ago and I found a solution for him , but 
I forgot the solution when another one in my team asked me again yesterday. I 
had to spend several hours on this since her query was quite complex.  For a 
record and my own reference, I would like to put the solution  here (inspired 
by [~sobusiak]  and [~yumwang] ):  Add the customSchema option after the read() 
that specifies all potential trouble makers as Double types.  It can probably 
resolve most cases in real applications.  Surely, this is supposed that one 
does not particularly concern about the exact significant digits in his/her 
applications.
{code:java}
.read()
.option("customSchema", "col1 Double, col2 Double") //where col1, col2... are 
columns that could cause the trouble.

{code}
 

 

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has the data type NUMBER. When defining a field of type NUMBER in a 
> table, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s. 
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table in one schema could not be 
> inserted into the identical table in another schema.






[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2019-10-01 Thread Paul Wu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942090#comment-16942090
 ] 

Paul Wu edited comment on SPARK-20427 at 10/1/19 3:49 PM:
--

Someone asked me about this problem months ago and I found a solution for him, 
but I had forgotten the solution when another member of my team asked me again 
yesterday. I had to spend several hours on this since her query was quite 
complex. For the record and my own reference, I would like to put the solution 
here (inspired by [~sobusiak] and [~yumwang]): add the customSchema option 
after the read(), specifying all potential troublemakers as Double types. This 
can probably resolve most cases in real applications. Of course, it assumes 
that one is not particularly concerned about the exact significant digits in 
his/her application.
{code:java}
.read()
// col1, col2, ... are the columns that could cause the trouble
.option("customSchema", "col1 Double, col2 Double")
{code}
 

 


was (Author: zwu@gmail.com):
Some one asked me this problem months ago and I found a solution for him , but 
I forgot the solution when another one in my team asked me again yesterday. I 
had to spend several hours on this since her query was quite complex.  For a 
record and my own reference, I would like to put the solution  here (inspired 
by [~sobusiak]  and [~yumwang] ):  Add the customSchema option after the read() 
that specifies all potential trouble makers as Double types.  It can probably 
resolve most cases in real applications.  Surely, this is supposed one does not 
particularly concern about the exact significant digits in his/her 
applications. 
{code:java}

.read()
.option("customSchema", "col1 Double, col2 Double") //where col1, col2... are 
columns that could cause the trouble.

{code}
 

 

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has the data type NUMBER. When defining a field of type NUMBER in a 
> table, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s. 
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table in one schema could not be 
> inserted into the identical table in another schema.






[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2019-10-01 Thread Paul Wu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942090#comment-16942090
 ] 

Paul Wu commented on SPARK-20427:
-

Someone asked me about this problem months ago and I found a solution for him, 
but I had forgotten the solution when another member of my team asked me again 
yesterday. I had to spend several hours on this since her query was quite 
complex. For the record and my own reference, I would like to put the solution 
here (inspired by [~sobusiak] and [~yumwang]): add the customSchema option 
after the read(), specifying all potential troublemakers as Double types. This 
can probably resolve most cases in real applications. Of course, it assumes 
that one is not particularly concerned about the exact significant digits in 
his/her application.
{code:java}
.read()
// col1, col2, ... are the columns that could cause the trouble
.option("customSchema", "col1 Double, col2 Double")
{code}
 

 

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has the data type NUMBER. When defining a field of type NUMBER in a 
> table, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s. 
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table in one schema could not be 
> inserted into the identical table in another schema.






[jira] [Updated] (SPARK-27077) DataFrameReader and Number of Connection Limitation

2019-03-06 Thread Paul Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-27077:

Description: 
I am not sure whether this is a Spark core issue or a Vertica issue, but I am 
inclined to think it is Spark's issue. The problem is that when we try to read 
with sparkSession.read.load from some data source, in my case a Vertica DB, the 
DataFrameReader makes a 'large' number of initial JDBC connection requests. My 
account allows only 16 connections (and I can see at least 6 of them can be 
used for my loading), and when that "large" number of requests is issued, I get 
the exception below. In fact, I can see that it eventually settles on a smaller 
number of connections (in my case, 2 simultaneous DataFrameReaders). So I think 
we should have a parameter that prevents the reader from initially sending out 
more connection requests than the user's limit. Without such an option 
parameter, my app can fail randomly because of the number of connections 
allowed for my Vertica account.

 

java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New 
session rejected because connection limit of 16 on database already met for 
M21176
     at com.vertica.util.ServerErrorData.buildException(Unknown Source)
     at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
     at com.vertica.io.ProtocolStream.initSession(Unknown Source)
     at com.vertica.core.VConnection.tryConnect(Unknown Source)
     at com.vertica.core.VConnection.connect(Unknown Source)
     at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown 
Source)
     at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
     at java.sql.DriverManager.getConnection(DriverManager.java:664)
     at java.sql.DriverManager.getConnection(DriverManager.java:208)
     at 
com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
     at 
com.vertica.spark.datasource.VerticaRelation.(VerticaRelation.scala:34)
     at 
com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
     at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
     at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
     at 
com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156)
 Caused by: com.vertica.support.exceptions.NonTransientConnectionException: 
[Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 
16 on database already met for

 

  was:
I am not very sure this is a Spark core issue or a Vertica issue, however I 
intended to think this is Spark's issue.  The problem is that when we try to 
read with sparkSession.read.load from some datasource, in my case, Vertica DB, 
the DataFrameReader needs to make some 'large' number of  initial  jdbc 
connection requests. My account limits I can only use 16 (and I can see at 
least 6 of them can be used for my loading), and when the "large" number of the 
requests issued, I got exception below.  In fact, I can see eventually it could 
settle with fewer numbers of connections (in my case 2 simultaneous 
DataFrameReader). So I think we should have a parameter that prevents the 
reader to send out initial "bigger" number of connection requests than user's 
limit. If we don't have this option parameter, my app could fail randomly due 
to my Vertica account's number of connections allowed.

 

java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New 
session rejected because connection limit of 16 on database already met for 
M21176
    at com.vertica.util.ServerErrorData.buildException(Unknown Source)
    at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
    at com.vertica.io.ProtocolStream.initSession(Unknown Source)
    at com.vertica.core.VConnection.tryConnect(Unknown Source)
    at com.vertica.core.VConnection.connect(Unknown Source)
    at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown 
Source)
    at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at 
com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
    at 
com.vertica.spark.datasource.VerticaRelation.(VerticaRelation.scala:34)
    at 
com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
    at 

[jira] [Created] (SPARK-27077) DataFrameReader and Number of Connection Limitation

2019-03-06 Thread Paul Wu (JIRA)
Paul Wu created SPARK-27077:
---

 Summary: DataFrameReader and Number of Connection Limitation
 Key: SPARK-27077
 URL: https://issues.apache.org/jira/browse/SPARK-27077
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 2.3.2
Reporter: Paul Wu


I am not sure whether this is a Spark core issue or a Vertica issue, but I am 
inclined to think it is Spark's issue. The problem is that when we try to read 
with sparkSession.read.load from some data source, in my case a Vertica DB, the 
DataFrameReader makes a 'large' number of initial JDBC connection requests. My 
account allows only 16 connections (and I can see at least 6 of them can be 
used for my loading), and when that "large" number of requests is issued, I get 
the exception below. In fact, I can see that it eventually settles on a smaller 
number of connections (in my case, 2 simultaneous DataFrameReaders). So I think 
we should have a parameter that prevents the reader from initially sending out 
more connection requests than the user's limit. Without such an option 
parameter, my app can fail randomly because of the number of connections 
allowed for my Vertica account.

 

java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New 
session rejected because connection limit of 16 on database already met for 
M21176
    at com.vertica.util.ServerErrorData.buildException(Unknown Source)
    at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
    at com.vertica.io.ProtocolStream.initSession(Unknown Source)
    at com.vertica.core.VConnection.tryConnect(Unknown Source)
    at com.vertica.core.VConnection.connect(Unknown Source)
    at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown 
Source)
    at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at 
com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
    at 
com.vertica.spark.datasource.VerticaRelation.(VerticaRelation.scala:34)
    at 
com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
    at 
com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156)
Caused by: com.vertica.support.exceptions.NonTransientConnectionException: 
[Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 
16 on database already met for

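For Spark's built-in JDBC source there is already a way to bound the read parallelism; a 
minimal sketch is below. This is not the third-party Vertica datasource used above (whose 
options may differ), the URL, table, and bounds are made up, and an existing SparkSession 
named spark is assumed.
{code:java}
// numPartitions caps how many partitions, and hence how many concurrent JDBC
// connections, a single read will open; keep it well under the account limit.
Dataset<Row> df = spark.read()
        .format("jdbc")
        .option("url", "jdbc:vertica://dbhost:5433/db")
        .option("driver", "com.vertica.jdbc.Driver")
        .option("dbtable", "some_table")
        .option("user", "me")
        .option("password", "secret")
        .option("partitionColumn", "id")
        .option("lowerBound", "1")
        .option("upperBound", "1000000")
        .option("numPartitions", "4")
        .load();
{code}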
 






[jira] [Commented] (SPARK-22371) dag-scheduler-event-loop thread stopped with error Attempted to access garbage collected accumulator 5605982

2018-05-07 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466099#comment-16466099
 ] 

Paul Wu commented on SPARK-22371:
-

Got the same problem with 2.3, and the program also stalled:

{{ Uncaught exception in thread heartbeat-receiver-event-loop-thread}}
{{java.lang.IllegalStateException: Attempted to access garbage collected 
accumulator 8825}}
{{    at 
org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:265)}}
{{    at 
org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:261)}}
{{    at scala.Option.map(Option.scala:146)}}
{{    at 
org.apache.spark.util.AccumulatorContext$.get(AccumulatorV2.scala:261)}}
{{    at 
org.apache.spark.util.AccumulatorV2$$anonfun$name$1.apply(AccumulatorV2.scala:87)}}
{{    at 
org.apache.spark.util.AccumulatorV2$$anonfun$name$1.apply(AccumulatorV2.scala:87)}}
{{    at scala.Option.orElse(Option.scala:289)}}
{{    at org.apache.spark.util.AccumulatorV2.name(AccumulatorV2.scala:87)}}
{{    at 
org.apache.spark.util.AccumulatorV2.toInfo(AccumulatorV2.scala:108)}}

> dag-scheduler-event-loop thread stopped with error  Attempted to access 
> garbage collected accumulator 5605982
> -
>
> Key: SPARK-22371
> URL: https://issues.apache.org/jira/browse/SPARK-22371
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Mayank Agarwal
>Priority: Major
> Attachments: Helper.scala, ShuffleIssue.java, 
> driver-thread-dump-spark2.1.txt, sampledata
>
>
> Our Spark jobs are getting stuck in DagScheduler.runJob because the dag-scheduler 
> thread is stopped with *Attempted to access garbage collected 
> accumulator 5605982*.
> From our investigation it looks like the accumulator is cleaned up by GC first, 
> and the same accumulator is then used for merging the results from the executor 
> on the task completion event.
> Because the error java.lang.IllegalAccessError is a LinkageError, which is 
> treated as a fatal error, the dag-scheduler loop finishes with the exception below.
> ---ERROR stack trace --
> Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: 
> Attempted to access garbage collected accumulator 5605982
>   at 
> org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:253)
>   at 
> org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:249)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.util.AccumulatorContext$.get(AccumulatorV2.scala:249)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1083)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1080)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1080)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1183)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1647)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> I am attaching the thread dump of driver as well 






[jira] [Resolved] (SPARK-23617) Register a Function without params with Spark SQL Java API

2018-03-06 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu resolved SPARK-23617.
-
   Resolution: Duplicate
Fix Version/s: 2.3.0

As commented by Hyukjin Kwon, this issue is a duplicate and has been fixed in 
2.3.0.

> Register a Function without params with Spark SQL Java API
> --
>
> Key: SPARK-23617
> URL: https://issues.apache.org/jira/browse/SPARK-23617
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.2.1
>Reporter: Paul Wu
>Priority: Major
> Fix For: 2.3.0
>
>
> One can register a function using Scala:
> {{spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString)}}
> Now, if I use Java API:
> {{spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString());}}
> The code does not compile. Define UDF0 for Java API?
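For reference, with the UDF0 interface added in 2.3.0, the Java registration can be written 
as in the self-contained sketch below (class and app names are made up for illustration):
{code:java}
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF0;
import org.apache.spark.sql.types.DataTypes;

public class RegisterUuidUdf {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("RegisterUuidUdf").getOrCreate();

        // The Java API requires the return DataType to be given explicitly.
        spark.udf().register("uuid",
                (UDF0<String>) () -> java.util.UUID.randomUUID().toString(),
                DataTypes.StringType);

        spark.sql("SELECT uuid()").show(false);
        spark.close();
    }
}
{code}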






[jira] [Updated] (SPARK-23617) Register a Function without params with Spark SQL Java API

2018-03-06 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-23617:

Description: 
One can register a function using Scala:

{{spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString)}}

Now, if I use Java API:

{{spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString());}}

The code does not compile. Define UDF0 for Java API?

  was:
One can register a function using Scala:

spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString) 

Now, if I use Java API:

spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString()); 

The code does not compile. Define UDF0 for Java API?


> Register a Function without params with Spark SQL Java API
> --
>
> Key: SPARK-23617
> URL: https://issues.apache.org/jira/browse/SPARK-23617
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.2.1
>Reporter: Paul Wu
>Priority: Major
>
> One can register a function using Scala:
> {{spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString)}}
> Now, if I use Java API:
> {{spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString());}}
> The code does not compile. Define UDF0 for Java API?






[jira] [Updated] (SPARK-23617) Register a Function without params with Spark SQL Java API

2018-03-06 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-23617:

Description: 
One can register a function using Scala:

spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString) 

Now, if I use Java API:

spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString()); 

The code does not compile. Define UDF0 for Java API?

  was:
One can register a function using Scala:

{{spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString) }}

Now, if I use Java API:

{{spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString()); }}

The code does not compile. Define UDF0 for Java API?


> Register a Function without params with Spark SQL Java API
> --
>
> Key: SPARK-23617
> URL: https://issues.apache.org/jira/browse/SPARK-23617
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.2.1
>Reporter: Paul Wu
>Priority: Major
>
> One can register a function using Scala:
> spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString) 
> Now, if I use Java API:
> spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString()); 
> The code does not compile. Define UDF0 for Java API?






[jira] [Updated] (SPARK-23617) Register a Function without params with Spark SQL Java API

2018-03-06 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-23617:

Description: 
One can register a function using Scala:

{{spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString) }}

Now, if I use Java API:

{{spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString()); }}

The code does not compile. Define UDF0 for Java API?

  was:
One can register a function using Scala:

{{spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString) }}

Now, if I use Java API:

{{ spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString()); }}

The code does not compile. Define UDF0 for Java API?


> Register a Function without params with Spark SQL Java API
> --
>
> Key: SPARK-23617
> URL: https://issues.apache.org/jira/browse/SPARK-23617
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.2.1
>Reporter: Paul Wu
>Priority: Major
>
> One can register a function using Scala:
> {{spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString) }}
> Now, if I use Java API:
> {{spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString()); }}
> The code does not compile. Define UDF0 for Java API?






[jira] [Created] (SPARK-23617) Register a Function without params with Spark SQL Java API

2018-03-06 Thread Paul Wu (JIRA)
Paul Wu created SPARK-23617:
---

 Summary: Register a Function without params with Spark SQL Java API
 Key: SPARK-23617
 URL: https://issues.apache.org/jira/browse/SPARK-23617
 Project: Spark
  Issue Type: Improvement
  Components: Java API, SQL
Affects Versions: 2.2.1
Reporter: Paul Wu


One can register a function using Scala:

{{spark.udf.register("uuid", ()=>java.util.UUID.randomUUID.toString) }}

Now, if I use Java API:

{{ spark.udf().register("uuid", ()=>java.util.UUID.randomUUID().toString()); }}

The code does not compile. Define UDF0 for Java API?






[jira] [Updated] (SPARK-23193) Insert into Spark Table statement cannot specify column names

2018-01-23 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-23193:

Summary: Insert into Spark Table statement cannot specify column names  
(was: Insert into Spark Table cannot specify column names)

> Insert into Spark Table statement cannot specify column names
> -
>
> Key: SPARK-23193
> URL: https://issues.apache.org/jira/browse/SPARK-23193
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Paul Wu
>Priority: Major
>
> The following code shows the insert statement cannot specify the column 
> names. The error is
> {{insert into aaa (age, name)  values (12, 'nn')}}
> {{-^^^}}
> {{    at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)}}
> {{    at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)}}
> {{    at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)}}
> {{    at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)}}
> {{    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)}}
> {{    at com.---.iqi.spk.TestInsert.main(TestInsert.java:75)}}
>  
> {{public class TestInsert {}}
> {{    public static final SparkSession spark}}
> {{    = SparkSession.builder()}}
> {{    .config("spark.sql.warehouse.dir", "file:///temp")}}
> {{    .config("spark.driver.memory", "5g")}}
> {{    .enableHiveSupport()}}
> {{    
> .master("local[*]").appName("Spark2JdbcDs").getOrCreate();}}
> {{     }}
> {{    public static class Person implements Serializable {}}
> {{    private String name;}}
> {{    private int age;}}
> {{    public String getName() {}}
> {{    return name;}}
> {{    }}}
> {{    public void setName(String name) {}}
> {{    this.name = name;}}
> {{    }}}
> {{    public int getAge() {}}
> {{    return age;}}
> {{    }}}
> {{    public void setAge(int age) {}}
> {{    this.age = age;}}
> {{    }}}
> {{    }}}
> {{    public static void main(String[] args) throws Exception {}}
> {{    // Create an instance of a Bean class}}
> {{    Person person = new Person();}}
> {{    person.setName("Andy");}}
> {{    person.setAge(32);}}
> {{// Encoders are created for Java beans}}
> {{    Encoder personEncoder = Encoders.bean(Person.class);}}
> {{    Dataset javaBeanDS = spark.createDataset(}}
> {{    Collections.singletonList(person),}}
> {{    personEncoder}}
> {{    );}}
> {{    javaBeanDS.show();}}
> {{    javaBeanDS.createTempView("Abc");}}
> {{    spark.sql("drop table aaa");}}
> {{    spark.sql("create table aaa as select * from abc");}}
> {{    spark.sql("insert into aaa  values (12, 'nn')");}}
> {{    spark.sql("select * from aaa").show();}}
> {{    spark.sql("insert into aaa (age, name)  values (12, 'nn')");}}
> {{    spark.sql("select * from aaa").show();}}
> {{    }}
> {{    spark.close();}}
> {{ }}
> {{    }}}
> {{}}}






[jira] [Created] (SPARK-23193) Insert into Spark Table cannot specify column names

2018-01-23 Thread Paul Wu (JIRA)
Paul Wu created SPARK-23193:
---

 Summary: Insert into Spark Table cannot specify column names
 Key: SPARK-23193
 URL: https://issues.apache.org/jira/browse/SPARK-23193
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.1
Reporter: Paul Wu


The following code shows the insert statement cannot specify the column names. 
The error is

{{insert into aaa (age, name)  values (12, 'nn')}}
{{-^^^}}

{{    at 
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)}}
{{    at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)}}
{{    at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)}}
{{    at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)}}
{{    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)}}
{{    at com.---.iqi.spk.TestInsert.main(TestInsert.java:75)}}

 

{{public class TestInsert {}}

{{    public static final SparkSession spark}}
{{    = SparkSession.builder()}}
{{    .config("spark.sql.warehouse.dir", "file:///temp")}}
{{    .config("spark.driver.memory", "5g")}}
{{    .enableHiveSupport()}}
{{    
.master("local[*]").appName("Spark2JdbcDs").getOrCreate();}}

{{     }}

{{    public static class Person implements Serializable {}}

{{    private String name;}}
{{    private int age;}}

{{    public String getName() {}}
{{    return name;}}
{{    }}}

{{    public void setName(String name) {}}
{{    this.name = name;}}
{{    }}}

{{    public int getAge() {}}
{{    return age;}}
{{    }}}

{{    public void setAge(int age) {}}
{{    this.age = age;}}
{{    }}}
{{    }}}

{{    public static void main(String[] args) throws Exception {}}

{{    // Create an instance of a Bean class}}
{{    Person person = new Person();}}
{{    person.setName("Andy");}}
{{    person.setAge(32);}}

{{// Encoders are created for Java beans}}
{{    Encoder personEncoder = Encoders.bean(Person.class);}}
{{    Dataset javaBeanDS = spark.createDataset(}}
{{    Collections.singletonList(person),}}
{{    personEncoder}}
{{    );}}
{{    javaBeanDS.show();}}
{{    javaBeanDS.createTempView("Abc");}}
{{    spark.sql("drop table aaa");}}
{{    spark.sql("create table aaa as select * from abc");}}
{{    spark.sql("insert into aaa  values (12, 'nn')");}}
{{    spark.sql("select * from aaa").show();}}
{{    spark.sql("insert into aaa (age, name)  values (12, 'nn')");}}
{{    spark.sql("select * from aaa").show();}}
{{    }}
{{    spark.close();}}
{{ }}
{{    }}}
{{}}}
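A workaround until the parser accepts an explicit column list (a sketch only; the column 
order (age, name) is assumed from the bean encoder in the example above): supply the values 
in the table's declared column order, or make the ordering explicit with a SELECT.
{code:java}
// Works on 2.2.x: no column list, values in the table's declared order (age, name).
spark.sql("INSERT INTO aaa VALUES (12, 'nn')");

// Or keep the intent readable by aliasing the columns in a SELECT.
spark.sql("INSERT INTO aaa SELECT 12 AS age, 'nn' AS name");
{code}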






[jira] [Created] (SPARK-21740) DataFrame.write does not work with Phoenix JDBC Driver

2017-08-15 Thread Paul Wu (JIRA)
Paul Wu created SPARK-21740:
---

 Summary: DataFrame.write does not work with Phoenix JDBC Driver
 Key: SPARK-21740
 URL: https://issues.apache.org/jira/browse/SPARK-21740
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0, 2.0.0
Reporter: Paul Wu


  The reason for this is that the Phoenix JDBC driver does not support "INSERT", 
only "UPSERT".
Exception from the following program:
17/08/15 12:18:53 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.phoenix.exception.PhoenixParserException: ERROR 601 (42P00): Syntax 
error. Encountered "INSERT" at line 1, column 1.
at 
org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)

{code:java}
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HbaseJDBCSpark {

    private static final SparkSession sparkSession
            = SparkSession.builder()
            .config("spark.sql.warehouse.dir", "file:///temp")
            .config("spark.driver.memory", "5g")
            .master("local[*]").appName("Spark2JdbcDs").getOrCreate();

    static final String JDBC_URL
            = "jdbc:phoenix:somehost:2181:/hbase-unsecure";

    public static void main(String[] args) {
        final Properties connectionProperties = new Properties();

        // Reading through the Phoenix JDBC driver works fine.
        Dataset<Row> jdbcDF
                = sparkSession.read()
                .jdbc(JDBC_URL, "javatest", connectionProperties);

        jdbcDF.show();

        String url = JDBC_URL;
        Properties p = new Properties();
        p.put("driver", "org.apache.phoenix.jdbc.PhoenixDriver");
        //p.put("batchsize", "10");

        // Fails: Spark's JDBC writer generates INSERT statements, which Phoenix rejects.
        jdbcDF.write().mode(SaveMode.Append).jdbc(url, "javatest", p);
        sparkSession.close();
    }
}
{code}

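A possible workaround (a sketch only, not verified here) is to write through the 
phoenix-spark connector instead of plain JDBC, since the connector issues UPSERTs. The 
option names below are the connector's; the values reuse those from the program above.
{code:java}
// Assumes the phoenix-spark artifact matching the cluster's Phoenix version is on
// the classpath; the connector requires SaveMode.Overwrite and performs UPSERTs.
jdbcDF.write()
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "JAVATEST")
      .option("zkUrl", "somehost:2181:/hbase-unsecure")
      .save();
{code}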






[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2017-07-28 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105439#comment-16105439
 ] 

Paul Wu commented on SPARK-17614:
-

Oh, sorry. I thought I could use a query here as I do with other RDBMSs. Things 
become more complicated for the Cassandra case after thinking about this some 
more. I'll accept your comment.

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>Assignee: Sean Owen
>Priority: Minor
>  Labels: cassandra-jdbc, sql
> Fix For: 2.1.0
>
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting






[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2017-07-28 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105425#comment-16105425
 ] 

Paul Wu commented on SPARK-17614:
-

So create a new issue? Or this is not an issue to you?

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>Assignee: Sean Owen
>Priority: Minor
>  Labels: cassandra-jdbc, sql
> Fix For: 2.1.0
>
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting






[jira] [Comment Edited] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2017-07-28 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105402#comment-16105402
 ] 

Paul Wu edited comment on SPARK-17614 at 7/28/17 6:14 PM:
--

The fix does not support syntax like this:

{{.jdbc(JDBC_URL, "(select * from emp where empid>2)", connectionProperties);}}

Here is the stack trace:

{{Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable 
alternative at input 'select' (SELECT * from [select]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
at 
com.att.cass.proto.CassJDBCWithSpark.main(CassJDBCWithSpark.java:44)}}
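For a regular RDBMS over JDBC (not Cassandra, whose CQL has no subqueries), the 
derived-table form generally needs an alias, roughly like the sketch below (JDBC_URL and 
connectionProperties as in the code above; the query and alias are made up):
{code:java}
// Most databases require an alias on a derived table passed as the dbtable string.
Dataset<Row> df = sparkSession.read()
        .jdbc(JDBC_URL, "(select * from emp where empid > 2) emp_filtered",
              connectionProperties);
{code}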


was (Author: zwu@gmail.com):
The fix does not support the syntax  like this:

{{.jdbc(JDBC_URL, "(select * from emp)", connectionProperties);}}

Here is the stack trace:

{{Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable 
alternative at input 'select' (SELECT * from [select]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
at 
com.att.cass.proto.CassJDBCWithSpark.main(CassJDBCWithSpark.java:44)}}

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>Assignee: Sean Owen
>Priority: Minor
>  Labels: cassandra-jdbc, sql
> Fix For: 2.1.0
>
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   

[jira] [Comment Edited] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2017-07-28 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105402#comment-16105402
 ] 

Paul Wu edited comment on SPARK-17614 at 7/28/17 6:09 PM:
--

The fix does not support syntax like this:

{{.jdbc(JDBC_URL, "(select * from emp)", connectionProperties);}}

Here is the stack trace:

{{Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable 
alternative at input 'select' (SELECT * from [select]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
at 
com.att.cass.proto.CassJDBCWithSpark.main(CassJDBCWithSpark.java:44)}}


was (Author: zwu@gmail.com):
The fix does not support the syntax on the syntax like this:

{{.jdbc(JDBC_URL, "(select * from emp)", connectionProperties);}}

Here is the stack trace:

{{Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable 
alternative at input 'select' (SELECT * from [select]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
at 
com.att.cass.proto.CassJDBCWithSpark.main(CassJDBCWithSpark.java:44)}}

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>Assignee: Sean Owen
>Priority: Minor
>  Labels: cassandra-jdbc, sql
> Fix For: 2.1.0
>
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   

[jira] [Reopened] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2017-07-28 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu reopened SPARK-17614:
-

The fix does not support syntax like this:

{{.jdbc(JDBC_URL, "(select * from emp)", connectionProperties);}}

Here is the stack trace:

{{Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable 
alternative at input 'select' (SELECT * from [select]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
at 
com.att.cass.proto.CassJDBCWithSpark.main(CassJDBCWithSpark.java:44)}}

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>Assignee: Sean Owen
>Priority: Minor
>  Labels: cassandra-jdbc, sql
> Fix For: 2.1.0
>
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting






[jira] [Comment Edited] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2017-07-28 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105402#comment-16105402
 ] 

Paul Wu edited comment on SPARK-17614 at 7/28/17 6:08 PM:
--

The fix does not support syntax like this:

{{.jdbc(JDBC_URL, "(select * from emp)", connectionProperties);}}

Here is the stack trace:

{{Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable 
alternative at input 'select' (SELECT * from [select]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
at 
com.att.cass.proto.CassJDBCWithSpark.main(CassJDBCWithSpark.java:44)}}


was (Author: zwu@gmail.com):
The fix does not support the syntax on the syntax like this:

{{.jdbc(JDBC_URL, "(select * from emp)", connectionProperties);}}

Here is the stack trace:

{{Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable 
alternative at input 'select' (SELECT * from [select]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
at 
com.att.cass.proto.CassJDBCWithSpark.main(CassJDBCWithSpark.java:44)}}

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>Assignee: Sean Owen
>Priority: Minor
>  Labels: cassandra-jdbc, sql
> Fix For: 2.1.0
>
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> 

[jira] [Comment Edited] (SPARK-19296) Awkward changes for JdbcUtils.saveTable in Spark 2.1.0

2017-01-20 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831033#comment-15831033
 ] 

Paul Wu edited comment on SPARK-19296 at 1/20/17 9:52 PM:
--

We found this util is very useful in general (much, much better than primitive 
JDBC) and have been using it since 1.5.x... we didn't realize it is internal. It 
will be a big regret for us not to be able to use it, but it seems it is a pain 
for us now. 

I guess, for code quality purposes, the code should at least be refactored to 
eliminate the duplicated args. 
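
For what it's worth, a wrapper sketch that hides the duplication on the application side 
until the API is cleaned up (saveTableOnce is a hypothetical helper name; the two calls 
inside are exactly the ones from the report quoted below):
{code:java}
// Sketch of a small wrapper (hypothetical helper name saveTableOnce) that hides the
// duplication complained about above: callers pass url and table once and the wrapper
// forwards them to both places the Spark 2.1.0 API requires. JDBCOptions and JdbcUtils
// are the classes from org.apache.spark.sql.execution.datasources.jdbc used in the report.
static void saveTableOnce(Dataset<Row> ds, String url, String table,
                          scala.collection.immutable.Map<String, String> map) {
    JDBCOptions jdbcOptions = new JDBCOptions(url, table, map);
    JdbcUtils.saveTable(ds, url, table, jdbcOptions);
}
{code}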


was (Author: zwu@gmail.com):
We found this Util is very useful in general (much, much better than primitive 
jdbc) and have been using it since 1.3.x... didn't realize it is internal. It 
will a big regret for us to not be able to use it. But it seems it is a pain 
for us now. 

I guess for code quality purpose, at least refactor the code to eliminate the 
duplication args. 

> Awkward changes for JdbcUtils.saveTable in Spark 2.1.0
> --
>
> Key: SPARK-19296
> URL: https://issues.apache.org/jira/browse/SPARK-19296
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Paul Wu
>Priority: Minor
>
> The change from JdbcUtils.saveTable(DataFrame, String, String, Property) to 
> JdbcUtils.saveTable(DataFrame, String, String, JDBCOptions) is not only 
> incompatible with previous versions (so previous Java code won't compile), 
> but also introduced a silly code change: one has to specify url and table 
> twice, like this:
> JDBCOptions jdbcOptions = new JDBCOptions(url, table, map);
> JdbcUtils.saveTable(ds, url, table, jdbcOptions);
> Why does one have to supply the same things, url and table, twice? (If you 
> don't specify them in both places, an exception is thrown.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19296) Awkward changes for JdbcUtils.saveTable in Spark 2.1.0

2017-01-19 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831033#comment-15831033
 ] 

Paul Wu commented on SPARK-19296:
-

We found this util is very useful in general (much, much better than primitive 
JDBC) and have been using it since 1.3.x... we didn't realize it is internal. It 
will be a big regret for us not to be able to use it, but it seems it is a pain 
for us now. 

I guess, for code quality purposes, the code should at least be refactored to 
eliminate the duplicated args. 

> Awkward changes for JdbcUtils.saveTable in Spark 2.1.0
> --
>
> Key: SPARK-19296
> URL: https://issues.apache.org/jira/browse/SPARK-19296
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Paul Wu
>Priority: Minor
>
> The change from JdbcUtils.saveTable(DataFrame, String, String, Property) to 
> JdbcUtils.saveTable(DataFrame, String, String, JDBCOptions) is not only 
> incompatible with previous versions (so previous Java code won't compile), 
> but also introduced a silly code change: one has to specify url and table 
> twice, like this:
> JDBCOptions jdbcOptions = new JDBCOptions(url, table, map);
> JdbcUtils.saveTable(ds, url, table, jdbcOptions);
> Why does one have to supply the same things, url and table, twice? (If you 
> don't specify them in both places, an exception is thrown.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19296) Awkward changes for JdbcUtils.saveTable in Spark 2.1.0

2017-01-19 Thread Paul Wu (JIRA)
Paul Wu created SPARK-19296:
---

 Summary: Awkward changes for JdbcUtils.saveTable in Spark 2.1.0
 Key: SPARK-19296
 URL: https://issues.apache.org/jira/browse/SPARK-19296
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Paul Wu
Priority: Minor


The change from JdbcUtils.saveTable(DataFrame, String, String, Property) to 
JdbcUtils.saveTable(DataFrame, String, String, JDBCOptions) is not only 
incompatible with previous versions (so previous Java code won't compile), but 
also introduced a silly code change: one has to specify url and table twice, 
like this:

JDBCOptions jdbcOptions = new JDBCOptions(url, table, map);
JdbcUtils.saveTable(ds, url, table, jdbcOptions);

Why does one have to supply the same things, url and table, twice? (If you 
don't specify them in both places, an exception is thrown.)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18123) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue

2016-10-26 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-18123:

Description: 
Blindly quoting every field name for inserting is the issue (Line 110-119, 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala).


/**
   * Returns a PreparedStatement that inserts a row into table via conn.
   */
  def insertStatement(conn: Connection, table: String, rddSchema: StructType, 
dialect: JdbcDialect)
  : PreparedStatement = {
val columns = rddSchema.fields.map(x => 
dialect.quoteIdentifier(x.name)).mkString(",")
val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)"
conn.prepareStatement(sql)
  }

This code causes the following issue (it does not happen to 1.6.x):

I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a 
dataset to an Oracle database, but the field names must be uppercase to succeed. 
This is not the expected behavior: since the utility quotes every field name, it 
should take care of case sensitivity itself (or quote only the table names). The 
code below throws the exception: Caused by: java.sql.SQLSyntaxErrorException: 
ORA-00904: "DATETIME_gmt": invalid identifier. 


String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) 
DATETIME_gmt, '1' NODEB";
hc.sql("set spark.sql.caseSensitive=false");
Dataset ds = hc.sql(detailSQL);
ds.show();
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, 
url, detailTable, p);
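
For the record, a workaround sketch that avoids the error without touching Spark (assuming 
the Oracle column was created unquoted and is therefore stored in uppercase): keep the 
DataFrame field names uppercase so the blindly quoted identifiers still match. hc, url, 
detailTable and p are the variables from the code above.
{code:java}
// Workaround sketch, not the Spark fix: since insertStatement quotes every field name,
// make the DataFrame column names match Oracle's stored (uppercase) column names exactly.
String detailSQL = "select CAST('2016-09-25 17:00:00' AS TIMESTAMP) DATETIME_GMT, '1' NODEB";
Dataset<Row> ds = hc.sql(detailSQL);
// alternatively: ds = ds.toDF("DATETIME_GMT", "NODEB");
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, detailTable, p);
{code}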

  was:
Blindly quoting every field name for inserting is the issue (Line 110-119, 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala).


/**
   * Returns a PreparedStatement that inserts a row into table via conn.
   */
  def insertStatement(conn: Connection, table: String, rddSchema: StructType, 
dialect: JdbcDialect)
  : PreparedStatement = {
val columns = rddSchema.fields.map(x => 
dialect.quoteIdentifier(x.name)).mkString(",")
val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)"
conn.prepareStatement(sql)
  }

This code causes the following issue (it does not happen to 1.6.x):

I have issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a 
dataset to Oracle database, but the fields must be uppercase to succeed. This 
is not a expect behavior: If only the table names were quoted, this utility 
should concern the case sensitivity.  The code below throws the exception: 
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": invalid 
identifier. 


String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) 
DATETIME_gmt, '1' NODEB";
hc.sql("set spark.sql.caseSensitive=false");
Dataset ds = hc.sql(detailSQL);
ds.show();
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, 
url, detailTable, p);


> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable  the case 
> sensitivity issue
> --
>
> Key: SPARK-18123
> URL: https://issues.apache.org/jira/browse/SPARK-18123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Paul Wu
>
> Blindly quoting every field name for inserting is the issue (Line 110-119, 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala).
> /**
>* Returns a PreparedStatement that inserts a row into table via conn.
>*/
>   def insertStatement(conn: Connection, table: String, rddSchema: StructType, 
> dialect: JdbcDialect)
>   : PreparedStatement = {
> val columns = rddSchema.fields.map(x => 
> dialect.quoteIdentifier(x.name)).mkString(",")
> val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
> val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)"
> conn.prepareStatement(sql)
>   }
> This code causes the following issue (it does not happen to 1.6.x):
> I have issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a 
> dataset to Oracle database, but the fields must be uppercase to succeed. This 
> is not an expected behavior: If only the table names were quoted, this 
> utility should concern the case sensitivity.  The code below throws the 
> exception: Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: 
> "DATETIME_gmt": invalid identifier. 
> String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) 
> DATETIME_gmt, '1' NODEB";
> hc.sql("set spark.sql.caseSensitive=false");
> Dataset 

[jira] [Commented] (SPARK-18123) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue

2016-10-26 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610739#comment-15610739
 ] 

Paul Wu commented on SPARK-18123:
-

I just tried it to see whether it would work around the issue, but after I found 
the code, I knew this is a bug. That line does nothing useful, but it does not 
hurt either.

> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable  the case 
> sensitivity issue
> --
>
> Key: SPARK-18123
> URL: https://issues.apache.org/jira/browse/SPARK-18123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Paul Wu
>
> Blindly quoting every field name for inserting is the issue (Line 110-119, 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala).
> /**
>* Returns a PreparedStatement that inserts a row into table via conn.
>*/
>   def insertStatement(conn: Connection, table: String, rddSchema: StructType, 
> dialect: JdbcDialect)
>   : PreparedStatement = {
> val columns = rddSchema.fields.map(x => 
> dialect.quoteIdentifier(x.name)).mkString(",")
> val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
> val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)"
> conn.prepareStatement(sql)
>   }
> This code causes the following issue (it does not happen to 1.6.x):
> I have issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a 
> dataset to Oracle database, but the fields must be uppercase to succeed. This 
> is not a expect behavior: If only the table names were quoted, this utility 
> should concern the case sensitivity.  The code below throws the exception: 
> Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": 
> invalid identifier. 
> String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) 
> DATETIME_gmt, '1' NODEB";
> hc.sql("set spark.sql.caseSensitive=false");
> Dataset ds = hc.sql(detailSQL);
> ds.show();
> 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, 
> detailTable, p);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18123) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue

2016-10-26 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-18123:

Description: 
Blindly quoting every field name for inserting is the issue (Line 110-119, 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala).


/**
   * Returns a PreparedStatement that inserts a row into table via conn.
   */
  def insertStatement(conn: Connection, table: String, rddSchema: StructType, 
dialect: JdbcDialect)
  : PreparedStatement = {
val columns = rddSchema.fields.map(x => 
dialect.quoteIdentifier(x.name)).mkString(",")
val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)"
conn.prepareStatement(sql)
  }

This code causes the following issue (it does not happen to 1.6.x):

I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a 
dataset to an Oracle database, but the field names must be uppercase to succeed. 
This is not the expected behavior: since the utility quotes every field name, it 
should take care of case sensitivity itself (or quote only the table names). The 
code below throws the exception: Caused by: java.sql.SQLSyntaxErrorException: 
ORA-00904: "DATETIME_gmt": invalid identifier. 


String detailSQL = "select CAST('2016-09-25 17:00:00' AS TIMESTAMP) DATETIME_gmt, '1' NODEB";
hc.sql("set spark.sql.caseSensitive=false");
Dataset<Row> ds = hc.sql(detailSQL);
ds.show();
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, detailTable, p);

  was:
I have issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a 
dataset to Oracle database, but the fields must be uppercase to succeed. This 
is not a expect behavior: If only the table names were quoted, this utility 
should concern the case sensitivity.  The code below throws the exception: 
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": invalid 
identifier. 

String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) 
DATETIME_gmt, '1' NODEB";
hc.sql("set spark.sql.caseSensitive=false");
Dataset ds = hc.sql(detailSQL);
ds.show();
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, 
url, detailTable, p);


> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable  the case 
> sensitivity issue
> --
>
> Key: SPARK-18123
> URL: https://issues.apache.org/jira/browse/SPARK-18123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Paul Wu
>
> Blindly quoting every field name for inserting is the issue (Line 110-119, 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala).
> /**
>* Returns a PreparedStatement that inserts a row into table via conn.
>*/
>   def insertStatement(conn: Connection, table: String, rddSchema: StructType, 
> dialect: JdbcDialect)
>   : PreparedStatement = {
> val columns = rddSchema.fields.map(x => 
> dialect.quoteIdentifier(x.name)).mkString(",")
> val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
> val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)"
> conn.prepareStatement(sql)
>   }
> This code causes the following issue (it does not happen to 1.6.x):
> I have issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a 
> dataset to Oracle database, but the fields must be uppercase to succeed. This 
> is not a expect behavior: If only the table names were quoted, this utility 
> should concern the case sensitivity.  The code below throws the exception: 
> Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "DATETIME_gmt": 
> invalid identifier. 
> String detailSQL ="select CAST('2016-09-25 17:00:00' AS TIMESTAMP) 
> DATETIME_gmt, '1' NODEB";
> hc.sql("set spark.sql.caseSensitive=false");
> Dataset ds = hc.sql(detailSQL);
> ds.show();
> 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, 
> detailTable, p);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18123) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable the case sensitivity issue

2016-10-26 Thread Paul Wu (JIRA)
Paul Wu created SPARK-18123:
---

 Summary: 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable  the case 
sensitivity issue
 Key: SPARK-18123
 URL: https://issues.apache.org/jira/browse/SPARK-18123
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.1
Reporter: Paul Wu


I have an issue with the saveTable method in Spark 2.0/2.0.1. I tried to save a 
dataset to an Oracle database, but the field names must be uppercase to succeed. 
This is not the expected behavior: since the utility quotes every field name, it 
should take care of case sensitivity itself (or quote only the table names). The 
code below throws the exception: Caused by: java.sql.SQLSyntaxErrorException: 
ORA-00904: "DATETIME_gmt": invalid identifier. 

String detailSQL = "select CAST('2016-09-25 17:00:00' AS TIMESTAMP) DATETIME_gmt, '1' NODEB";
hc.sql("set spark.sql.caseSensitive=false");
Dataset<Row> ds = hc.sql(detailSQL);
ds.show();
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, detailTable, p);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-21 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-17614:

Comment: was deleted

(was: Create pull request: https://github.com/apache/spark/pull/15183)

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>  Labels: cassandra-jdbc, sql
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-21 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510709#comment-15510709
 ] 

Paul Wu commented on SPARK-17614:
-

Create pull request: https://github.com/apache/spark/pull/15183

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>  Labels: cassandra-jdbc, sql
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-21 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510626#comment-15510626
 ] 

Paul Wu edited comment on SPARK-17614 at 9/21/16 5:42 PM:
--

No, a custom JdbcDialect won't resolve the problem, since DataFrameReader uses 
JDBCRDD and the latter has a hard-coded line 

val statement = conn.prepareStatement(s"SELECT * FROM $table WHERE 1=0")

for checking table existence. See line 61 at 

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala

Line 61 needs to use the dialect's table-existence query rather than hard-coding 
the query there.


was (Author: zwu@gmail.com):
No, Custom JdbcDialect won't resolve the problem since DataFrameReader uses 
JDBCRDD and the later has a hard code line 

val statement = conn.prepareStatement(s"SELECT * FROM $table WHERE 1=0")

for getting the table existence.  See line 61 at 

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>  Labels: cassandra-jdbc, sql
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-21 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-17614:

Priority: Major  (was: Minor)

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>  Labels: cassandra-jdbc, sql
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-21 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510626#comment-15510626
 ] 

Paul Wu commented on SPARK-17614:
-

No, a custom JdbcDialect won't resolve the problem, since DataFrameReader uses 
JDBCRDD and the latter has a hard-coded line 

val statement = conn.prepareStatement(s"SELECT * FROM $table WHERE 1=0")

for checking table existence. See line 61 at 

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>Priority: Minor
>  Labels: cassandra-jdbc, sql
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-21 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510525#comment-15510525
 ] 

Paul Wu commented on SPARK-17614:
-

Thanks. I tried to register my custom dialect as follows, but it does not 
reach the getTableExistsQuery() method. Could anyone help?

import org.apache.spark.sql.jdbc.JdbcDialect;

public class NRSCassandraDialect  extends JdbcDialect {

@Override
public boolean canHandle(String url) {
System.out.println("came here.."+ url.startsWith("jdbc:cassandra"));
return url.startsWith("jdbc:cassandra");
}
@Override
public String getTableExistsQuery (String table) {
System.out.println("query?");
return "SELECT * from " + table + " LIMIT 1";
}
}

--
public class CassJDBC implements Serializable {

private static final org.apache.log4j.Logger LOGGER = 
org.apache.log4j.Logger.getLogger(CassJDBC.class);

private static final String _CONNECTION_URL = 
"jdbc:cassandra://ulpd326..com/test?loadbalancing=DCAwareRoundRobinPolicy(%22datacenter1%22)";
private static final String _USERNAME = "";
private static final String _PWD = "";

private static final SparkSession sparkSession
= SparkSession.builder() .config("spark.sql.warehouse.dir", 
"file:///home/zw251y/tmp").master("local[*]").appName("Spark2JdbcDs").getOrCreate();

public static void main(String[] args) {
   
JdbcDialects.registerDialect(new NRSCassandraDialect());
final Properties connectionProperties = new Properties();
 
final String dbTable= "sql_demo";

Dataset<Row> jdbcDF
= sparkSession.read()
.jdbc(_CONNECTION_URL, dbTable, connectionProperties);

jdbcDF.show();
   
}
}


Error message:
came here..true
parameters = "datacenter1"
Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.<init>(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>Priority: Minor
>  Labels: cassandra-jdbc, sql
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-20 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507950#comment-15507950
 ] 

Paul Wu commented on SPARK-17614:
-

Workaround: rebuild the Cassandra JDBC wrapper after modifying 
CassandraPreparedStatement.java 
(https://github.com/adejanovski/cassandra-jdbc-wrapper/blob/master/src/main/java/com/github/adejanovski/cassandra/jdbc/CassandraPreparedStatement.java,
 pulled 09/20/2016). Add the following 2 lines before line 87, so the schema 
probe sent by Spark becomes a LIMIT 1 query that CQL accepts:

this.cql = cql.replace("WHERE 1=0", "limit 1");
cql = this.cql;



> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra 
> does not support
> -
>
> Key: SPARK-17614
> URL: https://issues.apache.org/jira/browse/SPARK-17614
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Any Spark Runtime 
>Reporter: Paul Wu
>  Labels: cassandra-jdbc, sql
>
> I have the code like the following with Cassandra JDBC 
> (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>  final String dbTable= "sql_demo";
> Dataset jdbcDF
> = sparkSession.read()
> .jdbc(CASSANDRA_CONNECTION_URL, dbTable, 
> connectionProperties);
> List rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: 
> com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
> alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.(CassandraPreparedStatement.java:108)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
>   at 
> com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark jdbc code uses the sql syntax "where 1=0" 
> somewhere (to get the schema?), but Cassandra does not support this syntax. 
> Not sure how this issue can be resolved...this is because CQL is not standard 
> sql. 
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
> Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
> sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
> com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-20 Thread Paul Wu (JIRA)
Paul Wu created SPARK-17614:
---

 Summary: sparkSession.read() .jdbc(***) use the sql syntax "where 
1=0" that Cassandra does not support
 Key: SPARK-17614
 URL: https://issues.apache.org/jira/browse/SPARK-17614
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
 Environment: Any Spark Runtime 
Reporter: Paul Wu


I have the code like the following with Cassandra JDBC 
(https://github.com/adejanovski/cassandra-jdbc-wrapper):

 final String dbTable= "sql_demo";
Dataset<Row> jdbcDF
= sparkSession.read()
.jdbc(CASSANDRA_CONNECTION_URL, dbTable, connectionProperties);
List<Row> rows = jdbcDF.collectAsList();

It threw the error:
Exception in thread "main" java.sql.SQLTransientException: 
com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable 
alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
at 
com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.<init>(CassandraPreparedStatement.java:108)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
at 
com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)

The reason is that the Spark JDBC code uses the SQL syntax "where 1=0" 
somewhere (to get the schema?), but Cassandra does not support this syntax. Not 
sure how this issue can be resolved... this is because CQL is not standard SQL. 

The following log shows more information:
16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; 
Rack: %s

16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM 
sql_demo WHERE 1=0
16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] 
com.datastax.driver.core.Statement$1@41ccb3b9
16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux

2015-07-24 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-9255:
---
Attachment: (was: timestamp_bug.zip)

 Timestamp handling incorrect for Spark 1.4.1 on Linux
 -

 Key: SPARK-9255
 URL: https://issues.apache.org/jira/browse/SPARK-9255
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release.
Reporter: Paul Wu
 Attachments: timestamp_bug2.zip, tstest


 This is a very strange case involving timestamp  I can run the program on 
 Windows using dev pom.xml (1.4.1) or 1.4.1 or 1.3.0 release downloaded from 
 Apache  without issues , but when I ran it on Spark 1.4.1 release either 
 downloaded from Apache or the version built with scala 2.11 on redhat linux, 
 it has the following error (the code I used is after this stack trace):
 15/07/22 12:02:50  ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 
 0)
 java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: 
 reflective compilation has failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
 at 
 org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
 at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
 at 
 org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
 at 
 org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102)
 at 
 org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has 
 failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422)
 at 
 

[jira] [Commented] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux

2015-07-24 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641175#comment-14641175
 ] 

Paul Wu commented on SPARK-9255:


Related https://issues.apache.org/jira/browse/SPARK-9058, but I doubt it is the 
same. If you look at the sample code, it has almost no aggregations at all.  
However, the fix for https://issues.apache.org/jira/browse/SPARK-9058 may also 
fix this issue. I guess you can test it. 
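
Per the "Updates" note in the description quoted below, the failure only appears when 
spark.sql.codegen is true on 1.4.x, so a minimal workaround sketch (assuming the 
application's SQLContext/HiveContext is named sqlContext) is simply to turn the flag off:
{code:java}
// Workaround sketch based on the description's note: disabling codegen avoids the
// reflective-compilation failure on 1.4.x. "sqlContext" is an assumed variable name.
sqlContext.setConf("spark.sql.codegen", "false");
{code}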

 Timestamp handling incorrect for Spark 1.4.1 on Linux
 -

 Key: SPARK-9255
 URL: https://issues.apache.org/jira/browse/SPARK-9255
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0, 1.4.1
 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release.
Reporter: Paul Wu
 Attachments: timestamp_bug2.zip, tstest


 Updates: This issue is due to the following config: 
 spark.sql.codegen   true
 If this param is set to be false, the problem does not happen. The bug was 
 introduced in 1.4.0.  Releases 1.3.0 and 1.3.1 have no this issue.
 ===
 This is a very strange case involving timestamp  I can run the program on 
 Windows using dev pom.xml (1.4.1) or 1.4.1 or 1.3.0 release downloaded from 
 Apache  without issues , but when I ran it on Spark 1.4.1 release either 
 downloaded from Apache or the version built with scala 2.11 on redhat linux, 
 it has the following error (the code I used is after this stack trace):
 15/07/22 12:02:50  ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 
 0)
 java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: 
 reflective compilation has failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
 at 
 org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
 at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
 at 
 org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
 at 
 org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102)
 at 
 org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has 
 failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316)
 

[jira] [Commented] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux

2015-07-23 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639949#comment-14639949
 ] 

Paul Wu commented on SPARK-9255:


[~srowen]  I don't think it is due to a version difference: the same code runs 
correctly on the 1.3.0 release on Redhat Linux. This bug was introduced after 1.3.0.  

 Timestamp handling incorrect for Spark 1.4.1 on Linux
 -

 Key: SPARK-9255
 URL: https://issues.apache.org/jira/browse/SPARK-9255
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release.
Reporter: Paul Wu
 Attachments: timestamp_bug.zip


 This is a very strange case involving timestamp  I can run the program on 
 Windows using dev pom.xml (1.4.1) or 1.4.1 or 1.3.0 release downloaded from 
 Apache  without issues , but when I ran it on Spark 1.4.1 release either 
 downloaded from Apache or the version built with scala 2.11 on redhat linux, 
 it has the following error (the code I used is after this stack trace):
 15/07/22 12:02:50  ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 
 0)
 java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: 
 reflective compilation has failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
 at 
 org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
 at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
 at 
 org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
 at 
 org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102)
 at 
 org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has 
 failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429)
 at 
 

[jira] [Updated] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux

2015-07-22 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-9255:
---
Attachment: timestamp_bug.zip

the project can run without issues. But when it is deployed to the 

 Timestamp handling incorrect for Spark 1.4.1 on Linux
 -

 Key: SPARK-9255
 URL: https://issues.apache.org/jira/browse/SPARK-9255
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release.
Reporter: Paul Wu
 Attachments: timestamp_bug.zip


 This is a very strange case involving timestamps. I can run the program on 
 Windows using the dev pom.xml (1.4.1) or the 1.3.1 release downloaded from Apache 
 without issues; but when I run it on the Spark 1.4.1 release, either downloaded 
 from Apache or built with Scala 2.11, on Red Hat Linux, it fails with the 
 following error (the code I used is after this stack trace):
 15/07/22 12:02:50  ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 
 0)
 java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: 
 reflective compilation has failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
 at 
 org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
 at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
 at 
 org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
 at 
 org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102)
 at 
 org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has 
 failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429)
 at 
 scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422)
 at 
 

[jira] [Created] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux

2015-07-22 Thread Paul Wu (JIRA)
Paul Wu created SPARK-9255:
--

 Summary: Timestamp handling incorrect for Spark 1.4.1 on Linux
 Key: SPARK-9255
 URL: https://issues.apache.org/jira/browse/SPARK-9255
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release.
Reporter: Paul Wu


This is a very strange case involving timestamps. I can run the program on 
Windows using the dev pom.xml (1.4.1) or the 1.3.1 release downloaded from Apache 
without issues; but when I run it on the Spark 1.4.1 release, either downloaded 
from Apache or built with Scala 2.11, on Red Hat Linux, it fails with the 
following error (the code I used is after this stack trace):

15/07/22 12:02:50  ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0)
java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: 
reflective compilation has failed:

value  is not a member of TimestampType.this.InternalType
at 
org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at 
org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at 
org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at 
org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
at 
org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at 
org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at 
org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
at 
org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at 
org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102)
at 
org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170)
at 
org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261)
at 
org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed:

value  is not a member of TimestampType.this.InternalType
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.liftedTree2$1(ToolBoxFactory.scala:355)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.apply(ToolBoxFactory.scala:355)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.compile(ToolBoxFactory.scala:422)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.eval(ToolBoxFactory.scala:444)
at 

[jira] [Updated] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux

2015-07-22 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-9255:
---
Description: 
This is a very strange case involving timestamps. I can run the program on 
Windows using the dev pom.xml (1.4.1), or the 1.4.1 or 1.3.0 release downloaded 
from Apache, without issues; but when I run it on the Spark 1.4.1 release, either 
downloaded from Apache or built with Scala 2.11, on Red Hat Linux, it fails with 
the following error (the code I used is after this stack trace):

15/07/22 12:02:50  ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 0)
java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: 
reflective compilation has failed:

value  is not a member of TimestampType.this.InternalType
at 
org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at 
org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at 
org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at 
org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
at 
org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at 
org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at 
org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
at 
org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at 
org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102)
at 
org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170)
at 
org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261)
at 
org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed:

value  is not a member of TimestampType.this.InternalType
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:198)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.liftedTree2$1(ToolBoxFactory.scala:355)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.apply(ToolBoxFactory.scala:355)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.compile(ToolBoxFactory.scala:422)
at 
scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.eval(ToolBoxFactory.scala:444)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:74)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:26)
at 

[jira] [Comment Edited] (SPARK-9255) Timestamp handling incorrect for Spark 1.4.1 on Linux

2015-07-22 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637336#comment-14637336
 ] 

Paul Wu edited comment on SPARK-9255 at 7/22/15 6:52 PM:
-

The project can run on Windows without issues, but when it is deployed to the 
1.4.1 release on Linux, it has the error described in the issue. However, it 
works on Windows 7 for the same 1.4.1 release:
 
15/07/22 11:43:33 INFO DAGScheduler: Job 1 finished: show at 
TimestampSample.java:50, took 0.784688 s
+--------------------+
|                   p|
+--------------------+
|2015-07-22 11:00:...|
+--------------------+
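
For reference, a minimal sketch of the kind of program involved (a hypothetical reconstruction, not the attached TimestampSample.java from timestamp_bug.zip; the bean and field names are assumptions chosen to match the "p" column above):

{code:java}
import java.sql.Timestamp;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Hypothetical reconstruction: a JavaBean with a java.sql.Timestamp field is
// turned into a DataFrame and an aggregate over that column is shown.
public class TimestampSample implements java.io.Serializable {

    private Timestamp p;                 // assumed field name, matching the "p" column above
    public Timestamp getP() { return p; }
    public void setP(Timestamp p) { this.p = p; }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("TimestampSample").setMaster("local[2]"));
        SQLContext sqlContext = new SQLContext(sc);

        TimestampSample bean = new TimestampSample();
        bean.setP(new Timestamp(System.currentTimeMillis()));
        JavaRDD<TimestampSample> rdd = sc.parallelize(Arrays.asList(bean));

        DataFrame df = sqlContext.createDataFrame(rdd, TimestampSample.class);
        df.registerTempTable("ts");

        // An aggregate over the timestamp column; this kind of query is what hits
        // the reflective-compilation error on the Linux 1.4.1 deployment.
        sqlContext.sql("SELECT max(p) AS p FROM ts").show();
    }
}
{code}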



was (Author: zwu@gmail.com):
the project can run without issues. But when it is deployed to the 

 Timestamp handling incorrect for Spark 1.4.1 on Linux
 -

 Key: SPARK-9255
 URL: https://issues.apache.org/jira/browse/SPARK-9255
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
 Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release.
Reporter: Paul Wu
 Attachments: timestamp_bug.zip


 This is a very strange case involving timestamps. I can run the program on 
 Windows using the dev pom.xml (1.4.1), or the 1.4.1 or 1.3.0 release downloaded 
 from Apache, without issues; but when I run it on the Spark 1.4.1 release, either 
 downloaded from Apache or built with Scala 2.11, on Red Hat Linux, it fails with 
 the following error (the code I used is after this stack trace):
 15/07/22 12:02:50  ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 
 0)
 java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: 
 reflective compilation has failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
 at 
 org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
 at 
 org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at 
 org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
 at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
 at 
 org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
 at 
 org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105)
 at 
 org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102)
 at 
 org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261)
 at 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has 
 failed:
 value  is not a member of TimestampType.this.InternalType
 at 
 

[jira] [Closed] (SPARK-9087) Broken SQL on where condition involving timestamp and time string.

2015-07-16 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu closed SPARK-9087.
--
   Resolution: Fixed
Fix Version/s: 1.4.1

1.4.1 fixed the issue.

 Broken SQL on where condition involving timestamp and time string. 
 ---

 Key: SPARK-9087
 URL: https://issues.apache.org/jira/browse/SPARK-9087
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Paul Wu
Priority: Critical
  Labels: sparksql
 Fix For: 1.4.1


 Suppose mytable has a field called greenwich, which is of timestamp type. 
 The table is registered through a Java bean. The following query used to work 
 in 1.3 and 1.3.1; now it is broken: there are records with times newer than 
 01/14/2015, but it now returns nothing. This is a blocking issue for us. Is 
 there any workaround? 
 SELECT * FROM mytable WHERE greenwich > '2015-01-14'






[jira] [Created] (SPARK-9087) Broken SQL on where condition involving timestamp and time string.

2015-07-15 Thread Paul Wu (JIRA)
Paul Wu created SPARK-9087:
--

 Summary: Broken SQL on where condition involving timestamp and 
time string. 
 Key: SPARK-9087
 URL: https://issues.apache.org/jira/browse/SPARK-9087
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Paul Wu
Priority: Critical


Suppose mytable has a field called greenwich, which is of timestamp type. The 
table is registered through a Java bean. The following query used to work in 1.3 
and 1.3.1; now it is broken: there are records with times newer than 01/14/2015, 
but it now returns nothing. This is a blocking issue for us. Is there any 
workaround? 

SELECT * FROM mytable WHERE greenwich > '2015-01-14'
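
A minimal sketch of the failing pattern (hypothetical names; it assumes a JavaBean-backed temp table whose greenwich field is a java.sql.Timestamp):

{code:java}
import java.sql.Timestamp;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class TimestampFilterRepro {

    // JavaBean used to register "mytable"; the field name matches the issue text.
    public static class Record implements java.io.Serializable {
        private Timestamp greenwich;
        public Timestamp getGreenwich() { return greenwich; }
        public void setGreenwich(Timestamp greenwich) { this.greenwich = greenwich; }
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("TimestampFilterRepro").setMaster("local[2]"));
        SQLContext sqlContext = new SQLContext(sc);

        Record r = new Record();
        r.setGreenwich(Timestamp.valueOf("2015-02-01 00:00:00"));
        DataFrame df = sqlContext.createDataFrame(
                sc.parallelize(Arrays.asList(r)), Record.class);
        df.registerTempTable("mytable");

        // On 1.4.0 this returned no rows even though the record is newer than
        // 2015-01-14; per the resolution above, 1.4.1 fixed it.
        sqlContext.sql("SELECT * FROM mytable WHERE greenwich > '2015-01-14'").show();
    }
}
{code}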






[jira] [Commented] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatly

2015-05-22 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555680#comment-14555680
 ] 

Paul Wu commented on SPARK-7804:


Unfortunately, JdbcRDD was poorly designed: the lowerBound and upperBound 
parameters are long types, which is too limiting. One of my team members 
implemented a more general version based on the idea, but some of the team are 
worried about the home-made solution. When we saw JDBCRDD, it looked like what 
we wanted. In fact, I hope JDBCRDD can be made public, or JdbcRDD can be 
re-designed to handle the general situation just as JDBCRDD does.
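
To make the long-bound limitation concrete, here is a simplified sketch of how numeric lower/upper bounds get carved into per-partition ranges that are then bound into a WHERE clause (an illustration of the scheme, not the actual JdbcRDD source):

{code:java}
// Simplified sketch: each partition gets a numeric [start, end] range that is
// bound into "... WHERE ? <= key AND key <= ?". Keys that are not numeric
// (dates, strings, composite conditions) cannot be expressed this way, which
// is the limitation discussed above.
public final class BoundPartitions {

    public static long[][] split(long lowerBound, long upperBound, int numPartitions) {
        long[][] ranges = new long[numPartitions][2];
        long length = upperBound - lowerBound + 1;
        for (int i = 0; i < numPartitions; i++) {
            ranges[i][0] = lowerBound + (i * length) / numPartitions;
            ranges[i][1] = lowerBound + ((i + 1) * length) / numPartitions - 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(1, 100, 3)) {
            System.out.println("WHERE " + r[0] + " <= key AND key <= " + r[1]);
        }
    }
}
{code}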



 Incorrect results from JDBCRDD -- one record repeatly
 -

 Key: SPARK-7804
 URL: https://issues.apache.org/jira/browse/SPARK-7804
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1
Reporter: Paul Wu

 Getting only one  record repeated in the RDD and repeated field value:
  
 I have a table like:
 {code}
 attuid  name email
 12  john   j...@appp.com
 23  tom   t...@appp.com
 34  tony  t...@appp.com
 {code}
 My code:
 {code}
 JavaSparkContext sc = new JavaSparkContext(sparkConf);
 String url = "...";
 java.util.Properties prop = new Properties();
 List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
 //int i;
 partitionList.add(new JDBCPartition("1=1", 0));

 List<StructField> fields = new ArrayList<StructField>();
 fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
 fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
 fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
 StructType schema = DataTypes.createStructType(fields);
 JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
     JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
     schema,
     "USERS",
     new String[]{"attuid", "name", "email"},
     new Filter[]{},
     partitionList.toArray(new JDBCPartition[0])
 );

 System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
 JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
 System.out.println("count=" + jrdd.count());
 List<Row> lr = jrdd.collect();
 for (Row r : lr) {
     for (int ii = 0; ii < r.length(); ii++) {
         System.out.println(r.getString(ii));
     }
 }
 {code}
 ===
 result is :
 {code}
 34
 tony
  t...@appp.com
 34
 tony
  t...@appp.com
 34
 tony 
  t...@appp.com
 {code}






[jira] [Comment Edited] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatly

2015-05-22 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555680#comment-14555680
 ] 

Paul Wu edited comment on SPARK-7804 at 5/22/15 12:00 PM:
--

Unfortunately, JdbcRDD was poorly designed: the lowerBound and upperBound 
parameters are long types, which is too limiting. One of my team members 
implemented a more general version based on the idea, but some of the team are 
worried about the home-made solution. When we saw JDBCRDD, it looked like what 
we wanted. In fact, I hope JDBCRDD can be made public, or JdbcRDD can be 
re-designed to handle the general situation just as JDBCRDD does.




was (Author: zwu@gmail.com):
Unfortunately,  JdbcRDD was poorly designed since the lowerbound and upperbound 
are long types which are too limited.  One of my team member implemented a 
general one  based on the idea. Some of my team are worried about the home-made 
solution.  When we saw JDBCRDD, it looks like what we wanted. In fact, I hope 
JDBCRDD can be public or JdbcRDD can be  re-designed to take care general 
situation just like what JDBCRDD does.



 Incorrect results from JDBCRDD -- one record repeatly
 -

 Key: SPARK-7804
 URL: https://issues.apache.org/jira/browse/SPARK-7804
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1
Reporter: Paul Wu

 Getting only one  record repeated in the RDD and repeated field value:
  
 I have a table like:
 {code}
 attuid  name email
 12  john   j...@appp.com
 23  tom   t...@appp.com
 34  tony  t...@appp.com
 {code}
 My code:
 {code}
 JavaSparkContext sc = new JavaSparkContext(sparkConf);
 String url = "...";
 java.util.Properties prop = new Properties();
 List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
 //int i;
 partitionList.add(new JDBCPartition("1=1", 0));

 List<StructField> fields = new ArrayList<StructField>();
 fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
 fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
 fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
 StructType schema = DataTypes.createStructType(fields);
 JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
     JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
     schema,
     "USERS",
     new String[]{"attuid", "name", "email"},
     new Filter[]{},
     partitionList.toArray(new JDBCPartition[0])
 );

 System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
 JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
 System.out.println("count=" + jrdd.count());
 List<Row> lr = jrdd.collect();
 for (Row r : lr) {
     for (int ii = 0; ii < r.length(); ii++) {
         System.out.println(r.getString(ii));
     }
 }
 {code}
 ===
 result is :
 {code}
 34
 tony
  t...@appp.com
 34
 tony
  t...@appp.com
 34
 tony 
  t...@appp.com
 {code}






[jira] [Created] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatly and incorrect field value

2015-05-21 Thread Paul Wu (JIRA)
Paul Wu created SPARK-7804:
--

 Summary: Incorrect results from JDBCRDD -- one record repeatly and 
incorrect field value 
 Key: SPARK-7804
 URL: https://issues.apache.org/jira/browse/SPARK-7804
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1, 1.3.0
Reporter: Paul Wu


Getting only one  record repeated in the RDD and repeated field value:
 
I have a table like:
attuid  name email
12  john   j...@appp.com
23  tom   t...@appp.com
34  tony  t...@appp.com

My code:

JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));

List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0])
);

System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
===
result is :
34
34 
 t...@appp.com
34
34 
 t...@appp.com
34
34 
 t...@appp.com







[jira] [Updated] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatly

2015-05-21 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-7804:
---
Description: 
Getting only one  record repeated in the RDD and repeated field value:
 
I have a table like:
attuid  name email
12  john   j...@appp.com
23  tom   t...@appp.com
34  tony  t...@appp.com

My code:

JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));

List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0])
);

System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
===
result is :
34
tony
 t...@appp.com
34
tony
 t...@appp.com
34
tony 
 t...@appp.com


  was:
Getting only one  record repeated in the RDD and repeated field value:
 
I have a table like:
attuid  name email
12  john   j...@appp.com
23  tom   t...@appp.com
34  tony  t...@appp.com

My code:

JavaSparkContext sc = new JavaSparkContext(sparkConf);
String url = "...";
java.util.Properties prop = new Properties();
List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
//int i;
partitionList.add(new JDBCPartition("1=1", 0));

List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);
JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
    JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
    schema,
    "USERS",
    new String[]{"attuid", "name", "email"},
    new Filter[]{},
    partitionList.toArray(new JDBCPartition[0])
);

System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
System.out.println("count=" + jrdd.count());
List<Row> lr = jrdd.collect();
for (Row r : lr) {
    for (int ii = 0; ii < r.length(); ii++) {
        System.out.println(r.getString(ii));
    }
}
===
result is :
34
34 
 t...@appp.com
34
34 
 t...@appp.com
34
34 
 t...@appp.com


Summary: Incorrect results from JDBCRDD -- one record repeatly  (was: 
Incorrect results from JDBCRDD -- one record repeatly and incorrect field value 
)

 Incorrect results from JDBCRDD -- one record repeatly
 -

 Key: SPARK-7804
 URL: https://issues.apache.org/jira/browse/SPARK-7804
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1
Reporter: Paul Wu
  Labels: JDBCRDD, sql

 Getting only one  record repeated in the RDD and repeated field value:
  
 I have a table like:
 attuid  name email
 12  john   j...@appp.com
 23  tom   t...@appp.com
 34  tony  t...@appp.com
 My code:
 JavaSparkContext sc = new JavaSparkContext(sparkConf);
 String url = "...";
 java.util.Properties prop = new Properties();
 List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
 //int i;
 partitionList.add(new JDBCPartition("1=1", 0));

 List<StructField> fields = new ArrayList<StructField>();
 fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
 fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
 fields.add(DataTypes.createStructField("email", DataTypes.StringType, 

[jira] [Commented] (SPARK-7804) Incorrect results from JDBCRDD -- one record repeatly

2015-05-21 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555602#comment-14555602
 ] 

Paul Wu commented on SPARK-7804:


Thanks -- you are right. The cache() was a problem, and I also cannot use 
List<Row> lr = jrdd.collect();. But 

jrdd.foreach((Row r) -> {
    System.out.println(r.get(0) + " . " + r.get(1) + " " + r.get(2));
});
or foreachPartition will work. 

We really wanted to use DataFrame; however, it does not have the partition 
options that we really need to improve performance. Using this class, we 
can take advantage of sending multiple queries to the DB partitions at the 
same time. But as you said, this is internal code (it does not show up in the 
Javadoc), so I'm not sure what I can do now.

I guess you guys can close this ticket. Thanks again!  
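
For completeness, a small sketch of the foreachPartition variant mentioned above (it assumes the same JavaRDD<Row> jrdd as in the quoted code, and Java 8 lambda syntax):

{code:java}
import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;

// Sketch: print rows partition by partition on the executors instead of
// collect()-ing them to the driver.
class RowPrinter {
    static void printAll(JavaRDD<Row> jrdd) {
        jrdd.foreachPartition((Iterator<Row> rows) -> {
            while (rows.hasNext()) {
                Row r = rows.next();
                System.out.println(r.get(0) + " . " + r.get(1) + " " + r.get(2));
            }
        });
    }
}
{code}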

 Incorrect results from JDBCRDD -- one record repeatly
 -

 Key: SPARK-7804
 URL: https://issues.apache.org/jira/browse/SPARK-7804
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1
Reporter: Paul Wu
  Labels: JDBCRDD, sql

 Getting only one  record repeated in the RDD and repeated field value:
  
 I have a table like:
 {code}
 attuid  name email
 12  john   j...@appp.com
 23  tom   t...@appp.com
 34  tony  t...@appp.com
 {code}
 My code:
 {code}
 JavaSparkContext sc = new JavaSparkContext(sparkConf);
 String url = "...";
 java.util.Properties prop = new Properties();
 List<JDBCPartition> partitionList = new ArrayList<JDBCPartition>();
 //int i;
 partitionList.add(new JDBCPartition("1=1", 0));

 List<StructField> fields = new ArrayList<StructField>();
 fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
 fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
 fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
 StructType schema = DataTypes.createStructType(fields);
 JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
     JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
     schema,
     "USERS",
     new String[]{"attuid", "name", "email"},
     new Filter[]{},
     partitionList.toArray(new JDBCPartition[0])
 );

 System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
 JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
 System.out.println("count=" + jrdd.count());
 List<Row> lr = jrdd.collect();
 for (Row r : lr) {
     for (int ii = 0; ii < r.length(); ii++) {
         System.out.println(r.getString(ii));
     }
 }
 {code}
 ===
 result is :
 {code}
 34
 tony
  t...@appp.com
 34
 tony
  t...@appp.com
 34
 tony 
  t...@appp.com
 {code}






[jira] [Created] (SPARK-7746) SetFetchSize for JDBCRDD's prepareStatement

2015-05-19 Thread Paul Wu (JIRA)
Paul Wu created SPARK-7746:
--

 Summary: SetFetchSize for JDBCRDD's prepareStatement
 Key: SPARK-7746
 URL: https://issues.apache.org/jira/browse/SPARK-7746
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 1.3.1, 1.3.0
Reporter: Paul Wu


The PreparedStatement created inside JDBCRDD's compute() method should 
expose an option so that setFetchSize() can be called on it. We found that this 
parameter can affect Oracle DB performance tremendously, and with the current 
implementation we have no way to set the size. 

One of my team members did his own implementation, and we found a 10X 
improvement by setting the fetch size to a proper number (see the sketch below). 
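
For illustration, this is the kind of knob being requested, shown with plain JDBC rather than Spark's internal code (the driver URL, credentials, query, and fetch size below are placeholders):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch: setting the JDBC fetch size on the statement that streams rows back.
// In JDBCRDD the equivalent statement is created inside compute(), so there is
// currently no hook to call setFetchSize() on it.
public class FetchSizeSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/service", "user", "password");
             PreparedStatement stmt = conn.prepareStatement("SELECT * FROM USERS")) {
            stmt.setFetchSize(5000);   // the knob that made the ~10X difference for us
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // consume rows
                }
            }
        }
    }
}
{code}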






[jira] [Commented] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env

2015-04-16 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498305#comment-14498305
 ] 

Paul Wu commented on SPARK-6936:


You are right: I used spark-hive_2.10 instead of spark-hive_2.11 in my build. 
Sorry, I deleted my comment before reading yours. Thanks.

 SQLContext.sql() caused deadlock in multi-thread env
 

 Key: SPARK-6936
 URL: https://issues.apache.org/jira/browse/SPARK-6936
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment: JDK 1.8.x, RedHat
 Linux version 2.6.32-431.23.3.el6.x86_64 
 (mockbu...@x86-027.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
 Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
Reporter: Paul Wu
  Labels: deadlock, sql, threading

 Running the same query from more than one thread with SQLContext.sql may lead 
 to deadlock. Here is a way to reproduce it (since this is a multi-threading issue, 
 the reproduction may or may not be easy).
 1. Register a relatively big table.
 2. Create two different classes; in each class, run the same query in a 
 method, put the results in a set, and print the set size.
 3. Create two threads, each using an object of one of the classes in its run method. 
 Start the threads. In my tests, a deadlock can occur within just a few runs. 






[jira] [Commented] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env

2015-04-16 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498247#comment-14498247
 ] 

Paul Wu commented on SPARK-6936:


Not sure about HiveContext. I tried to run the following program and I got an 
exception (env: JDK 1.8 / Spark 1.3). Why did I get the error on HiveContext?


-- exec-maven-plugin:1.2.1:exec (default-cli) @ Spark-Sample ---
Exception in thread main java.lang.NoSuchMethodError: 
scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
at org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:574)
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:245)
at 
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
at 
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881)
at 
scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38)
at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138)
at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138)
at 
org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:96)
at 
org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:95)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881)
at 
scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38)
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:234)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$sql$1.apply(HiveContext.scala:92)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$sql$1.apply(HiveContext.scala:92)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:92)
at 

[jira] [Issue Comment Deleted] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env

2015-04-16 Thread Paul Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-6936:
---
Comment: was deleted

(was: Not sure about HiveContext. I tried to do the following program and I got 
exception (env: JDK 1.8/ Spark 1.3).  Why did I get the error on HiveContext?


-- exec-maven-plugin:1.2.1:exec (default-cli) @ Spark-Sample ---
Exception in thread main java.lang.NoSuchMethodError: 
scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
at org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:574)
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:245)
at 
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
at 
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881)
at 
scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38)
at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138)
at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138)
at 
org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:96)
at 
org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:95)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at 
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881)
at 
scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38)
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:234)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$sql$1.apply(HiveContext.scala:92)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$sql$1.apply(HiveContext.scala:92)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:92)
at 

[jira] [Created] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env

2015-04-15 Thread Paul Wu (JIRA)
Paul Wu created SPARK-6936:
--

 Summary: SQLContext.sql() caused deadlock in multi-thread env
 Key: SPARK-6936
 URL: https://issues.apache.org/jira/browse/SPARK-6936
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment: JDK 1.8.x, RedHat
Linux version 2.6.32-431.23.3.el6.x86_64 
(mockbu...@x86-027.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014

Reporter: Paul Wu


Running the same query from more than one thread with SQLContext.sql may lead to 
deadlock. Here is a way to reproduce it (since this is a multi-threading issue, the 
reproduction may or may not be easy).

1. Register a relatively big table.

2. Create two different classes; in each class, run the same query in a 
method, put the results in a set, and print the set size.

3. Create two threads, each using an object of one of the classes in its run method. 
Start the threads. In my tests, a deadlock can occur within just a few runs. A 
minimal sketch of these steps follows.
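
A simplified, self-contained sketch of those steps (the bean, table name, and query are placeholders, and both threads share one Runnable instead of two separate classes):

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SqlDeadlockRepro {

    // Placeholder bean used to register a table (the real table is much bigger).
    public static class Rec implements java.io.Serializable {
        private long id;
        public Rec() { }
        public Rec(long id) { this.id = id; }
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
    }

    public static void main(String[] args) throws InterruptedException {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("SqlDeadlockRepro").setMaster("local[4]"));
        final SQLContext sqlContext = new SQLContext(sc);

        // Step 1: register a (here small, in reality big) table.
        List<Rec> data = new ArrayList<Rec>();
        for (long i = 0; i < 100000; i++) {
            data.add(new Rec(i));
        }
        DataFrame df = sqlContext.createDataFrame(sc.parallelize(data), Rec.class);
        df.registerTempTable("bigtable");

        // Steps 2-3: the same query issued from two threads at once.
        Runnable work = () -> {
            DataFrame result = sqlContext.sql("SELECT DISTINCT id FROM bigtable WHERE id > 10");
            System.out.println("rows: " + result.count());
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
{code}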






[jira] [Commented] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env

2015-04-15 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496933#comment-14496933
 ] 

Paul Wu commented on SPARK-6936:


1. The query is something like this (sorry, the data is proprietary -- I 
cannot release it here).

SELECT distinct timezoneind, css.rnc, css.field1, localtime, greenwich FROM cs, 
css WHERE cs.r = css.r and cs.name = css.name and greenwich > '2015-03-04'

2. Maven dependencies: 
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.3.0</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>1.3.0</version>
    <scope>compile</scope>
</dependency>

3. Here is the trace I reproduced on Windows 7:
C:\>java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

=
2015-04-15 13:46:13
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode):

DestroyJavaVM #98 prio=5 os_prio=0 tid=0x5dc8c800 nid=0x2048 waiting 
on condition [0x]
   java.lang.Thread.State: RUNNABLE

Thread-34 #97 prio=5 os_prio=0 tid=0x61df6000 nid=0x14b0 runnable 
[0x646ea000]
   java.lang.Thread.State: RUNNABLE
at 
scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:358)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:40)
at 
scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:136)
at 
scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:132)
at scala.collection.mutable.HashMap.findEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.get(HashMap.scala:70)
at 
scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:186)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
at 
scala.util.parsing.combinator.syntactical.StdTokenParsers$class.keyword(StdTokenParsers.scala:37)
at 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.keyword(StandardTokenParsers.scala:29)
at 
org.apache.spark.sql.catalyst.SqlParser$$anonfun$primary$1$$anonfun$apply$287$$anonfun$apply$288.apply(SqlParser.scala:373)
at 
org.apache.spark.sql.catalyst.SqlParser$$anonfun$primary$1$$anonfun$apply$287$$anonfun$apply$288.apply(SqlParser.scala:373)
at 
scala.util.parsing.combinator.Parsers$Parser.p$lzycompute$2(Parsers.scala:267)
- locked 0xdc33a648 (a 
scala.util.parsing.combinator.Parsers$$anon$3)
at 
scala.util.parsing.combinator.Parsers$Parser.scala$util$parsing$combinator$Parsers$Parser$$p$3(Parsers.scala:267)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$$tilde$1.apply(Parsers.scala:268)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$$tilde$1.apply(Parsers.scala:268)
at 
scala.util.parsing.combinator.Parsers$Success.flatMapWithNext(Parsers.scala:143)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:234)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:234)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at 

[jira] [Commented] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env

2015-04-15 Thread Paul Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497267#comment-14497267
 ] 

Paul Wu commented on SPARK-6936:


Currently, I have synchronized that part, which seems to be working so far; a sketch of the idea is below.
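
A sketch of that workaround (names are placeholders; the idea is simply to serialize the sql() calls behind one lock):

{code:java}
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Workaround sketch: only one thread at a time goes through the SQL parser;
// actions on the returned DataFrame still run in parallel afterwards.
final class SynchronizedSql {

    private static final Object SQL_LOCK = new Object();

    static DataFrame sql(SQLContext sqlContext, String query) {
        synchronized (SQL_LOCK) {
            return sqlContext.sql(query);
        }
    }
}
{code}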

 SQLContext.sql() caused deadlock in multi-thread env
 

 Key: SPARK-6936
 URL: https://issues.apache.org/jira/browse/SPARK-6936
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment: JDK 1.8.x, RedHat
 Linux version 2.6.32-431.23.3.el6.x86_64 
 (mockbu...@x86-027.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
 Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
Reporter: Paul Wu
  Labels: deadlock, sql, threading

 Running the same query from more than one thread with SQLContext.sql may lead 
 to deadlock. Here is a way to reproduce it (since this is a multi-threading issue, 
 the reproduction may or may not be easy).
 1. Register a relatively big table.
 2. Create two different classes; in each class, run the same query in a 
 method, put the results in a set, and print the set size.
 3. Create two threads, each using an object of one of the classes in its run method. 
 Start the threads. In my tests, a deadlock can occur within just a few runs. 


