[jira] [Commented] (SPARK-25538) incorrect row counts after distinct()

2018-09-29 Thread Steven Rand (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633254#comment-16633254
 ] 

Steven Rand commented on SPARK-25538:
-

[~kiszk] I've uploaded a tarball containing parquet files that reproduce the 
issue but don't contain any of the values in the original dataset. 
Specifically, some columns have been dropped, all strings have been changed to 
"test_string", all values in col_50 have been changed to 0.0043, and the values 
in col_14 have all been mapped from their original values to values between 
0.001 and 0.0044.
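The sanitization described above can be sketched in code. Since the real DataFrame's schema isn't available here, this is a hedged stand-in using a plain case class (only col_14 and col_50 come from the comment; the string column and the lookup map are illustrative):

```scala
// Hypothetical stand-in for a row of the attached dataset; only the columns
// mentioned in the comment are modeled.
case class SampleRow(col0: String, col14: Double, col50: Double)

// Apply the anonymization described above: every string becomes "test_string",
// col_50 becomes the constant 0.0043, and col_14 is remapped through a supplied
// lookup into the 0.001-0.0044 range (unmapped values fall back to 0.001).
def sanitize(rows: Seq[SampleRow], col14Map: Map[Double, Double]): Seq[SampleRow] =
  rows.map { r =>
    r.copy(
      col0  = "test_string",
      col14 = col14Map.getOrElse(r.col14, 0.001),
      col50 = 0.0043)
  }
```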

This new DataFrame still reproduces issues similar to those in the description:
{code:java}
scala> df.distinct.count
res3: Long = 64

scala> df.sort("col_0").distinct.count
res4: Long = 73

scala> df.withColumnRenamed("col_0", "new").distinct.count
res5: Long = 63
{code}
I get those inconsistent/wrong results on {{2.4.0-rc2}}, and also if I check out 
commit {{a7c19d9c21d59fd0109a7078c80b33d3da03fafd}}, which is SPARK-23713. If I 
check out the commit immediately before, which is 
{{fe2b7a4568d65a62da6e6eb00fff05f248b4332c}}, then all three commands return 63.

cc [~cloud_fan] – IMO this should block the 2.4.0 release.

> incorrect row counts after distinct()
> -
>
> Key: SPARK-25538
> URL: https://issues.apache.org/jira/browse/SPARK-25538
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Reproduced on a Centos7 VM and from source in Intellij 
> on OS X.
>Reporter: Steven Rand
>Priority: Major
>  Labels: correctness
> Attachments: SPARK-25538-repro.tgz
>
>
> It appears that {{df.distinct.count}} can return incorrect values after 
> SPARK-23713. It's possible that other operations are affected as well; 
> {{distinct}} just happens to be the one that we noticed. I believe that this 
> issue was introduced by SPARK-23713 because I can't reproduce it before that 
> commit, and I've been able to reproduce it after that commit as well as with 
> {{tags/v2.4.0-rc1}}. 
> Below are example spark-shell sessions to illustrate the problem. 
> Unfortunately the data used in these examples can't be uploaded to this Jira 
> ticket. I'll try to create test data which also reproduces the issue, and 
> will upload that if I'm able to do so.
> Example from Spark 2.3.1, which behaves correctly:
> {code}
> scala> val df = spark.read.parquet("hdfs:///data")
> df: org.apache.spark.sql.DataFrame = []
> scala> df.count
> res0: Long = 123
> scala> df.distinct.count
> res1: Long = 115
> {code}
> Example from Spark 2.4.0-rc1, which returns different output:
> {code}
> scala> val df = spark.read.parquet("hdfs:///data")
> df: org.apache.spark.sql.DataFrame = []
> scala> df.count
> res0: Long = 123
> scala> df.distinct.count
> res1: Long = 116
> scala> df.sort("col_0").distinct.count
> res2: Long = 123
> scala> df.withColumnRenamed("col_0", "newName").distinct.count
> res3: Long = 115
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25538) incorrect row counts after distinct()

2018-09-29 Thread Steven Rand (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated SPARK-25538:

Attachment: SPARK-25538-repro.tgz







[jira] [Updated] (SPARK-25501) Kafka delegation token support

2018-09-29 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-25501:

Labels: SPIP  (was: )

> Kafka delegation token support
> --
>
> Key: SPARK-25501
> URL: https://issues.apache.org/jira/browse/SPARK-25501
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Gabor Somogyi
>Priority: Major
>  Labels: SPIP
>
> Delegation token support was released in Kafka version 1.1. Since Spark has 
> updated its Kafka client to 2.0.0, it is now possible to implement delegation 
> token support. Please see the description: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka






[jira] [Created] (SPARK-25576) Fix lint failure in 2.2

2018-09-29 Thread Xiao Li (JIRA)
Xiao Li created SPARK-25576:
---

 Summary: Fix lint failure in 2.2
 Key: SPARK-25576
 URL: https://issues.apache.org/jira/browse/SPARK-25576
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 2.2.2
Reporter: Xiao Li


See the errors:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.2-lint/913/console






[jira] [Resolved] (SPARK-25568) Continue to update the remaining accumulators when failing to update one accumulator

2018-09-29 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-25568.
-
   Resolution: Fixed
Fix Version/s: 2.4.0
   2.3.3
   2.2.3

> Continue to update the remaining accumulators when failing to update one 
> accumulator
> 
>
> Key: SPARK-25568
> URL: https://issues.apache.org/jira/browse/SPARK-25568
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
> Fix For: 2.2.3, 2.3.3, 2.4.0
>
>
> Currently when failing to update an accumulator, 
> DAGScheduler.updateAccumulators will skip the remaining accumulators. We 
> should try to update the remaining accumulators if possible so that they can 
> still report correct values.
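The fix described above amounts to applying each accumulator update in its own try/catch, so one failure can no longer skip the rest. A minimal sketch of that pattern (a generic stand-in, not the actual DAGScheduler.updateAccumulators code):

```scala
// Sketch: apply each update independently, recording failures instead of
// letting the first exception abort the remaining updates.
def applyUpdates[A](updates: Seq[A])(apply: A => Unit): Seq[A] = {
  val failed = scala.collection.mutable.ArrayBuffer.empty[A]
  updates.foreach { u =>
    try apply(u)
    catch { case scala.util.control.NonFatal(_) => failed += u } // keep going
  }
  failed.toSeq // report which updates could not be applied
}
```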






[jira] [Commented] (SPARK-25380) Generated plans occupy over 50% of Spark driver memory

2018-09-29 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633145#comment-16633145
 ] 

Marcelo Vanzin commented on SPARK-25380:


We can provide ways to diminish the effect of large plans on memory usage even 
if we can't reproduce his specific case. None of the things you list in your 
last e-mail need a reproduction; you can hack the code to generate a large 
garbage plan, and you should be able to test any of those solutions against it.

It would be great to learn more and to find out whether we can make the plans 
more compact; but we should also recognize that people can and do run very large, 
complicated queries that generate large plans, and we could help them tune the 
UI so it does not use so much memory.
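One shape the UI-tuning idea could take is a simple cap on how many plan descriptions are retained, evicting the oldest first, in the spirit of existing knobs like {{spark.ui.retainedJobs}}. A hedged sketch (the class name and interface are illustrative, not Spark's actual implementation):

```scala
// Sketch: keep at most `limit` plan strings for the UI, dropping the oldest,
// so a long-running app can't accumulate plan text without bound.
class BoundedPlanStore(limit: Int) {
  private val plans = scala.collection.mutable.Queue.empty[String]

  def add(plan: String): Unit = {
    plans.enqueue(plan)
    while (plans.size > limit) plans.dequeue() // evict oldest first
  }

  def retained: Seq[String] = plans.toSeq
}
```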

> Generated plans occupy over 50% of Spark driver memory
> --
>
> Key: SPARK-25380
> URL: https://issues.apache.org/jira/browse/SPARK-25380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
> Environment: Spark 2.3.1 (AWS emr-5.16.0)
>  
>Reporter: Michael Spector
>Priority: Minor
> Attachments: Screen Shot 2018-09-06 at 23.19.56.png, Screen Shot 
> 2018-09-12 at 8.20.05.png, heapdump_OOM.png, image-2018-09-16-14-21-38-939.png
>
>
> While debugging an OOM exception during a long run of a Spark application (many 
> iterations of the same code), I found that generated plans occupy most of the 
> driver memory. I'm not sure whether this is a memory leak or not, but it would 
> be helpful if old plans could be purged from memory anyway.
> Attached are screenshots of OOM heap dump opened in JVisualVM.
>  






[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Description: 
follow up to SPARK-24255

from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
requirements when running tests - we have seen cases where SparkR is run on Java 
10, which Spark unfortunately does not start on. For 2.4.x, let's attempt to 
skip all tests

  was:
follow up to SPARK-24255

from 2.3.2 release we can see that CRAN doesn't seem to respect the system 
requirements as running tests - we have seen cases where SparkR is run on Java 
10, which unfortunately Spark does not start on. For 2.4, lets attempt skipping 
all tests


> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which Spark unfortunately does not start on. For 2.4.x, let's attempt 
> to skip all tests






[jira] [Commented] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633129#comment-16633129
 ] 

Felix Cheung commented on SPARK-25572:
--

[~cloud_fan] while not a blocker, it would be great to include in 2.4.0 if we 
have another RC

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which Spark unfortunately does not start on. For 2.4, let's attempt 
> to skip all tests






[jira] [Commented] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633130#comment-16633130
 ] 

Felix Cheung commented on SPARK-25572:
--

[~shivaram]







[jira] [Resolved] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25572.
--
  Resolution: Fixed
   Fix Version/s: 2.5.0
  2.4.1
Target Version/s: 2.4.1, 2.5.0







[jira] [Commented] (SPARK-25575) SQL tab in the Spark UI doesn't have the option of hiding tables, even though other UI tabs do.

2018-09-29 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633096#comment-16633096
 ] 

Apache Spark commented on SPARK-25575:
--

User 'shahidki31' has created a pull request for this issue:
https://github.com/apache/spark/pull/22592

> SQL tab in the Spark UI doesn't have the option of hiding tables, even though 
> other UI tabs do.
> -
>
> Key: SPARK-25575
> URL: https://issues.apache.org/jira/browse/SPARK-25575
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Minor
> Attachments: Screenshot from 2018-09-29 23-26-45.png, Screenshot from 
> 2018-09-29 23-26-57.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sql("create table a (id int)")
> for(i <- 1 to 100) sql(s"insert into a values ($i)")
> {code}
> Open SQL tab in the web UI,
>  !Screenshot from 2018-09-29 23-26-45.png! 
> Open Jobs tab,
>  !Screenshot from 2018-09-29 23-26-57.png! 
>  






[jira] [Assigned] (SPARK-25575) SQL tab in the Spark UI doesn't have the option of hiding tables, even though other UI tabs do.

2018-09-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25575:


Assignee: Apache Spark

> SQL tab in the Spark UI doesn't have the option of hiding tables, even though 
> other UI tabs do.
> -
>
> Key: SPARK-25575
> URL: https://issues.apache.org/jira/browse/SPARK-25575
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Assignee: Apache Spark
>Priority: Minor
> Attachments: Screenshot from 2018-09-29 23-26-45.png, Screenshot from 
> 2018-09-29 23-26-57.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sql("create table a (id int)")
> for(i <- 1 to 100) sql(s"insert into a values ($i)")
> {code}
> Open SQL tab in the web UI,
>  !Screenshot from 2018-09-29 23-26-45.png! 
> Open Jobs tab,
>  !Screenshot from 2018-09-29 23-26-57.png! 
>  






[jira] [Assigned] (SPARK-25575) SQL tab in the Spark UI doesn't have the option of hiding tables, even though other UI tabs do.

2018-09-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25575:


Assignee: (was: Apache Spark)







[jira] [Updated] (SPARK-25575) SQL tab in the Spark UI doesn't have the option of hiding tables, even though other UI tabs do.

2018-09-29 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25575:
---
Attachment: Screenshot from 2018-09-29 23-26-57.png

> SQL tab in the Spark UI doesn't have the option of hiding tables, even though 
> other UI tabs do.
> -
>
> Key: SPARK-25575
> URL: https://issues.apache.org/jira/browse/SPARK-25575
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Minor
> Attachments: Screenshot from 2018-09-29 23-26-45.png, Screenshot from 
> 2018-09-29 23-26-57.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sql("create table a (id int)")
> for(i <- 1 to 100) sql(s"insert into a values ($i)")
> {code}
> Open SQL tab in the web UI,
> !image-2018-09-29-23-28-56-045.png!
> Open Jobs tab,
> !image-2018-09-29-23-28-39-693.png!
>  






[jira] [Updated] (SPARK-25575) SQL tab in the Spark UI doesn't have the option of hiding tables, even though other UI tabs do.

2018-09-29 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25575:
---
Description: 
Test steps:
 1) bin/spark-shell
{code:java}
sql("create table a (id int)")
for(i <- 1 to 100) sql(s"insert into a values ($i)")
{code}
Open SQL tab in the web UI,

 !Screenshot from 2018-09-29 23-26-45.png! 

Open Jobs tab,

 !Screenshot from 2018-09-29 23-26-57.png! 

 

  was:
Test tests:
 1) bin/spark-shell
{code:java}
sql("create table a (id int)")
for(i <- 1 to 100) sql(s"insert into a values ($i)")
{code}
Open SQL tab in the web UI,

!image-2018-09-29-23-28-56-045.png!

Open Jobs tab,

!image-2018-09-29-23-28-39-693.png!

 








[jira] [Updated] (SPARK-25575) SQL tab in the Spark UI doesn't have the option of hiding tables, even though other UI tabs do.

2018-09-29 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25575:
---
Attachment: Screenshot from 2018-09-29 23-26-45.png

> SQL tab in the Spark UI doesn't have the option of hiding tables, even though 
> other UI tabs do.
> -
>
> Key: SPARK-25575
> URL: https://issues.apache.org/jira/browse/SPARK-25575
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Minor
> Attachments: Screenshot from 2018-09-29 23-26-45.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sql("create table a (id int)")
> for(i <- 1 to 100) sql(s"insert into a values ($i)")
> {code}
> Open SQL tab in the web UI,
> !image-2018-09-29-23-28-56-045.png!
> Open Jobs tab,
> !image-2018-09-29-23-28-39-693.png!
>  






[jira] [Commented] (SPARK-25575) SQL tab in the Spark UI doesn't have the option of hiding tables, even though other UI tabs do.

2018-09-29 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633082#comment-16633082
 ] 

shahid commented on SPARK-25575:


I will raise a PR

> SQL tab in the Spark UI doesn't have the option of hiding tables, even though 
> other UI tabs do.
> -
>
> Key: SPARK-25575
> URL: https://issues.apache.org/jira/browse/SPARK-25575
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Minor
> Attachments: Screenshot from 2018-09-29 23-26-45.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sql("create table a (id int)")
> for(i <- 1 to 100) sql(s"insert into a values ($i)")
> {code}
> Open SQL tab in the web UI,
> !image-2018-09-29-23-28-56-045.png!
> Open Jobs tab,
> !image-2018-09-29-23-28-39-693.png!
>  






[jira] [Created] (SPARK-25575) SQL tab in the Spark UI doesn't have the option of hiding tables, even though other UI tabs do.

2018-09-29 Thread shahid (JIRA)
shahid created SPARK-25575:
--

 Summary: SQL tab in the Spark UI doesn't have the option of hiding 
tables, even though other UI tabs do.
 Key: SPARK-25575
 URL: https://issues.apache.org/jira/browse/SPARK-25575
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.3.1
Reporter: shahid
 Attachments: Screenshot from 2018-09-29 23-26-45.png

Test steps:
 1) bin/spark-shell
{code:java}
sql("create table a (id int)")
for(i <- 1 to 100) sql(s"insert into a values ($i)")
{code}
Open SQL tab in the web UI,

!image-2018-09-29-23-28-56-045.png!

Open Jobs tab,

!image-2018-09-29-23-28-39-693.png!

 






[jira] [Resolved] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25571.
--
Resolution: Duplicate

> Add withColumnsRenamed method to Dataset
> 
>
> Key: SPARK-25571
> URL: https://issues.apache.org/jira/browse/SPARK-25571
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Chaerim Yeo
>Priority: Major
>
> There are two general approaches to rename several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are inefficient and redundant due to the following 
> limitations.
>  * withColumnRenamed: the method must be called once per renamed column
>  * select: every column, renamed or not, must be passed to the method
> It would be useful to implement a new method, such as *withColumnsRenamed*, 
> which can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
>  
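The proposed method naturally reduces to a fold over the existing chained renames. The pattern is shown here on a plain list of column names, since a real Dataset isn't available in this sketch (the function name mirrors the proposal; the list-based model is an assumption for illustration):

```scala
// Fold a sequence of (old, new) pairs over the column list, exactly as chained
// withColumnRenamed calls would; names that don't match are left untouched.
def withColumnsRenamed(columns: Seq[String], renames: (String, String)*): Seq[String] =
  renames.foldLeft(columns) { case (cols, (oldName, newName)) =>
    cols.map(c => if (c == oldName) newName else c)
  }
```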






[jira] [Resolved] (SPARK-25508) Refactor OrcReadBenchmark to use main method

2018-09-29 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25508.
---
   Resolution: Fixed
Fix Version/s: 2.5.0

Issue resolved by pull request 22580
[https://github.com/apache/spark/pull/22580]

> Refactor OrcReadBenchmark to use main method
> 
>
> Key: SPARK-25508
> URL: https://issues.apache.org/jira/browse/SPARK-25508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: yucai
>Assignee: yucai
>Priority: Major
> Fix For: 2.5.0
>
>







[jira] [Assigned] (SPARK-25508) Refactor OrcReadBenchmark to use main method

2018-09-29 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-25508:
-

Assignee: yucai








[jira] [Updated] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Chaerim Yeo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaerim Yeo updated SPARK-25571:

External issue URL:   (was: https://github.com/apache/spark/pull/)
 External issue ID:   (was: 22591)







[jira] [Commented] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633025#comment-16633025
 ] 

Apache Spark commented on SPARK-25571:
--

User 'cryeo' has created a pull request for this issue:
https://github.com/apache/spark/pull/22591

> Add withColumnsRenamed method to Dataset
> 
>
> Key: SPARK-25571
> URL: https://issues.apache.org/jira/browse/SPARK-25571
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Chaerim Yeo
>Priority: Major
>
> There are two general approaches to rename several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to the 
> following limitations:
>  * withColumnRenamed: the method must be called once for each renamed column
>  * select: all columns, including unrenamed ones, must be passed to select
> It is necessary to implement a new method, such as *withColumnsRenamed*, 
> which can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
>  



--



[jira] [Commented] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633024#comment-16633024
 ] 

Apache Spark commented on SPARK-25571:
--

User 'cryeo' has created a pull request for this issue:
https://github.com/apache/spark/pull/22591

> Add withColumnsRenamed method to Dataset
> 
>
> Key: SPARK-25571
> URL: https://issues.apache.org/jira/browse/SPARK-25571
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Chaerim Yeo
>Priority: Major
>
> There are two general approaches to renaming several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to the 
> following limitations:
>  * withColumnRenamed: the method must be called once for each renamed column
>  * select: all columns, including unrenamed ones, must be passed to select
> It is necessary to implement a new method, such as *withColumnsRenamed*, 
> which can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
>  



--



[jira] [Assigned] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25571:


Assignee: Apache Spark

> Add withColumnsRenamed method to Dataset
> 
>
> Key: SPARK-25571
> URL: https://issues.apache.org/jira/browse/SPARK-25571
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Chaerim Yeo
>Assignee: Apache Spark
>Priority: Major
>
> There are two general approaches to renaming several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to the 
> following limitations:
>  * withColumnRenamed: the method must be called once for each renamed column
>  * select: all columns, including unrenamed ones, must be passed to select
> It is necessary to implement a new method, such as *withColumnsRenamed*, 
> which can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
>  



--



[jira] [Assigned] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25571:


Assignee: (was: Apache Spark)

> Add withColumnsRenamed method to Dataset
> 
>
> Key: SPARK-25571
> URL: https://issues.apache.org/jira/browse/SPARK-25571
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Chaerim Yeo
>Priority: Major
>
> There are two general approaches to renaming several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to the 
> following limitations:
>  * withColumnRenamed: the method must be called once for each renamed column
>  * select: all columns, including unrenamed ones, must be passed to select
> It is necessary to implement a new method, such as *withColumnsRenamed*, 
> which can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
>  



--



[jira] [Updated] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Chaerim Yeo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaerim Yeo updated SPARK-25571:

External issue URL: https://github.com/apache/spark/pull/
 External issue ID: 22591

> Add withColumnsRenamed method to Dataset
> 
>
> Key: SPARK-25571
> URL: https://issues.apache.org/jira/browse/SPARK-25571
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Chaerim Yeo
>Priority: Major
>
> There are two general approaches to renaming several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to the 
> following limitations:
>  * withColumnRenamed: the method must be called once for each renamed column
>  * select: all columns, including unrenamed ones, must be passed to select
> It is necessary to implement a new method, such as *withColumnsRenamed*, 
> which can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
>  



--



[jira] [Resolved] (SPARK-25048) Pivoting by multiple columns in Scala/Java

2018-09-29 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25048.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

Issue resolved by pull request 22316
[https://github.com/apache/spark/pull/22316]

> Pivoting by multiple columns in Scala/Java
> --
>
> Key: SPARK-25048
> URL: https://issues.apache.org/jira/browse/SPARK-25048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 2.5.0
>
>
> The existing API needs to be changed or extended to make pivoting by multiple 
> columns possible. Users should be able to use many columns and values, as in 
> the following example:
> {code:scala}
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"), Seq(
> struct(lit("dotnet"), lit("Experts")),
> struct(lit("java"), lit("Dummies")))
>   ).agg(sum($"sales.earnings"))
> {code}



--
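The semantics of the multi-column pivot above can be sketched without Spark. The snippet below is a plain-Python illustration, not Spark's implementation: rows are grouped by year, pivoted on the (course, training) tuple restricted to the two listed struct values, and earnings are summed per cell. The sales figures are made up for the example.

```python
from collections import defaultdict

def pivot_sum(rows, group_key, pivot_keys, pivot_values, value_key):
    """Group rows by group_key, pivot on the tuple of pivot_keys
    (restricted to pivot_values), and sum value_key in each cell."""
    out = defaultdict(lambda: {v: 0 for v in pivot_values})
    for row in rows:
        cell = tuple(row[k] for k in pivot_keys)
        if cell in pivot_values:  # values outside the list are dropped
            out[row[group_key]][cell] += row[value_key]
    return dict(out)

# Hypothetical rows standing in for the trainingSales Dataset.
sales = [
    {"year": 2012, "course": "dotnet", "training": "Experts", "earnings": 10000},
    {"year": 2012, "course": "java",   "training": "Dummies", "earnings": 20000},
    {"year": 2013, "course": "dotnet", "training": "Experts", "earnings": 5000},
    {"year": 2013, "course": "java",   "training": "Experts", "earnings": 7000},
]
pivoted = pivot_sum(sales, "year", ("course", "training"),
                    {("dotnet", "Experts"), ("java", "Dummies")}, "earnings")
```

Note that the 2013 ("java", "Experts") row is filtered out because that combination is not among the requested pivot values, just as explicit pivot values restrict the output columns in Spark.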



[jira] [Assigned] (SPARK-25048) Pivoting by multiple columns in Scala/Java

2018-09-29 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-25048:


Assignee: Maxim Gekk

> Pivoting by multiple columns in Scala/Java
> --
>
> Key: SPARK-25048
> URL: https://issues.apache.org/jira/browse/SPARK-25048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 2.5.0
>
>
> The existing API needs to be changed or extended to make pivoting by multiple 
> columns possible. Users should be able to use many columns and values, as in 
> the following example:
> {code:scala}
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"), Seq(
> struct(lit("dotnet"), lit("Experts")),
> struct(lit("java"), lit("Dummies")))
>   ).agg(sum($"sales.earnings"))
> {code}



--



[jira] [Resolved] (SPARK-25447) Support JSON options by schema_of_json

2018-09-29 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25447.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

Issue resolved by pull request 22442
[https://github.com/apache/spark/pull/22442]

> Support JSON options by schema_of_json
> --
>
> Key: SPARK-25447
> URL: https://issues.apache.org/jira/browse/SPARK-25447
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 2.5.0
>
>
> The function schema_of_json doesn't currently accept any options, but options 
> can affect schema inference. It should support the same options that 
> from_json() accepts for schema inference. Here are examples of options that 
> could affect the inferred schema:
> * primitivesAsString
> * prefersDecimal
> * allowComments
> * allowUnquotedFieldNames
> * allowSingleQuotes
> * allowNumericLeadingZeros
> * allowNonNumericNumbers
> * allowBackslashEscapingAnyCharacter
> * allowUnquotedControlChars
> Below is a possible signature:
> {code:scala}
> def schema_of_json(e: Column, options: java.util.Map[String, String]): Column
> {code}



--
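To see why such options matter, the snippet below sketches in plain Python how two of the listed options would change the type inferred for a JSON primitive. The type names and rules are a rough approximation for illustration, not Spark's actual inference code.

```python
import json

def infer_type(value, primitives_as_string=False, prefers_decimal=False):
    """Roughly infer a Spark-style type name for one parsed JSON value,
    showing how inference options change the result."""
    if isinstance(value, bool):  # check bool first: bool is an int subclass
        return "string" if primitives_as_string else "boolean"
    if isinstance(value, int):
        return "string" if primitives_as_string else "bigint"
    if isinstance(value, float):
        if primitives_as_string:
            return "string"
        return "decimal" if prefers_decimal else "double"
    if isinstance(value, str):
        return "string"
    raise TypeError("only JSON primitives are handled in this sketch")

parsed = json.loads("1.25")
```

The same input yields "double" by default, "decimal" with prefersDecimal, and "string" with primitivesAsString, which is exactly why schema_of_json needs to accept the options that from_json does.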



[jira] [Assigned] (SPARK-25447) Support JSON options by schema_of_json

2018-09-29 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-25447:


Assignee: Maxim Gekk

> Support JSON options by schema_of_json
> --
>
> Key: SPARK-25447
> URL: https://issues.apache.org/jira/browse/SPARK-25447
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>
> The function schema_of_json doesn't currently accept any options, but options 
> can affect schema inference. It should support the same options that 
> from_json() accepts for schema inference. Here are examples of options that 
> could affect the inferred schema:
> * primitivesAsString
> * prefersDecimal
> * allowComments
> * allowUnquotedFieldNames
> * allowSingleQuotes
> * allowNumericLeadingZeros
> * allowNonNumericNumbers
> * allowBackslashEscapingAnyCharacter
> * allowUnquotedControlChars
> Below is a possible signature:
> {code:scala}
> def schema_of_json(e: Column, options: java.util.Map[String, String]): Column
> {code}



--



[jira] [Commented] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632873#comment-16632873
 ] 

Apache Spark commented on SPARK-25574:
--

User '10110346' has created a pull request for this issue:
https://github.com/apache/spark/pull/22590

> Add an option `keepQuotes` for parsing csv  file
> 
>
> Key: SPARK-25574
> URL: https://issues.apache.org/jira/browse/SPARK-25574
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: liuxian
>Priority: Minor
>
> In our project, when we read the CSV file, we hope to keep quotes.
> For example:
> We have such a record in the CSV file:
> *ab,cc,,"c,ddd"*
> We hope it displays like this:
> |_c0|_c1|_c2|    _c3|
> |  ab|cc  |null|*"c,ddd"*|
>  
> Not like this:
> |_c0|_c1|_c2|  _c3|
> |  ab|cc  |null|c,ddd|



--
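The requested behaviour can be sketched with a tiny quote-aware splitter. This is a plain-Python illustration of the semantics only, not Spark's CSV parser: with keep_quotes=True the surrounding double quotes survive into the field value, while the quoted comma is still treated as data rather than a delimiter.

```python
def parse_csv_line(line, keep_quotes=False):
    """Split one CSV record on commas, honouring double-quoted fields.

    With keep_quotes=True the surrounding quotes are preserved in the
    output, mirroring what the proposed `keepQuotes` option asks for.
    (Spark would additionally render the empty third field as null.)
    """
    fields, current, in_quotes = [], [], False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            if keep_quotes:
                current.append(ch)
        elif ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

row = parse_csv_line('ab,cc,,"c,ddd"', keep_quotes=True)
# row == ['ab', 'cc', '', '"c,ddd"']
```

With keep_quotes=False the same line parses to ['ab', 'cc', '', 'c,ddd'], matching the current behaviour the reporter wants to opt out of.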



[jira] [Assigned] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25574:


Assignee: Apache Spark

> Add an option `keepQuotes` for parsing csv  file
> 
>
> Key: SPARK-25574
> URL: https://issues.apache.org/jira/browse/SPARK-25574
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: liuxian
>Assignee: Apache Spark
>Priority: Minor
>
> In our project, when we read the CSV file, we hope to keep quotes.
> For example:
> We have such a record in the CSV file:
> *ab,cc,,"c,ddd"*
> We hope it displays like this:
> |_c0|_c1|_c2|    _c3|
> |  ab|cc  |null|*"c,ddd"*|
>  
> Not like this:
> |_c0|_c1|_c2|  _c3|
> |  ab|cc  |null|c,ddd|



--



[jira] [Assigned] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25574:


Assignee: (was: Apache Spark)

> Add an option `keepQuotes` for parsing csv  file
> 
>
> Key: SPARK-25574
> URL: https://issues.apache.org/jira/browse/SPARK-25574
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: liuxian
>Priority: Minor
>
> In our project, when we read the CSV file, we hope to keep quotes.
> For example:
> We have such a record in the CSV file:
> *ab,cc,,"c,ddd"*
> We hope it displays like this:
> |_c0|_c1|_c2|    _c3|
> |  ab|cc  |null|*"c,ddd"*|
>  
> Not like this:
> |_c0|_c1|_c2|  _c3|
> |  ab|cc  |null|c,ddd|



--



[jira] [Commented] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632874#comment-16632874
 ] 

Apache Spark commented on SPARK-25574:
--

User '10110346' has created a pull request for this issue:
https://github.com/apache/spark/pull/22590

> Add an option `keepQuotes` for parsing csv  file
> 
>
> Key: SPARK-25574
> URL: https://issues.apache.org/jira/browse/SPARK-25574
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: liuxian
>Priority: Minor
>
> In our project, when we read the CSV file, we hope to keep quotes.
> For example:
> We have such a record in the CSV file:
> *ab,cc,,"c,ddd"*
> We hope it displays like this:
> |_c0|_c1|_c2|    _c3|
> |  ab|cc  |null|*"c,ddd"*|
>  
> Not like this:
> |_c0|_c1|_c2|  _c3|
> |  ab|cc  |null|c,ddd|



--



[jira] [Updated] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread liuxian (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuxian updated SPARK-25574:

Description: 
In our project, when we read the CSV file, we hope to keep quotes.

For example:

We have such a record in the CSV file:

*ab,cc,,"c,ddd"*

We hope it displays like this:
|_c0|_c1|_c2|    _c3|
|  ab|cc   |null|*"c,ddd"*|

 

Not like this:
|_c0|_c1|_c2|  _c3|
|  ab|cc   |null |c,ddd|


  was:
In our project, when we read the CSV file, we hope to keep quotes.

For example:

We have such a record in the CSV file:

*ab,cc,,"c,ddd"*

We hope it displays like this:

++---++---+
| _c0|_c1| _c2|    _c3|
++---++---+
|  ab| cc|null|*"c,ddd"*|

 

not like this:

++---++-+
| _c0|_c1| _c2|  _c3|
++---++-+
|  ab| cc|null|c,ddd|
++---++-+


> Add an option `keepQuotes` for parsing csv  file
> 
>
> Key: SPARK-25574
> URL: https://issues.apache.org/jira/browse/SPARK-25574
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: liuxian
>Priority: Minor
>
> In our project, when we read the CSV file, we hope to keep quotes.
> For example:
> We have such a record in the CSV file:
> *ab,cc,,"c,ddd"*
> We hope it displays like this:
> |_c0|_c1|_c2|    _c3|
> |  ab|cc   |null|*"c,ddd"*|
>  
> Not like this:
> |_c0|_c1|_c2|  _c3|
> |  ab|cc   |null |c,ddd|



--



[jira] [Updated] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread liuxian (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuxian updated SPARK-25574:

Description: 
In our project, when we read the CSV file, we hope to keep quotes.

For example:

We have such a record in the CSV file:

*ab,cc,,"c,ddd"*

We hope it displays like this:
|_c0|_c1|_c2|    _c3|
|  ab|cc  |null|*"c,ddd"*|

 

Not like this:
|_c0|_c1|_c2|  _c3|
|  ab|cc  |null|c,ddd|

  was:
In our project, when we read the CSV file, we hope to keep quotes.

For example:

We have such a record in the CSV file:

*ab,cc,,"c,ddd"*

We hope it displays like this:
|_c0|_c1|_c2|    _c3|
|  ab|cc   |null|*"c,ddd"*|

 

Not like this:
|_c0|_c1|_c2|  _c3|
|  ab|cc   |null |c,ddd|

+-+--++-+


> Add an option `keepQuotes` for parsing csv  file
> 
>
> Key: SPARK-25574
> URL: https://issues.apache.org/jira/browse/SPARK-25574
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: liuxian
>Priority: Minor
>
> In our project, when we read the CSV file, we hope to keep quotes.
> For example:
> We have such a record in the CSV file:
> *ab,cc,,"c,ddd"*
> We hope it displays like this:
> |_c0|_c1|_c2|    _c3|
> |  ab|cc  |null|*"c,ddd"*|
>  
> Not like this:
> |_c0|_c1|_c2|  _c3|
> |  ab|cc  |null|c,ddd|



--



[jira] [Created] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread liuxian (JIRA)
liuxian created SPARK-25574:
---

 Summary: Add an option `keepQuotes` for parsing csv  file
 Key: SPARK-25574
 URL: https://issues.apache.org/jira/browse/SPARK-25574
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: liuxian


In our project, when we read the CSV file, we hope to keep quotes.

For example:

We have such a record in the CSV file:

*ab,cc,,"c,ddd"*

We hope it displays like this:

++---++---+
| _c0|_c1| _c2|    _c3|
++---++---+
|  ab| cc|null|*"c,ddd"*|

 

not like this:

++---++-+
| _c0|_c1| _c2|  _c3|
++---++-+
|  ab| cc|null|c,ddd|
++---++-+



--



[jira] [Created] (SPARK-25573) Combine resolveExpression and resolve in the rule ResolveReferences

2018-09-29 Thread Xiao Li (JIRA)
Xiao Li created SPARK-25573:
---

 Summary: Combine resolveExpression and resolve in the rule 
ResolveReferences
 Key: SPARK-25573
 URL: https://issues.apache.org/jira/browse/SPARK-25573
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Xiao Li


In the rule ResolveReferences, two private functions `resolve` and 
`resolveExpression` should be combined. 



--