[jira] [Commented] (SPARK-25538) incorrect row counts after distinct()
[ https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633254#comment-16633254 ] Steven Rand commented on SPARK-25538: - [~kiszk] I've uploaded a tarball containing parquet files that reproduce the issue but don't contain any of the values in the original dataset. Specifically, some columns have been dropped, all strings have been changed to "test_string", all values in col_50 have been changed to 0.0043, and the values in col_14 have all been mapped from their original values to values between 0.001 and 0.0044. This new DataFrame still reproduces issues similar to those in the description: {code:java} scala> df.distinct.count res3: Long = 64 scala> df.sort("col_0").distinct.count res4: Long = 73 scala> df.withColumnRenamed("col_0", "new").distinct.count res5: Long = 63 {code} I get those inconsistent/wrong results on {{2.4.0-rc2}} and if I check out commit {{a7c19d9c21d59fd0109a7078c80b33d3da03fafd}}, which is SPARK-23713. If I check out the commit immediately before, which is {{fe2b7a4568d65a62da6e6eb00fff05f248b4332c}}, then all three commands return 63. cc [~cloud_fan] – IMO this should block the 2.4.0 release. > incorrect row counts after distinct() > - > > Key: SPARK-25538 > URL: https://issues.apache.org/jira/browse/SPARK-25538 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 > Environment: Reproduced on a Centos7 VM and from source in Intellij > on OS X. >Reporter: Steven Rand >Priority: Major > Labels: correctness > Attachments: SPARK-25538-repro.tgz > > > It appears that {{df.distinct.count}} can return incorrect values after > SPARK-23713. It's possible that other operations are affected as well; > {{distinct}} just happens to be the one that we noticed. I believe that this > issue was introduced by SPARK-23713 because I can't reproduce it until that > commit, and I've been able to reproduce it after that commit as well as with > {{tags/v2.4.0-rc1}}. 
> Below are example spark-shell sessions to illustrate the problem. > Unfortunately the data used in these examples can't be uploaded to this Jira > ticket. I'll try to create test data which also reproduces the issue, and > will upload that if I'm able to do so. > Example from Spark 2.3.1, which behaves correctly: > {code} > scala> val df = spark.read.parquet("hdfs:///data") > df: org.apache.spark.sql.DataFrame = [] > scala> df.count > res0: Long = 123 > scala> df.distinct.count > res1: Long = 115 > {code} > Example from Spark 2.4.0-rc1, which returns different output: > {code} > scala> val df = spark.read.parquet("hdfs:///data") > df: org.apache.spark.sql.DataFrame = [] > scala> df.count > res0: Long = 123 > scala> df.distinct.count > res1: Long = 116 > scala> df.sort("col_0").distinct.count > res2: Long = 123 > scala> df.withColumnRenamed("col_0", "newName").distinct.count > res3: Long = 115 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
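The inconsistency reported above can be stated as an invariant: distinct is, by definition, insensitive to row order and to column names, so all three counts must agree. A standalone Scala sketch of that invariant on plain collections (no Spark dependency; the sample rows are made up for illustration):

```scala
// distinct must be order-insensitive: sorting first cannot change the
// number of distinct rows. The tuples below are hypothetical data
// standing in for the parquet rows in the report.
val rows = Seq((1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c"))

val plain  = rows.distinct.size
val sorted = rows.sortBy(_._1).distinct.size

// A correct engine must satisfy this for any dataset and any sort key.
assert(plain == sorted)
println(s"distinct count: $plain")
```

The bug report is exactly a violation of this property: `df.distinct.count`, `df.sort("col_0").distinct.count`, and the renamed variant return three different values.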
[jira] [Updated] (SPARK-25538) incorrect row counts after distinct()
[ https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated SPARK-25538: Attachment: SPARK-25538-repro.tgz > incorrect row counts after distinct() > - > > Key: SPARK-25538 > URL: https://issues.apache.org/jira/browse/SPARK-25538 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 > Environment: Reproduced on a Centos7 VM and from source in Intellij > on OS X. >Reporter: Steven Rand >Priority: Major > Labels: correctness > Attachments: SPARK-25538-repro.tgz > > > It appears that {{df.distinct.count}} can return incorrect values after > SPARK-23713. It's possible that other operations are affected as well; > {{distinct}} just happens to be the one that we noticed. I believe that this > issue was introduced by SPARK-23713 because I can't reproduce it until that > commit, and I've been able to reproduce it after that commit as well as with > {{tags/v2.4.0-rc1}}. > Below are example spark-shell sessions to illustrate the problem. > Unfortunately the data used in these examples can't be uploaded to this Jira > ticket. I'll try to create test data which also reproduces the issue, and > will upload that if I'm able to do so. 
> Example from Spark 2.3.1, which behaves correctly: > {code} > scala> val df = spark.read.parquet("hdfs:///data") > df: org.apache.spark.sql.DataFrame = [] > scala> df.count > res0: Long = 123 > scala> df.distinct.count > res1: Long = 115 > {code} > Example from Spark 2.4.0-rc1, which returns different output: > {code} > scala> val df = spark.read.parquet("hdfs:///data") > df: org.apache.spark.sql.DataFrame = [] > scala> df.count > res0: Long = 123 > scala> df.distinct.count > res1: Long = 116 > scala> df.sort("col_0").distinct.count > res2: Long = 123 > scala> df.withColumnRenamed("col_0", "newName").distinct.count > res3: Long = 115 > {code}
[jira] [Updated] (SPARK-25501) Kafka delegation token support
[ https://issues.apache.org/jira/browse/SPARK-25501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-25501: Labels: SPIP (was: ) > Kafka delegation token support > -- > > Key: SPARK-25501 > URL: https://issues.apache.org/jira/browse/SPARK-25501 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Gabor Somogyi >Priority: Major > Labels: SPIP > > Delegation token support was released in Kafka 1.1. As Spark has updated > its Kafka client to 2.0.0, it is now possible to implement delegation token > support. Please see the description: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka
[jira] [Created] (SPARK-25576) Fix lint failure in 2.2
Xiao Li created SPARK-25576: --- Summary: Fix lint failure in 2.2 Key: SPARK-25576 URL: https://issues.apache.org/jira/browse/SPARK-25576 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.2.2 Reporter: Xiao Li See the errors: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.2-lint/913/console
[jira] [Resolved] (SPARK-25568) Continue to update the remaining accumulators when failing to update one accumulator
[ https://issues.apache.org/jira/browse/SPARK-25568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-25568. - Resolution: Fixed Fix Version/s: 2.4.0 2.3.3 2.2.3 > Continue to update the remaining accumulators when failing to update one > accumulator > > > Key: SPARK-25568 > URL: https://issues.apache.org/jira/browse/SPARK-25568 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.2.3, 2.3.3, 2.4.0 > > > Currently when failing to update an accumulator, > DAGScheduler.updateAccumulators will skip the remaining accumulators. We > should try to update the remaining accumulators if possible so that they can > still report correct values.
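The idea of the fix can be sketched independently of the DAGScheduler internals: iterate over all pending updates, catch a failure on any single one, and keep going so the rest still report their values. A minimal Scala sketch (the update functions and the `updateAll` helper are illustrative, not Spark's actual code):

```scala
import scala.collection.mutable.ArrayBuffer

// Apply every update, collecting failures instead of aborting on the
// first one, so the remaining accumulators still receive their values.
def updateAll(updates: Seq[() => Unit]): Seq[Throwable] = {
  val failures = ArrayBuffer.empty[Throwable]
  updates.foreach { u =>
    try u() catch { case e: Exception => failures += e }
  }
  failures.toSeq
}

var a = 0
var b = 0
val failures = updateAll(Seq(
  () => a += 1,
  () => throw new IllegalStateException("bad accumulator"),
  () => b += 1))

// Both surviving updates ran despite the middle one throwing.
println(s"a=$a b=$b failures=${failures.size}")
```

The collected failures can then be logged or rethrown after the loop, rather than silently skipping the tail of the list.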
[jira] [Commented] (SPARK-25380) Generated plans occupy over 50% of Spark driver memory
[ https://issues.apache.org/jira/browse/SPARK-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633145#comment-16633145 ] Marcelo Vanzin commented on SPARK-25380: We can provide ways to diminish the effect of large plans on memory usage even if we can't reproduce his specific case. None of the things you list in your last e-mail needs a reproduction; you can hack the code to generate a large garbage plan, and you should be able to test any of those solutions. It would be great to learn more and find out whether we can make the plans more compact; but we should also recognize that people can and do run very large and complicated queries that generate large plans, and we could help them tune the UI so it does not use so much memory. > Generated plans occupy over 50% of Spark driver memory > -- > > Key: SPARK-25380 > URL: https://issues.apache.org/jira/browse/SPARK-25380 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 > Environment: Spark 2.3.1 (AWS emr-5.16.0) > >Reporter: Michael Spector >Priority: Minor > Attachments: Screen Shot 2018-09-06 at 23.19.56.png, Screen Shot > 2018-09-12 at 8.20.05.png, heapdump_OOM.png, image-2018-09-16-14-21-38-939.png > > > When debugging an OOM exception during a long run of a Spark application (many > iterations of the same code) I've found that generated plans occupy most of > the driver memory. I'm not sure whether this is a memory leak or not, but it > would be helpful if old plans could be purged from memory anyway. > Attached are screenshots of the OOM heap dump opened in JVisualVM. >
[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10
[ https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-25572: - Description: Follow-up to SPARK-24255. From the 2.3.2 release we can see that CRAN doesn't seem to respect the system requirements when running tests - we have seen cases where SparkR is run on Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt to skip all tests was: follow up to SPARK-24255 from 2.3.2 release we can see that CRAN doesn't seem to respect the system requirements as running tests - we have seen cases where SparkR is run on Java 10, which unfortunately Spark does not start on. For 2.4, lets attempt skipping all tests > SparkR tests failed on CRAN on Java 10 > -- > > Key: SPARK-25572 > URL: https://issues.apache.org/jira/browse/SPARK-25572 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Felix Cheung >Assignee: Felix Cheung >Priority: Major > Fix For: 2.4.1, 2.5.0 > > > Follow-up to SPARK-24255. > From the 2.3.2 release we can see that CRAN doesn't seem to respect the system > requirements when running tests - we have seen cases where SparkR is run on > Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt > to skip all tests
[jira] [Commented] (SPARK-25572) SparkR tests failed on CRAN on Java 10
[ https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633129#comment-16633129 ] Felix Cheung commented on SPARK-25572: -- [~cloud_fan] while not a blocker, it would be great to include in 2.4.0 if we have another RC > SparkR tests failed on CRAN on Java 10 > -- > > Key: SPARK-25572 > URL: https://issues.apache.org/jira/browse/SPARK-25572 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Felix Cheung >Assignee: Felix Cheung >Priority: Major > Fix For: 2.4.1, 2.5.0 > > > Follow-up to SPARK-24255. > From the 2.3.2 release we can see that CRAN doesn't seem to respect the system > requirements when running tests - we have seen cases where SparkR is run on > Java 10, which unfortunately Spark does not start on. For 2.4, let's attempt > to skip all tests
[jira] [Commented] (SPARK-25572) SparkR tests failed on CRAN on Java 10
[ https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633130#comment-16633130 ] Felix Cheung commented on SPARK-25572: -- [~shivaram] > SparkR tests failed on CRAN on Java 10 > -- > > Key: SPARK-25572 > URL: https://issues.apache.org/jira/browse/SPARK-25572 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Felix Cheung >Assignee: Felix Cheung >Priority: Major > Fix For: 2.4.1, 2.5.0 > > > Follow-up to SPARK-24255. > From the 2.3.2 release we can see that CRAN doesn't seem to respect the system > requirements when running tests - we have seen cases where SparkR is run on > Java 10, which unfortunately Spark does not start on. For 2.4, let's attempt > to skip all tests
[jira] [Resolved] (SPARK-25572) SparkR tests failed on CRAN on Java 10
[ https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-25572. -- Resolution: Fixed Fix Version/s: 2.5.0 2.4.1 Target Version/s: 2.4.1, 2.5.0 > SparkR tests failed on CRAN on Java 10 > -- > > Key: SPARK-25572 > URL: https://issues.apache.org/jira/browse/SPARK-25572 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Felix Cheung >Assignee: Felix Cheung >Priority: Major > Fix For: 2.4.1, 2.5.0 > > > Follow-up to SPARK-24255. > From the 2.3.2 release we can see that CRAN doesn't seem to respect the system > requirements when running tests - we have seen cases where SparkR is run on > Java 10, which unfortunately Spark does not start on. For 2.4, let's attempt > to skip all tests
[jira] [Commented] (SPARK-25575) SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have.
[ https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633096#comment-16633096 ] Apache Spark commented on SPARK-25575: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/22592 > SQL tab in the spark UI doesn't have the option of hiding tables, even though > other UI tabs have. > - > > Key: SPARK-25575 > URL: https://issues.apache.org/jira/browse/SPARK-25575 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.1 >Reporter: shahid >Priority: Minor > Attachments: Screenshot from 2018-09-29 23-26-45.png, Screenshot from > 2018-09-29 23-26-57.png > > > Test steps: > 1) bin/spark-shell > {code:java} > sql("create table a (id int)") > for(i <- 1 to 100) sql(s"insert into a values ($i)") > {code} > Open SQL tab in the web UI, > !Screenshot from 2018-09-29 23-26-45.png! > Open Jobs tab, > !Screenshot from 2018-09-29 23-26-57.png! >
[jira] [Assigned] (SPARK-25575) SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have.
[ https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25575: Assignee: Apache Spark > SQL tab in the spark UI doesn't have the option of hiding tables, even though > other UI tabs have. > - > > Key: SPARK-25575 > URL: https://issues.apache.org/jira/browse/SPARK-25575 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.1 >Reporter: shahid >Assignee: Apache Spark >Priority: Minor > Attachments: Screenshot from 2018-09-29 23-26-45.png, Screenshot from > 2018-09-29 23-26-57.png > > > Test steps: > 1) bin/spark-shell > {code:java} > sql("create table a (id int)") > for(i <- 1 to 100) sql(s"insert into a values ($i)") > {code} > Open SQL tab in the web UI, > !Screenshot from 2018-09-29 23-26-45.png! > Open Jobs tab, > !Screenshot from 2018-09-29 23-26-57.png! >
[jira] [Assigned] (SPARK-25575) SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have.
[ https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25575: Assignee: (was: Apache Spark) > SQL tab in the spark UI doesn't have the option of hiding tables, even though > other UI tabs have. > - > > Key: SPARK-25575 > URL: https://issues.apache.org/jira/browse/SPARK-25575 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.1 >Reporter: shahid >Priority: Minor > Attachments: Screenshot from 2018-09-29 23-26-45.png, Screenshot from > 2018-09-29 23-26-57.png > > > Test steps: > 1) bin/spark-shell > {code:java} > sql("create table a (id int)") > for(i <- 1 to 100) sql(s"insert into a values ($i)") > {code} > Open SQL tab in the web UI, > !Screenshot from 2018-09-29 23-26-45.png! > Open Jobs tab, > !Screenshot from 2018-09-29 23-26-57.png! >
[jira] [Updated] (SPARK-25575) SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have.
[ https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-25575: --- Attachment: Screenshot from 2018-09-29 23-26-57.png > SQL tab in the spark UI doesn't have the option of hiding tables, even though > other UI tabs have. > - > > Key: SPARK-25575 > URL: https://issues.apache.org/jira/browse/SPARK-25575 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.1 >Reporter: shahid >Priority: Minor > Attachments: Screenshot from 2018-09-29 23-26-45.png, Screenshot from > 2018-09-29 23-26-57.png > > > Test steps: > 1) bin/spark-shell > {code:java} > sql("create table a (id int)") > for(i <- 1 to 100) sql(s"insert into a values ($i)") > {code} > Open SQL tab in the web UI, > !image-2018-09-29-23-28-56-045.png! > Open Jobs tab, > !image-2018-09-29-23-28-39-693.png! >
[jira] [Updated] (SPARK-25575) SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have.
[ https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-25575: --- Description: Test steps: 1) bin/spark-shell {code:java} sql("create table a (id int)") for(i <- 1 to 100) sql(s"insert into a values ($i)") {code} Open SQL tab in the web UI, !Screenshot from 2018-09-29 23-26-45.png! Open Jobs tab, !Screenshot from 2018-09-29 23-26-57.png! was: Test tests: 1) bin/spark-shell {code:java} sql("create table a (id int)") for(i <- 1 to 100) sql(s"insert into a values ($i)") {code} Open SQL tab in the web UI, !image-2018-09-29-23-28-56-045.png! Open Jobs tab, !image-2018-09-29-23-28-39-693.png! > SQL tab in the spark UI doesn't have the option of hiding tables, even though > other UI tabs have. > - > > Key: SPARK-25575 > URL: https://issues.apache.org/jira/browse/SPARK-25575 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.1 >Reporter: shahid >Priority: Minor > Attachments: Screenshot from 2018-09-29 23-26-45.png, Screenshot from > 2018-09-29 23-26-57.png > > > Test steps: > 1) bin/spark-shell > {code:java} > sql("create table a (id int)") > for(i <- 1 to 100) sql(s"insert into a values ($i)") > {code} > Open SQL tab in the web UI, > !Screenshot from 2018-09-29 23-26-45.png! > Open Jobs tab, > !Screenshot from 2018-09-29 23-26-57.png! >
[jira] [Updated] (SPARK-25575) SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have.
[ https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-25575: --- Attachment: Screenshot from 2018-09-29 23-26-45.png > SQL tab in the spark UI doesn't have the option of hiding tables, even though > other UI tabs have. > - > > Key: SPARK-25575 > URL: https://issues.apache.org/jira/browse/SPARK-25575 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.1 >Reporter: shahid >Priority: Minor > Attachments: Screenshot from 2018-09-29 23-26-45.png > > > Test steps: > 1) bin/spark-shell > {code:java} > sql("create table a (id int)") > for(i <- 1 to 100) sql(s"insert into a values ($i)") > {code} > Open SQL tab in the web UI, > !image-2018-09-29-23-28-56-045.png! > Open Jobs tab, > !image-2018-09-29-23-28-39-693.png! >
[jira] [Commented] (SPARK-25575) SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have.
[ https://issues.apache.org/jira/browse/SPARK-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633082#comment-16633082 ] shahid commented on SPARK-25575: I will raise a PR > SQL tab in the spark UI doesn't have the option of hiding tables, even though > other UI tabs have. > - > > Key: SPARK-25575 > URL: https://issues.apache.org/jira/browse/SPARK-25575 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.1 >Reporter: shahid >Priority: Minor > Attachments: Screenshot from 2018-09-29 23-26-45.png > > > Test steps: > 1) bin/spark-shell > {code:java} > sql("create table a (id int)") > for(i <- 1 to 100) sql(s"insert into a values ($i)") > {code} > Open SQL tab in the web UI, > !image-2018-09-29-23-28-56-045.png! > Open Jobs tab, > !image-2018-09-29-23-28-39-693.png! >
[jira] [Created] (SPARK-25575) SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have.
shahid created SPARK-25575: -- Summary: SQL tab in the spark UI doesn't have the option of hiding tables, even though other UI tabs have. Key: SPARK-25575 URL: https://issues.apache.org/jira/browse/SPARK-25575 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 2.3.1 Reporter: shahid Attachments: Screenshot from 2018-09-29 23-26-45.png Test steps: 1) bin/spark-shell {code:java} sql("create table a (id int)") for(i <- 1 to 100) sql(s"insert into a values ($i)") {code} Open SQL tab in the web UI, !image-2018-09-29-23-28-56-045.png! Open Jobs tab, !image-2018-09-29-23-28-39-693.png!
[jira] [Resolved] (SPARK-25571) Add withColumnsRenamed method to Dataset
[ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25571. -- Resolution: Duplicate > Add withColumnsRenamed method to Dataset > > > Key: SPARK-25571 > URL: https://issues.apache.org/jira/browse/SPARK-25571 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.2 >Reporter: Chaerim Yeo >Priority: Major > > There are two general approaches to renaming several columns. > * Using the *withColumnRenamed* method > * Using the *select* method > {code} > // Using withColumnRenamed > ds.withColumnRenamed("first_name", "firstName") > .withColumnRenamed("last_name", "lastName") > .withColumnRenamed("postal_code", "postalCode") > // Using select > ds.select( > $"id", > $"first_name" as "firstName", > $"last_name" as "lastName", > $"address", > $"postal_code" as "postalCode" > ) > {code} > However, both approaches are still inefficient and redundant due to the following > limitations. > * withColumnRenamed: the method must be called several times > * select: all columns must be passed to the select method > It is necessary to implement a new method, such as *withColumnsRenamed*, which > can rename many columns at once. > {code} > ds.withColumnsRenamed( > "first_name" -> "firstName", > "last_name" -> "lastName", > "postal_code" -> "postalCode" > ) > // or > ds.withColumnsRenamed(Map( > "first_name" -> "firstName", > "last_name" -> "lastName", > "postal_code" -> "postalCode" > )) > {code} >
[jira] [Resolved] (SPARK-25508) Refactor OrcReadBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-25508. --- Resolution: Fixed Fix Version/s: 2.5.0 Issue resolved by pull request 22580 [https://github.com/apache/spark/pull/22580] > Refactor OrcReadBenchmark to use main method > > > Key: SPARK-25508 > URL: https://issues.apache.org/jira/browse/SPARK-25508 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.5.0 >Reporter: yucai >Assignee: yucai >Priority: Major > Fix For: 2.5.0 > >
[jira] [Assigned] (SPARK-25508) Refactor OrcReadBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-25508: - Assignee: yucai > Refactor OrcReadBenchmark to use main method > > > Key: SPARK-25508 > URL: https://issues.apache.org/jira/browse/SPARK-25508 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.5.0 >Reporter: yucai >Assignee: yucai >Priority: Major > Fix For: 2.5.0 > >
[jira] [Updated] (SPARK-25571) Add withColumnsRenamed method to Dataset
[ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaerim Yeo updated SPARK-25571: External issue URL: (was: https://github.com/apache/spark/pull/) External issue ID: (was: 22591) > Add withColumnsRenamed method to Dataset > > > Key: SPARK-25571 > URL: https://issues.apache.org/jira/browse/SPARK-25571 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.2 >Reporter: Chaerim Yeo >Priority: Major > > There are two general approaches to renaming several columns. > * Using the *withColumnRenamed* method > * Using the *select* method > {code} > // Using withColumnRenamed > ds.withColumnRenamed("first_name", "firstName") > .withColumnRenamed("last_name", "lastName") > .withColumnRenamed("postal_code", "postalCode") > // Using select > ds.select( > $"id", > $"first_name" as "firstName", > $"last_name" as "lastName", > $"address", > $"postal_code" as "postalCode" > ) > {code} > However, both approaches are still inefficient and redundant due to the following > limitations. > * withColumnRenamed: the method must be called several times > * select: all columns must be passed to the select method > It is necessary to implement a new method, such as *withColumnsRenamed*, which > can rename many columns at once. > {code} > ds.withColumnsRenamed( > "first_name" -> "firstName", > "last_name" -> "lastName", > "postal_code" -> "postalCode" > ) > // or > ds.withColumnsRenamed(Map( > "first_name" -> "firstName", > "last_name" -> "lastName", > "postal_code" -> "postalCode" > )) > {code} >
[jira] [Commented] (SPARK-25571) Add withColumnsRenamed method to Dataset
[ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633025#comment-16633025 ] Apache Spark commented on SPARK-25571: -- User 'cryeo' has created a pull request for this issue: https://github.com/apache/spark/pull/22591 > Add withColumnsRenamed method to Dataset > > > Key: SPARK-25571 > URL: https://issues.apache.org/jira/browse/SPARK-25571 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.2 >Reporter: Chaerim Yeo >Priority: Major > > There are two general approaches to renaming several columns. > * Using the *withColumnRenamed* method > * Using the *select* method > {code} > // Using withColumnRenamed > ds.withColumnRenamed("first_name", "firstName") > .withColumnRenamed("last_name", "lastName") > .withColumnRenamed("postal_code", "postalCode") > // Using select > ds.select( > $"id", > $"first_name" as "firstName", > $"last_name" as "lastName", > $"address", > $"postal_code" as "postalCode" > ) > {code} > However, both approaches are still inefficient and redundant due to the following > limitations. > * withColumnRenamed: the method must be called several times > * select: all columns must be passed to the select method > It is necessary to implement a new method, such as *withColumnsRenamed*, which > can rename many columns at once. > {code} > ds.withColumnsRenamed( > "first_name" -> "firstName", > "last_name" -> "lastName", > "postal_code" -> "postalCode" > ) > // or > ds.withColumnsRenamed(Map( > "first_name" -> "firstName", > "last_name" -> "lastName", > "postal_code" -> "postalCode" > )) > {code} >
[jira] [Assigned] (SPARK-25571) Add withColumnsRenamed method to Dataset
[ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25571:
------------------------------------

    Assignee: Apache Spark
[jira] [Assigned] (SPARK-25571) Add withColumnsRenamed method to Dataset
[ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25571:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Updated] (SPARK-25571) Add withColumnsRenamed method to Dataset
[ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaerim Yeo updated SPARK-25571:
--------------------------------
    External issue URL: https://github.com/apache/spark/pull/
    External issue ID: 22591
[jira] [Resolved] (SPARK-25048) Pivoting by multiple columns in Scala/Java
[ https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25048.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.5.0

Issue resolved by pull request 22316
[https://github.com/apache/spark/pull/22316]

> Pivoting by multiple columns in Scala/Java
> ------------------------------------------
>
>                 Key: SPARK-25048
>                 URL: https://issues.apache.org/jira/browse/SPARK-25048
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Minor
>             Fix For: 2.5.0
>
> The existing API needs to be changed or extended to make pivoting by multiple columns possible. Users should be able to use many columns and values, as in this example:
> {code:scala}
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"), Seq(
>     struct(lit("dotnet"), lit("Experts")),
>     struct(lit("java"), lit("Dummies")))
>   ).agg(sum($"sales.earnings"))
> {code}
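The struct-based API in the example can be read as pivoting on a composite key: each struct(...) value is one (course, training) pair that becomes an output column. A minimal sketch of that idea with plain Scala collections follows; the Sale fields mirror the example's sales.year, sales.course, training, and sales.earnings, and the data values are made up for illustration.

```scala
// One input row of the example's trainingSales data.
case class Sale(year: Int, course: String, training: String, earnings: Double)

object PivotSketch {
  // Pivot on the composite key (course, training), aggregating sum(earnings)
  // per year. This is the collections analogue of
  // groupBy(year).pivot(struct(course, training)).agg(sum(earnings)).
  def pivot(sales: Seq[Sale]): Map[Int, Map[(String, String), Double]] =
    sales.groupBy(_.year).map { case (year, rows) =>
      year -> rows
        .groupBy(s => (s.course, s.training))
        .map { case (key, rs) => key -> rs.map(_.earnings).sum }
    }

  def main(args: Array[String]): Unit = {
    val sales = Seq(
      Sale(2012, "dotnet", "Experts", 10000.0),
      Sale(2012, "java",   "Dummies",  5000.0),
      Sale(2013, "dotnet", "Experts", 48000.0)
    )
    println(pivot(sales)(2012)(("dotnet", "Experts")))
  }
}
```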
[jira] [Assigned] (SPARK-25048) Pivoting by multiple columns in Scala/Java
[ https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-25048:
------------------------------------

    Assignee: Maxim Gekk
[jira] [Resolved] (SPARK-25447) Support JSON options by schema_of_json
[ https://issues.apache.org/jira/browse/SPARK-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25447.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.5.0

Issue resolved by pull request 22442
[https://github.com/apache/spark/pull/22442]

> Support JSON options by schema_of_json
> --------------------------------------
>
>                 Key: SPARK-25447
>                 URL: https://issues.apache.org/jira/browse/SPARK-25447
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Minor
>             Fix For: 2.5.0
>
> The schema_of_json function currently does not accept any options, even though options can affect schema inference. It needs to support the same options that from_json() can use for schema inference. Here are examples of options that can affect the inferred schema:
> * primitivesAsString
> * prefersDecimal
> * allowComments
> * allowUnquotedFieldNames
> * allowSingleQuotes
> * allowNumericLeadingZeros
> * allowNonNumericNumbers
> * allowBackslashEscapingAnyCharacter
> * allowUnquotedControlChars
> A possible signature:
> {code:scala}
> def schema_of_json(e: Column, options: java.util.Map[String, String]): Column
> {code}
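To see why options matter for inference, consider primitivesAsString, which flips whether a numeric-looking JSON token is inferred as a numeric type or as a string. The toy single-token inferrer below is a made-up illustration of that effect, not Spark's JSON inference code; the type names are only suggestive of Spark SQL's.

```scala
object SchemaInferenceSketch {
  // Infers a SQL-ish type name for a single scalar JSON token.
  // With primitivesAsString = true, every primitive collapses to "string",
  // mirroring the behavior of the option of the same name in the JSON source.
  def inferType(token: String, primitivesAsString: Boolean): String =
    if (primitivesAsString) "string"
    else if (token.matches("-?\\d+")) "bigint"
    else if (token.matches("-?\\d+\\.\\d+")) "double"
    else "string"

  def main(args: Array[String]): Unit = {
    println(inferType("42", primitivesAsString = false))
    println(inferType("42", primitivesAsString = true))
  }
}
```

This is exactly the kind of divergence that makes it important for schema_of_json to accept the same options as from_json: otherwise the schema it reports may not match the schema from_json will actually use.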
[jira] [Assigned] (SPARK-25447) Support JSON options by schema_of_json
[ https://issues.apache.org/jira/browse/SPARK-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-25447:
------------------------------------

    Assignee: Maxim Gekk
[jira] [Commented] (SPARK-25574) Add an option `keepQuotes` for parsing csv file
[ https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632873#comment-16632873 ]

Apache Spark commented on SPARK-25574:
--------------------------------------

User '10110346' has created a pull request for this issue:
https://github.com/apache/spark/pull/22590

> Add an option `keepQuotes` for parsing csv file
> -----------------------------------------------
>
>                 Key: SPARK-25574
>                 URL: https://issues.apache.org/jira/browse/SPARK-25574
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: liuxian
>            Priority: Minor
>
> In our project, when we read a CSV file, we would like to keep the quotes.
> For example, the CSV file contains the following record:
> *ab,cc,,"c,ddd"*
> We would like it to display like this:
> |_c0|_c1|_c2|_c3|
> |ab|cc|null|*"c,ddd"*|
>
> Not like this:
> |_c0|_c1|_c2|_c3|
> |ab|cc|null|c,ddd|
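The requested behavior can be sketched with a tiny quote-aware field splitter. Note that `keepQuotes` is the option *proposed* in this ticket, not an existing Spark option, and this toy parser deliberately ignores escaped quotes and multi-line records for brevity.

```scala
object CsvSketch {
  // Splits one CSV line on commas that are outside quoted sections.
  // With keepQuotes = true the surrounding quote characters are retained
  // in the field value, matching the ticket's requested output.
  def split(line: String, keepQuotes: Boolean): Seq[String] = {
    val fields = collection.mutable.Buffer(new StringBuilder)
    var inQuotes = false
    for (c <- line) c match {
      case '"' =>
        inQuotes = !inQuotes
        if (keepQuotes) fields.last += c
      case ',' if !inQuotes =>
        fields += new StringBuilder          // field boundary
      case other =>
        fields.last += other
    }
    fields.map(_.toString)
  }

  def main(args: Array[String]): Unit = {
    val line = "ab,cc,,\"c,ddd\""
    println(split(line, keepQuotes = false).mkString("|"))
    println(split(line, keepQuotes = true).mkString("|"))
  }
}
```

On the ticket's example record, the flag decides whether the last field comes back as `c,ddd` or as `"c,ddd"`, which is exactly the difference between the two tables in the description.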
[jira] [Assigned] (SPARK-25574) Add an option `keepQuotes` for parsing csv file
[ https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25574:
------------------------------------

    Assignee: Apache Spark
[jira] [Assigned] (SPARK-25574) Add an option `keepQuotes` for parsing csv file
[ https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25574:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-25574) Add an option `keepQuotes` for parsing csv file
[ https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632874#comment-16632874 ]

Apache Spark commented on SPARK-25574:
--------------------------------------

User '10110346' has created a pull request for this issue:
https://github.com/apache/spark/pull/22590
[jira] [Updated] (SPARK-25574) Add an option `keepQuotes` for parsing csv file
[ https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuxian updated SPARK-25574:
----------------------------
    Description:
        In our project, when we read the CSV file, we hope to keep quotes.
        For example:
        We have such a record in the CSV file:
        *ab,cc,,"c,ddd"*
        We hope it displays like this:
        |_c0|_c1|_c2|_c3|
        |ab|cc|null|*"c,ddd"*|
        Not like this:
        |_c0|_c1|_c2|_c3|
        |ab|cc|null|c,ddd|
        +-+--++-+

  was:
        In our project, when we read the CSV file, we hope to keep quotes.
        For example:
        We have such a record in the CSV file:
        *ab,cc,,"c,ddd"*
        We hope it displays like this:
        +----+---+----+---------+
        | _c0|_c1| _c2|      _c3|
        +----+---+----+---------+
        |  ab| cc|null|*"c,ddd"*|
        +----+---+----+---------+
        not like this:
        +----+---+----+-----+
        | _c0|_c1| _c2|  _c3|
        +----+---+----+-----+
        |  ab| cc|null|c,ddd|
        +----+---+----+-----+
[jira] [Updated] (SPARK-25574) Add an option `keepQuotes` for parsing csv file
[ https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuxian updated SPARK-25574:
----------------------------
    Description:
        In our project, when we read the CSV file, we hope to keep quotes.
        For example:
        We have such a record in the CSV file:
        *ab,cc,,"c,ddd"*
        We hope it displays like this:
        |_c0|_c1|_c2|_c3|
        |ab|cc|null|*"c,ddd"*|
        Not like this:
        |_c0|_c1|_c2|_c3|
        |ab|cc|null|c,ddd|

  was:
        In our project, when we read the CSV file, we hope to keep quotes.
        For example:
        We have such a record in the CSV file:
        *ab,cc,,"c,ddd"*
        We hope it displays like this:
        |_c0|_c1|_c2|_c3|
        |ab|cc|null|*"c,ddd"*|
        Not like this:
        |_c0|_c1|_c2|_c3|
        |ab|cc|null|c,ddd|
        +-+--++-+
[jira] [Created] (SPARK-25574) Add an option `keepQuotes` for parsing csv file
liuxian created SPARK-25574:
-------------------------------

             Summary: Add an option `keepQuotes` for parsing csv file
                 Key: SPARK-25574
                 URL: https://issues.apache.org/jira/browse/SPARK-25574
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: liuxian

In our project, when we read the CSV file, we hope to keep quotes.
For example:
We have such a record in the CSV file:
*ab,cc,,"c,ddd"*
We hope it displays like this:
+----+---+----+---------+
| _c0|_c1| _c2|      _c3|
+----+---+----+---------+
|  ab| cc|null|*"c,ddd"*|
+----+---+----+---------+
not like this:
+----+---+----+-----+
| _c0|_c1| _c2|  _c3|
+----+---+----+-----+
|  ab| cc|null|c,ddd|
+----+---+----+-----+
[jira] [Created] (SPARK-25573) Combine resolveExpression and resolve in the rule ResolveReferences
Xiao Li created SPARK-25573:
-------------------------------

             Summary: Combine resolveExpression and resolve in the rule ResolveReferences
                 Key: SPARK-25573
                 URL: https://issues.apache.org/jira/browse/SPARK-25573
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Xiao Li

In the rule ResolveReferences, the two private functions `resolve` and `resolveExpression` should be combined.