[jira] [Updated] (SPARK-45056) Add process termination tests for Python foreachBatch and StreamingQueryListener
[ https://issues.apache.org/jira/browse/SPARK-45056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45056: --- Labels: pull-request-available (was: ) > Add process termination tests for Python foreachBatch and > StreamingQueryListener > > > Key: SPARK-45056 > URL: https://issues.apache.org/jira/browse/SPARK-45056 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45126) Multi-tenant history server
Ramu Ramaiah created SPARK-45126: Summary: Multi-tenant history server Key: SPARK-45126 URL: https://issues.apache.org/jira/browse/SPARK-45126 Project: Spark Issue Type: Wish Components: Spark Core Affects Versions: 3.4.1 Reporter: Ramu Ramaiah The Spark history server uses the configuration "spark.history.fs.logDirectory" to locate log events. This works well for a single tenant. In a multi-tenant deployment, however, the log events of all tenants are stored in a single directory, which provides no logical separation of events per tenant. The proposal/wish is to support a multi-tenant history server, wherein "spark.history.fs.logDirectory" can be a base directory whose sub-directories contain the log events for each tenant, named after the tenant, e.g. "tenant1", "tenant2", etc. When combined with the Spark driver/executor property "spark.eventLog.dir", the value of that property can then be set appropriately for each tenant.
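The layout proposed above could be sketched as two config fragments; the base path and tenant names below are illustrative assumptions, not part of the proposal:

```properties
# History server side: the proposed base directory containing one
# sub-directory per tenant ("tenant1", "tenant2", ...)
spark.history.fs.logDirectory=hdfs:///spark-events

# Driver/executor side, set per tenant (here a hypothetical "tenant1"):
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs:///spark-events/tenant1
```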
[jira] [Updated] (SPARK-45125) Remove dev/github_jira_sync.py
[ https://issues.apache.org/jira/browse/SPARK-45125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45125: --- Labels: pull-request-available (was: ) > Remove dev/github_jira_sync.py > -- > > Key: SPARK-45125 > URL: https://issues.apache.org/jira/browse/SPARK-45125 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > > https://issues.apache.org/jira/browse/SPARK-44942 > https://issues.apache.org/jira/browse/INFRA-24962
[jira] [Created] (SPARK-45125) Remove dev/github_jira_sync.py
Kent Yao created SPARK-45125: Summary: Remove dev/github_jira_sync.py Key: SPARK-45125 URL: https://issues.apache.org/jira/browse/SPARK-45125 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 4.0.0 Reporter: Kent Yao https://issues.apache.org/jira/browse/SPARK-44942 https://issues.apache.org/jira/browse/INFRA-24962
[jira] [Updated] (SPARK-45122) Automate updating versions.json
[ https://issues.apache.org/jira/browse/SPARK-45122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45122: --- Labels: pull-request-available (was: ) > Automate updating versions.json > > > Key: SPARK-45122 > URL: https://issues.apache.org/jira/browse/SPARK-45122 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-45124) Do not use local user ID for Local Relations
[ https://issues.apache.org/jira/browse/SPARK-45124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45124: --- Labels: pull-request-available (was: ) > Do not use local user ID for Local Relations > > > Key: SPARK-45124 > URL: https://issues.apache.org/jira/browse/SPARK-45124 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Allowing a fetch of a local relation using user-provided information is a > potential security risk since this allows users to fetch arbitrary local > relations.
[jira] [Created] (SPARK-45124) Do not use local user ID for Local Relations
Hyukjin Kwon created SPARK-45124: Summary: Do not use local user ID for Local Relations Key: SPARK-45124 URL: https://issues.apache.org/jira/browse/SPARK-45124 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Hyukjin Kwon Allowing a fetch of a local relation using user-provided information is a potential security risk since this allows users to fetch arbitrary local relations.
[jira] [Updated] (SPARK-45120) Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI
[ https://issues.apache.org/jira/browse/SPARK-45120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45120: --- Labels: pull-request-available (was: ) > Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI > > > Key: SPARK-45120 > URL: https://issues.apache.org/jira/browse/SPARK-45120 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-43123) special internal field metadata should not be leaked to catalogs
[ https://issues.apache.org/jira/browse/SPARK-43123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43123: --- Labels: pull-request-available (was: ) > special internal field metadata should not be leaked to catalogs > > > Key: SPARK-43123 > URL: https://issues.apache.org/jira/browse/SPARK-43123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0
[jira] [Created] (SPARK-45123) Raise TypeError for DataFrame.interpolate when all columns are object-dtype.
Haejoon Lee created SPARK-45123: --- Summary: Raise TypeError for DataFrame.interpolate when all columns are object-dtype. Key: SPARK-45123 URL: https://issues.apache.org/jira/browse/SPARK-45123 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee To match the pandas behavior.
[jira] [Created] (SPARK-45122) Automate updating versions.json
BingKun Pan created SPARK-45122: --- Summary: Automate updating versions.json Key: SPARK-45122 URL: https://issues.apache.org/jira/browse/SPARK-45122 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 4.0.0 Reporter: BingKun Pan
[jira] [Updated] (SPARK-45121) Support Series.empty for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-45121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45121: --- Labels: pull-request-available (was: ) > Support Series.empty for Spark Connect. > --- > > Key: SPARK-45121 > URL: https://issues.apache.org/jira/browse/SPARK-45121 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should remove JVM dependency for Pandas API on Spark.
[jira] [Updated] (SPARK-45121) Support Series.empty for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-45121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-45121: Summary: Support Series.empty for Spark Connect. (was: Support Series.empty for Spark Connect.d) > Support Series.empty for Spark Connect. > --- > > Key: SPARK-45121 > URL: https://issues.apache.org/jira/browse/SPARK-45121 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > We should remove JVM dependency for Pandas API on Spark.
[jira] [Created] (SPARK-45121) Support Series.empty for Spark Connect.d
Haejoon Lee created SPARK-45121: --- Summary: Support Series.empty for Spark Connect.d Key: SPARK-45121 URL: https://issues.apache.org/jira/browse/SPARK-45121 Project: Spark Issue Type: Sub-task Components: Connect, Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee We should remove JVM dependency for Pandas API on Spark.
[jira] [Updated] (SPARK-45110) Upgrade rocksdbjni to 8.5.3
[ https://issues.apache.org/jira/browse/SPARK-45110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-45110: Affects Version/s: 3.5.0 > Upgrade rocksdbjni to 8.5.3 > --- > > Key: SPARK-45110 > URL: https://issues.apache.org/jira/browse/SPARK-45110 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0, 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-45110) Upgrade rocksdbjni to 8.5.3
[ https://issues.apache.org/jira/browse/SPARK-45110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-45110: Issue Type: Bug (was: Improvement) > Upgrade rocksdbjni to 8.5.3 > --- > > Key: SPARK-45110 > URL: https://issues.apache.org/jira/browse/SPARK-45110 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Created] (SPARK-45120) Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI
Kent Yao created SPARK-45120: Summary: Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI Key: SPARK-45120 URL: https://issues.apache.org/jira/browse/SPARK-45120 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Updated] (SPARK-43351) Support Golang in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43351: --- Labels: pull-request-available (was: ) > Support Golang in Spark Connect > --- > > Key: SPARK-43351 > URL: https://issues.apache.org/jira/browse/SPARK-43351 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: BoYang >Assignee: BoYang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Support Spark Connect client side in Go programming language
[jira] [Updated] (SPARK-44915) Validate checksum of remounted PVC's shuffle data before recovery
[ https://issues.apache.org/jira/browse/SPARK-44915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44915: --- Labels: pull-request-available (was: ) > Validate checksum of remounted PVC's shuffle data before recovery > - > > Key: SPARK-44915 > URL: https://issues.apache.org/jira/browse/SPARK-44915 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-24203) Make executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-24203: --- Labels: pull-request-available (was: ) > Make executor's bindAddress configurable > > > Key: SPARK-24203 > URL: https://issues.apache.org/jira/browse/SPARK-24203 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Lukas Majercak >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0
[jira] [Updated] (SPARK-45119) Refine docstring of `inline`
[ https://issues.apache.org/jira/browse/SPARK-45119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45119: --- Labels: pull-request-available (was: ) > Refine docstring of `inline` > > > Key: SPARK-45119 > URL: https://issues.apache.org/jira/browse/SPARK-45119 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine docstring of the `inline` function
[jira] [Created] (SPARK-45119) Refine docstring of `inline`
Allison Wang created SPARK-45119: Summary: Refine docstring of `inline` Key: SPARK-45119 URL: https://issues.apache.org/jira/browse/SPARK-45119 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine docstring of the `inline` function
[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763940#comment-17763940 ] Krystal Mitchell commented on SPARK-24815: -- Thank you [~pavan0831]. This draft PR will impact some of the projects we are currently working on. > Structured Streaming should support dynamic allocation > -- > > Key: SPARK-24815 > URL: https://issues.apache.org/jira/browse/SPARK-24815 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core, Structured Streaming >Affects Versions: 2.3.1 >Reporter: Karthik Palaniappan >Priority: Minor > Labels: pull-request-available > > For batch jobs, dynamic allocation is very useful for adding and removing > containers to match the actual workload. On multi-tenant clusters, it ensures > that a Spark job is taking no more resources than necessary. In cloud > environments, it enables autoscaling. > However, if you set spark.dynamicAllocation.enabled=true and run a structured > streaming job, the batch dynamic allocation algorithm kicks in. It requests > more executors if the task backlog is a certain size, and removes executors > if they idle for a certain period of time. > Quick thoughts: > 1) Dynamic allocation should be pluggable, rather than hardcoded to a > particular implementation in SparkContext.scala (this should be a separate > JIRA). > 2) We should make a structured streaming algorithm that's separate from the > batch algorithm. Eventually, continuous processing might need its own > algorithm. > 3) Spark should print a warning if you run a structured streaming job when > Core's dynamic allocation is enabled
[jira] [Updated] (SPARK-45118) Refactor converters for complex types to short cut when the element types don't need converters
[ https://issues.apache.org/jira/browse/SPARK-45118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45118: --- Labels: pull-request-available (was: ) > Refactor converters for complex types to short cut when the element types > don't need converters > --- > > Key: SPARK-45118 > URL: https://issues.apache.org/jira/browse/SPARK-45118 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-45118) Refactor converters for complex types to short cut when the element types don't need converters
Takuya Ueshin created SPARK-45118: - Summary: Refactor converters for complex types to short cut when the element types don't need converters Key: SPARK-45118 URL: https://issues.apache.org/jira/browse/SPARK-45118 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-44912) Spark 3.4 multi-column sum slows with many columns
[ https://issues.apache.org/jira/browse/SPARK-44912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brady Bickel resolved SPARK-44912. -- Resolution: Fixed Verified build containing linked issue fix solved the problem. > Spark 3.4 multi-column sum slows with many columns > -- > > Key: SPARK-44912 > URL: https://issues.apache.org/jira/browse/SPARK-44912 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0, 3.4.1 >Reporter: Brady Bickel >Priority: Major > > The code below is a minimal reproducible example of an issue I discovered with Pyspark 3.4.x. I want to sum the values of multiple columns and put the sum of those columns (per row) into a new column. This code works and returns in a reasonable amount of time in Pyspark 3.3.x, but is extremely slow in Pyspark 3.4.x when the number of columns grows. See below for execution timing summary as N varies.
> {code:java}
> import pyspark.sql.functions as F
> import random
> import string
> from functools import reduce
> from operator import add
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.getOrCreate()
>
> # generate a dataframe N columns by M rows with random 8 digit column
> # names and random integers in [-5,10]
> N = 30
> M = 100
> columns = [''.join(random.choices(string.ascii_uppercase + string.digits, k=8)) for _ in range(N)]
> data = [tuple([random.randint(-5,10) for _ in range(N)]) for _ in range(M)]
> df = spark.sparkContext.parallelize(data).toDF(columns)
>
> # 3 ways to add a sum column, all of them slow for high N in spark 3.4
> df = df.withColumn("col_sum1", sum(df[col] for col in columns))
> df = df.withColumn("col_sum2", reduce(add, [F.col(col) for col in columns]))
> df = df.withColumn("col_sum3", F.expr("+".join(columns))) {code}
> Timing results for Spark 3.3:
> ||N||Exe Time (s)||
> |5|0.514|
> |10|0.248|
> |15|0.327|
> |20|0.403|
> |25|0.279|
> |30|0.322|
> |50|0.430|
> Timing results for Spark 3.4:
> ||N||Exe Time (s)||
> |5|0.379|
> |10|0.318|
> |15|0.405|
> |20|1.32|
> |25|28.8|
> |30|448|
> |50|>1 (did not finish)|
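All three variants in the report build one left-deep chain of `+` expressions over the columns. The fold pattern itself can be illustrated without a Spark installation; below is a plain-Python analogue where tuples stand in for rows and `operator.add` for the Column `+` (purely illustrative, not the report's reproducer):

```python
from functools import reduce
from operator import add

# Analogue of the report's reduce(add, [F.col(c) for c in columns]):
# reduce builds the left-deep sum ((c0 + c1) + c2) + ...
rows = [(1, 2, 3, 4), (-5, 10, 0, 2)]

row_sums = [reduce(add, row) for row in rows]
print(row_sums)  # → [10, 7]
```

In PySpark the same fold produces a deeply nested expression tree, which is where the 3.4.x analysis-time blowup reported above shows itself as N grows.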
[jira] [Updated] (SPARK-24815) Structured Streaming should support dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-24815: --- Labels: pull-request-available (was: ) > Structured Streaming should support dynamic allocation > -- > > Key: SPARK-24815 > URL: https://issues.apache.org/jira/browse/SPARK-24815 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core, Structured Streaming >Affects Versions: 2.3.1 >Reporter: Karthik Palaniappan >Priority: Minor > Labels: pull-request-available > > For batch jobs, dynamic allocation is very useful for adding and removing > containers to match the actual workload. On multi-tenant clusters, it ensures > that a Spark job is taking no more resources than necessary. In cloud > environments, it enables autoscaling. > However, if you set spark.dynamicAllocation.enabled=true and run a structured > streaming job, the batch dynamic allocation algorithm kicks in. It requests > more executors if the task backlog is a certain size, and removes executors > if they idle for a certain period of time. > Quick thoughts: > 1) Dynamic allocation should be pluggable, rather than hardcoded to a > particular implementation in SparkContext.scala (this should be a separate > JIRA). > 2) We should make a structured streaming algorithm that's separate from the > batch algorithm. Eventually, continuous processing might need its own > algorithm. > 3) Spark should print a warning if you run a structured streaming job when > Core's dynamic allocation is enabled
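For context, the batch behavior the description complains about is driven by the standard dynamic-allocation settings; a minimal configuration sketch (values are illustrative, not recommendations):

```properties
spark.dynamicAllocation.enabled=true
# an external shuffle service is needed so executors can be removed safely
spark.shuffle.service.enabled=true
# batch heuristics: scale up on task backlog, scale down on idle executors
spark.dynamicAllocation.schedulerBacklogTimeout=1s
spark.dynamicAllocation.executorIdleTimeout=60s
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=20
```

These are exactly the knobs that kick in for a structured streaming query today, which is why the issue proposes a streaming-specific algorithm.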
[jira] [Updated] (SPARK-45084) ProgressReport should include an accurate effective shuffle partition number
[ https://issues.apache.org/jira/browse/SPARK-45084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45084: --- Labels: pull-request-available (was: ) > ProgressReport should include an accurate effective shuffle partition number > > > Key: SPARK-45084 > URL: https://issues.apache.org/jira/browse/SPARK-45084 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.4.2 >Reporter: Siying Dong >Priority: Minor > Labels: pull-request-available > > Currently, there is a numShufflePartitions "metric" reported in the StateOperatorProgress part of the progress report. However, the number is computed by aggregating over executors, so in the case of task retries or speculative execution the metric can be higher than the number of shuffle partitions in the query plan. The number of shuffle partitions is useful for reporting purposes, so an accurate metric would be helpful.
[jira] [Assigned] (SPARK-44647) Support SPJ when join key is subset of partition keys
[ https://issues.apache.org/jira/browse/SPARK-44647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44647: - Assignee: Szehon Ho > Support SPJ when join key is subset of partition keys > - > > Key: SPARK-44647 > URL: https://issues.apache.org/jira/browse/SPARK-44647 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-44647) Support SPJ when join key is subset of partition keys
[ https://issues.apache.org/jira/browse/SPARK-44647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44647. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42306 [https://github.com/apache/spark/pull/42306] > Support SPJ when join key is subset of partition keys > - > > Key: SPARK-44647 > URL: https://issues.apache.org/jira/browse/SPARK-44647 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-45117) Implement missing otherCopyArgs for the MultiCommutativeOp expression
[ https://issues.apache.org/jira/browse/SPARK-45117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45117: --- Labels: pull-request-available (was: ) > Implement missing otherCopyArgs for the MultiCommutativeOp expression > - > > Key: SPARK-45117 > URL: https://issues.apache.org/jira/browse/SPARK-45117 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1 >Reporter: Supun Nakandala >Priority: Major > Labels: pull-request-available > > Calling toJSON on a `MultiCommutativeOp` throws an assertion error as it does > not implement the `otherCopyArgs` method.
[jira] [Created] (SPARK-45117) Implement missing otherCopyArgs for the MultiCommutativeOp expression
Supun Nakandala created SPARK-45117: --- Summary: Implement missing otherCopyArgs for the MultiCommutativeOp expression Key: SPARK-45117 URL: https://issues.apache.org/jira/browse/SPARK-45117 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1 Reporter: Supun Nakandala Calling toJSON on a `MultiCommutativeOp` throws an assertion error as it does not implement the `otherCopyArgs` method.
[jira] [Commented] (SPARK-24203) Make executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763821#comment-17763821 ] Ignite TC Bot commented on SPARK-24203: --- User 'gedeh' has created a pull request for this issue: https://github.com/apache/spark/pull/42870 > Make executor's bindAddress configurable > > > Key: SPARK-24203 > URL: https://issues.apache.org/jira/browse/SPARK-24203 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Lukas Majercak >Assignee: Nishchal Venkataramana >Priority: Major > Fix For: 3.0.0
[jira] [Updated] (SPARK-45075) Alter table with invalid default value will not report error
[ https://issues.apache.org/jira/browse/SPARK-45075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45075: -- Fix Version/s: (was: 3.4.2) > Alter table with invalid default value will not report error > > > Key: SPARK-45075 > URL: https://issues.apache.org/jira/browse/SPARK-45075 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1 > > > create table t(i boolean, s bigint); > alter table t alter column s set default badvalue; > > The code does not report an error on DataSource V2, which is not aligned with the V1 behavior.
[jira] [Updated] (SPARK-45111) Upgrade maven to 3.9.4
[ https://issues.apache.org/jira/browse/SPARK-45111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-45111: - Priority: Minor (was: Major) > Upgrade maven to 3.9.4 > -- > > Key: SPARK-45111 > URL: https://issues.apache.org/jira/browse/SPARK-45111 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-45111) Upgrade maven to 3.9.4
[ https://issues.apache.org/jira/browse/SPARK-45111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-45111. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42827 [https://github.com/apache/spark/pull/42827] > Upgrade maven to 3.9.4 > -- > > Key: SPARK-45111 > URL: https://issues.apache.org/jira/browse/SPARK-45111 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-45111) Upgrade maven to 3.9.4
[ https://issues.apache.org/jira/browse/SPARK-45111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-45111: Assignee: Yang Jie > Upgrade maven to 3.9.4 > -- > > Key: SPARK-45111 > URL: https://issues.apache.org/jira/browse/SPARK-45111 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015
[ https://issues.apache.org/jira/browse/SPARK-43251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-43251. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42845 [https://github.com/apache/spark/pull/42845] > Assign a name to the error class _LEGACY_ERROR_TEMP_2015 > > > Key: SPARK-43251 > URL: https://issues.apache.org/jira/browse/SPARK-43251 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Deng Ziming >Priority: Minor > Labels: pull-request-available, starter > Fix For: 4.0.0 > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015
[ https://issues.apache.org/jira/browse/SPARK-43251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-43251: Assignee: Deng Ziming > Assign a name to the error class _LEGACY_ERROR_TEMP_2015 > > > Key: SPARK-43251 > URL: https://issues.apache.org/jira/browse/SPARK-43251 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Deng Ziming >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45092) Avoid analyze twice for failed queries
[ https://issues.apache.org/jira/browse/SPARK-45092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45092: --- Labels: pull-request-available (was: ) > Avoid analyze twice for failed queries > -- > > Key: SPARK-45092 > URL: https://issues.apache.org/jira/browse/SPARK-45092 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-45069. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42803 [https://github.com/apache/spark/pull/42803] > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-45069: --- Assignee: Wenchen Fan > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36191) Support ORDER BY and LIMIT to be on the correlation path
[ https://issues.apache.org/jira/browse/SPARK-36191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-36191: --- Labels: pull-request-available (was: ) > Support ORDER BY and LIMIT to be on the correlation path > > > Key: SPARK-36191 > URL: https://issues.apache.org/jira/browse/SPARK-36191 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > A correlation path is defined as the sub-tree of all the operators that are > on the path from the operator hosting the correlated expressions up to the > operator producing the correlated values. > We want to support ORDER BY (Sort) and LIMIT operators to be on the > correlation path to achieve better feature parity with Postgres. Here is an > example query in `postgreSQL/join.sql`: > {code:SQL} > select * from > text_tbl t1 > left join int8_tbl i8 > on i8.q2 = 123, > lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss > where t1.f1 = ss.f1; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42746) Add the LISTAGG() aggregate function
[ https://issues.apache.org/jira/browse/SPARK-42746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42746: --- Labels: pull-request-available (was: ) > Add the LISTAGG() aggregate function > > > Key: SPARK-42746 > URL: https://issues.apache.org/jira/browse/SPARK-42746 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > Labels: pull-request-available > > {{listagg()}} is a common and useful aggregate function that concatenates > string values in a column, optionally in a certain order. The systems below > already support such a function: > * Oracle: > [https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions089.htm#SQLRF30030] > * Snowflake: [https://docs.snowflake.com/en/sql-reference/functions/listagg] > * Amazon Redshift: > [https://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html] > * Google BigQuery: > [https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg] > We need to introduce this new aggregate in Spark, both as a regular aggregate > and as a window function. > Proposed syntax: > {code:sql} > LISTAGG( [ DISTINCT ] <expr> [, <delimiter> ] ) [ WITHIN GROUP ( <order_by_clause> ) ] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
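As a rough illustration of the semantics being proposed (not the Spark implementation, and with made-up parameter names), a plain-Python sketch of LISTAGG: optional DISTINCT, optional WITHIN GROUP ordering, then delimiter-joined concatenation.

```python
def listagg(values, delimiter=",", distinct=False, order_key=None):
    """Emulate LISTAGG: drop NULLs, optionally de-duplicate (keeping the
    first occurrence), optionally sort (the WITHIN GROUP ... ORDER BY part),
    then join with the delimiter."""
    vals = [v for v in values if v is not None]
    if distinct:
        seen, deduped = set(), []
        for v in vals:
            if v not in seen:
                seen.add(v)
                deduped.append(v)
        vals = deduped
    if order_key is not None:
        vals = sorted(vals, key=order_key)
    return delimiter.join(str(v) for v in vals)

listagg(["b", None, "a", "b"], distinct=True, order_key=str)  # -> "a,b"
```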
[jira] [Updated] (SPARK-45116) Add some comment for param of JdbcDialect createTable
[ https://issues.apache.org/jira/browse/SPARK-45116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45116: --- Labels: pull-request-available (was: ) > Add some comment for param of JdbcDialect createTable > - > > Key: SPARK-45116 > URL: https://issues.apache.org/jira/browse/SPARK-45116 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jia Fan >Priority: Minor > Labels: pull-request-available > > SPARK-41516 added {{createTable}} to {{JdbcDialect}} but did not add > comments for its parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45116) Add some comment for param of JdbcDialect createTable
Jia Fan created SPARK-45116: --- Summary: Add some comment for param of JdbcDialect createTable Key: SPARK-45116 URL: https://issues.apache.org/jira/browse/SPARK-45116 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1 Reporter: Jia Fan SPARK-41516 added {{createTable}} to {{JdbcDialect}} but did not add comments for its parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls
[ https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763725#comment-17763725 ] Steve Loughran commented on SPARK-38958: [~hershalb] hadoop trunk is now on v2 sdk, but we are still stabilising client binding. > Override S3 Client in Spark Write/Read calls > > > Key: SPARK-38958 > URL: https://issues.apache.org/jira/browse/SPARK-38958 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Hershal >Priority: Major > > Hello, > I have been working to use spark to read and write data to S3. Unfortunately, > there are a few S3 headers that I need to add to my spark read/write calls. > After much looking, I have not found a way to replace the S3 client that > spark uses to make the read/write calls. I also have not found a > configuration that allows me to pass in S3 headers. Here is an example of > some common S3 request headers > ([https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html).] > Does there already exist functionality to add S3 headers to spark read/write > calls or pass in a custom client that would pass these headers on every > read/write request? Appreciate the help and feedback > > Thanks, -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29670) Make executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-29670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-29670: --- Labels: pull-request-available (was: ) > Make executor's bindAddress configurable > > > Key: SPARK-29670 > URL: https://issues.apache.org/jira/browse/SPARK-29670 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4 >Reporter: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45069: --- Labels: pull-request-available (was: ) > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32014) Support calling stored procedure on JDBC data source
[ https://issues.apache.org/jira/browse/SPARK-32014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763688#comment-17763688 ] Sumanto Pal commented on SPARK-32014: - Why isn't this prioritized? > Support calling stored procedure on JDBC data source > > > Key: SPARK-32014 > URL: https://issues.apache.org/jira/browse/SPARK-32014 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yoshi Matsuzaki >Priority: Major > > Currently, all queries via the JDBC data source are enveloped by an outer SELECT as > described below: > [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html] > {quote} > A query that will be used to read data into Spark. The specified query will > be parenthesized and used as a subquery in the FROM clause. Spark will also > assign an alias to the subquery clause. As an example, spark will issue a > query of the following form to the JDBC Source. > SELECT <columns> FROM (<user_specified_query>) spark_gen_alias > {quote} > Because of this behavior, we cannot call a stored procedure in major > databases, because stored procedure call syntax is usually not allowed to be > used in a subquery, since its returned value is optional. > For example, the Scala code below, which executes a query on Snowflake as a JDBC data > source, raises a syntax error, because the query "call proc()" is rewritten to > "select * from (call proc()) where 1 = 0", and it is invalid because CALL > cannot be in the middle of a query. > {code:scala} > val df: DataFrame = spark.read > .format("snowflake") > .options(options) > .option("query", "call proc()") > .load() > display(df) > {code} > I tested this with Snowflake, but it should happen in any major database > system. > I understand the JDBC data source reads and writes data through a DataFrame, > so the interfaces implemented are just read and write, but sometimes we > need to execute some queries before or after reading/writing, for > example, to preprocess the data with a stored procedure.
> I would appreciate it if you could consider implementing some interface/way > to allow us to call a stored procedure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
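The rewrite described above can be modeled in a few lines of plain Python (a simplification: the real alias is generated, and the `WHERE 1=0` probe is appended for schema inference), which shows why a bare procedure call cannot survive the wrapping:

```python
def wrap_jdbc_query(user_query: str, probe_schema: bool = False) -> str:
    """Simplified model of how Spark's JDBC source embeds the `query` option
    as a parenthesized subquery in the FROM clause."""
    wrapped = f"SELECT * FROM ({user_query}) spark_gen_alias"
    if probe_schema:
        # Schema inference appends a predicate that returns no rows.
        wrapped += " WHERE 1=0"
    return wrapped

wrap_jdbc_query("call proc()", probe_schema=True)
# -> "SELECT * FROM (call proc()) spark_gen_alias WHERE 1=0"
# Most databases reject this, because CALL cannot appear inside a subquery.
```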
[jira] [Updated] (SPARK-45115) No way to exclude jars setting to classpath while doing spark-submit
[ https://issues.apache.org/jira/browse/SPARK-45115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumanto Pal updated SPARK-45115: Issue Type: Improvement (was: New Feature) > No way to exclude jars setting to classpath while doing spark-submit > > > Key: SPARK-45115 > URL: https://issues.apache.org/jira/browse/SPARK-45115 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.4.1 >Reporter: Sumanto Pal >Priority: Blocker > Original Estimate: 336h > Remaining Estimate: 336h > > The challenge is that whenever you do spark-submit to start the application, the > jars present in the Spark home directory get added to the classpath automatically, > and there is no way to exclude specific jars. For example, we don't > want the slf4j jars present in the Spark home directory to be set on the classpath, > since slf4j is already in the codebase; this causes jar conflicts. This > forces users to change their codebase to support spark-submit or to manually > remove the jars from the Spark home directory. I believe this is not the right > practice, as it deviates from using Spark as it is supposed to be used and causes > hard-to-diagnose behaviors at various points with no clue; for example, linkage > errors are common with jar conflicts. > > There is a detailed Stack Overflow question on this issue. > refer : > https://stackoverflow.com/questions/76476618/linkageerror-facing-while-doing-spark-submit > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45115) No way to exclude jars setting to classpath while doing spark-submit
Sumanto Pal created SPARK-45115: --- Summary: No way to exclude jars setting to classpath while doing spark-submit Key: SPARK-45115 URL: https://issues.apache.org/jira/browse/SPARK-45115 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 3.4.1 Reporter: Sumanto Pal The challenge is that whenever you do spark-submit to start the application, the jars present in the Spark home directory get added to the classpath automatically, and there is no way to exclude specific jars. For example, we don't want the slf4j jars present in the Spark home directory to be set on the classpath, since slf4j is already in the codebase; this causes jar conflicts. This forces users to change their codebase to support spark-submit or to manually remove the jars from the Spark home directory. I believe this is not the right practice, as it deviates from using Spark as it is supposed to be used and causes hard-to-diagnose behaviors at various points with no clue; for example, linkage errors are common with jar conflicts. There is a detailed Stack Overflow question on this issue. refer : https://stackoverflow.com/questions/76476618/linkageerror-facing-while-doing-spark-submit -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45115) No way to exclude jars setting to classpath while doing spark-submit
[ https://issues.apache.org/jira/browse/SPARK-45115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumanto Pal updated SPARK-45115: Target Version/s: (was: 3.4.1) > No way to exclude jars setting to classpath while doing spark-submit > > > Key: SPARK-45115 > URL: https://issues.apache.org/jira/browse/SPARK-45115 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.4.1 >Reporter: Sumanto Pal >Priority: Blocker > Original Estimate: 336h > Remaining Estimate: 336h > > The challenge is that whenever you do spark-submit to start the application, the > jars present in the Spark home directory get added to the classpath automatically, > and there is no way to exclude specific jars. For example, we don't > want the slf4j jars present in the Spark home directory to be set on the classpath, > since slf4j is already in the codebase; this causes jar conflicts. This > forces users to change their codebase to support spark-submit or to manually > remove the jars from the Spark home directory. I believe this is not the right > practice, as it deviates from using Spark as it is supposed to be used and causes > hard-to-diagnose behaviors at various points with no clue; for example, linkage > errors are common with jar conflicts. > > There is a detailed Stack Overflow question on this issue. > refer : > https://stackoverflow.com/questions/76476618/linkageerror-facing-while-doing-spark-submit > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45115) No way to exclude jars setting to classpath while doing spark-submit
[ https://issues.apache.org/jira/browse/SPARK-45115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumanto Pal updated SPARK-45115: Issue Type: New Feature (was: Bug) > No way to exclude jars setting to classpath while doing spark-submit > > > Key: SPARK-45115 > URL: https://issues.apache.org/jira/browse/SPARK-45115 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 3.4.1 >Reporter: Sumanto Pal >Priority: Blocker > Original Estimate: 336h > Remaining Estimate: 336h > > The challenge is that whenever you do spark-submit to start the application, the > jars present in the Spark home directory get added to the classpath automatically, > and there is no way to exclude specific jars. For example, we don't > want the slf4j jars present in the Spark home directory to be set on the classpath, > since slf4j is already in the codebase; this causes jar conflicts. This > forces users to change their codebase to support spark-submit or to manually > remove the jars from the Spark home directory. I believe this is not the right > practice, as it deviates from using Spark as it is supposed to be used and causes > hard-to-diagnose behaviors at various points with no clue; for example, linkage > errors are common with jar conflicts. > > There is a detailed Stack Overflow question on this issue. > refer : > https://stackoverflow.com/questions/76476618/linkageerror-facing-while-doing-spark-submit > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
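One commonly suggested mitigation for the conflict described above (not the requested exclusion mechanism, and both properties are documented by Spark as experimental) is to ask Spark to prefer the application's classes over the jars shipped in $SPARK_HOME/jars. A sketch of the relevant spark-defaults.conf entries:

```
# spark-defaults.conf (sketch): load user-supplied jars before Spark's own.
# This can mask duplicate-binding conflicts such as slf4j, but may introduce
# other incompatibilities, so treat it as a workaround only.
spark.driver.userClassPathFirst    true
spark.executor.userClassPathFirst  true
```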
[jira] [Assigned] (SPARK-45114) Adjust the `versionadded` and `versionchanged` information to the parameters
[ https://issues.apache.org/jira/browse/SPARK-45114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45114: - Assignee: Ruifeng Zheng > Adjust the `versionadded` and `versionchanged` information to the parameters > > > Key: SPARK-45114 > URL: https://issues.apache.org/jira/browse/SPARK-45114 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45114) Adjust the `versionadded` and `versionchanged` information to the parameters
[ https://issues.apache.org/jira/browse/SPARK-45114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45114. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42867 [https://github.com/apache/spark/pull/42867] > Adjust the `versionadded` and `versionchanged` information to the parameters > > > Key: SPARK-45114 > URL: https://issues.apache.org/jira/browse/SPARK-45114 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44635) Handle shuffle fetch failures in decommissions
[ https://issues.apache.org/jira/browse/SPARK-44635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44635: --- Labels: pull-request-available (was: ) > Handle shuffle fetch failures in decommissions > -- > > Key: SPARK-44635 > URL: https://issues.apache.org/jira/browse/SPARK-44635 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Priority: Major > Labels: pull-request-available > > Spark's decommission feature supports migration of shuffle data. However, the > shuffle data fetcher only looks at the location (`BlockManagerId`) captured when > it is initialized. This can lead to shuffle fetch failures when the shuffle > read tasks are long-running. > > To mitigate this, shuffle data fetchers should be able to look up the > updated locations after decommissions and fetch from there instead. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
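The mitigation described above amounts to re-resolving a block's location on failure instead of trusting the one captured at fetcher initialization. A hypothetical plain-Python model of that control flow (all names invented; this is not Spark's fetcher):

```python
def fetch_with_relocation(block_id, location, lookup_latest, fetch):
    """Try the originally-known location first; on a connection failure,
    ask for the block's current location (it may have been migrated off a
    decommissioned executor) and retry once."""
    try:
        return fetch(location, block_id)
    except ConnectionError:
        current = lookup_latest(block_id)
        if current is None or current == location:
            raise  # no newer location known; surface the failure
        return fetch(current, block_id)
```

A caller would supply `lookup_latest` backed by whatever tracks migrated shuffle blocks, and `fetch` as the actual network read.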
[jira] [Resolved] (SPARK-45113) Refine docstrings of `collect_list/collect_set`
[ https://issues.apache.org/jira/browse/SPARK-45113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45113. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42866 [https://github.com/apache/spark/pull/42866] > Refine docstrings of `collect_list/collect_set` > --- > > Key: SPARK-45113 > URL: https://issues.apache.org/jira/browse/SPARK-45113 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45113) Refine docstrings of `collect_list/collect_set`
[ https://issues.apache.org/jira/browse/SPARK-45113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45113: Assignee: Yang Jie > Refine docstrings of `collect_list/collect_set` > --- > > Key: SPARK-45113 > URL: https://issues.apache.org/jira/browse/SPARK-45113 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-38215) InsertIntoHiveDir support convert metadata
[ https://issues.apache.org/jira/browse/SPARK-38215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763577#comment-17763577 ] Penglei Shi edited comment on SPARK-38215 at 9/11/23 10:48 AM: --- [~angerszhuuu] Hi, I found that Spark does not throw QueryCompilationErrors.cannotOverwritePathBeingReadFromError() when an INSERT ... DIRECTORY selects from a table whose path is the same as the inserted directory, which deletes the table files. Because DDLUtils.verifyNotReadPath only collects LogicalRelation rather than HiveTableRelation, the InsertIntoDir is converted to InsertIntoDataSourceDirCommand in RelationConversions even though the HiveTableRelation's location is the same as the inserted directory, and DataSourceAnalysis does not notice that. was (Author: penglei shi): [~angerszhuuu] Hi, I found that when an INSERT ... DIRECTORY selects from a table whose path is the same as the inserted directory, the table files are deleted in advance and the directory ends up empty. Because DDLUtils.verifyNotReadPath only collects LogicalRelation rather than HiveTableRelation, the InsertIntoDir is converted to InsertIntoDataSourceDirCommand even though the HiveTableRelation's location is the same as the inserted directory. > InsertIntoHiveDir support convert metadata > -- > > Key: SPARK-38215 > URL: https://issues.apache.org/jira/browse/SPARK-38215 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > The current InsertIntoHiveDir command uses Hive SerDe to write data and does not > support conversion, so such SQL can't write Parquet with zstd. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
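The safety check being discussed boils down to path containment: refuse the write when the target directory and any read path overlap, since the overwrite would delete the input files before they are read. A hypothetical sketch of that predicate in plain Python (not the DDLUtils.verifyNotReadPath implementation):

```python
import os

def overlaps_read_path(output_dir, read_paths):
    """Return True if output_dir equals, contains, or is contained in any
    path being read. Pure string/path logic for illustration; a real check
    would compare fully qualified, resolved filesystem URIs."""
    out = os.path.normpath(output_dir)
    for p in read_paths:
        rp = os.path.normpath(p)
        if (rp == out
                or rp.startswith(out + os.sep)
                or out.startswith(rp + os.sep)):
            return True
    return False
```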
[jira] [Assigned] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45085: -- Assignee: (was: Apache Spark) > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45085: -- Assignee: Apache Spark > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45085: --- Labels: pull-request-available (was: ) > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Commented] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763619#comment-17763619 ] ASF GitHub Bot commented on SPARK-45085: User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/42824 > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Assigned] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45069: -- Assignee: Apache Spark > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45069: -- Assignee: (was: Apache Spark) > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Commented] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763616#comment-17763616 ] ASF GitHub Bot commented on SPARK-45069: User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/42803 > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Updated] (SPARK-45102) Support keyword columns on filters that interact with HMS
[ https://issues.apache.org/jira/browse/SPARK-45102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45102: --- Labels: pull-request-available (was: ) > Support keyword columns on filters that interact with HMS > - > > Key: SPARK-45102 > URL: https://issues.apache.org/jira/browse/SPARK-45102 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.1 >Reporter: Steve Carlin >Priority: Major > Labels: pull-request-available > > Recently, https://issues.apache.org/jira/browse/HIVE-27665 was pushed to > Hive. It allows HMS to handle columns that are surrounded by backticks > in filters. A customer who hit this problem had a filter in > Spark like this: > where date='2015-01-06' > This didn't work because the word "date" is a keyword. For the > customer's query to work, the where clause had to be changed to this: > where `date`='2015-01-06' > Spark currently strips out the backticks before passing the filter to HMS. We need > a configurable flag to stop stripping the backticks.
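To make the described behavior concrete, here is a minimal, hypothetical Python sketch of quoting reserved-word columns in a filter string before sending it to HMS. This is not Spark's actual code; the `RESERVED` set, function name, and regex are illustrative assumptions:

```python
import re

# Illustrative subset of reserved words that HMS rejects as bare column names.
RESERVED = {"date", "timestamp", "user"}

def quote_keyword_columns(filter_expr: str) -> str:
    """Wrap reserved-word columns in backticks instead of stripping them,
    so an HMS that includes HIVE-27665 can parse the filter."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        return f"`{name}`" if name in RESERVED else name
    # Only touch bare identifiers appearing on the left of an equality.
    return re.sub(r"\b([A-Za-z_][A-Za-z0-9_]*)\b(?=\s*=)", repl, filter_expr)
```

For example, `quote_keyword_columns("date='2015-01-06'")` yields the backtick-quoted form from the issue description, while non-keyword columns pass through unchanged.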
[jira] [Updated] (SPARK-45112) Use UnresolvedFunction based resolution in SQL Dataset functions
[ https://issues.apache.org/jira/browse/SPARK-45112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Toth updated SPARK-45112: --- Summary: Use UnresolvedFunction based resolution in SQL Dataset functions (was: Use UnresolvedFunction in dataset functions) > Use UnresolvedFunction based resolution in SQL Dataset functions > > > Key: SPARK-45112 > URL: https://issues.apache.org/jira/browse/SPARK-45112 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Peter Toth >Priority: Minor > Labels: pull-request-available >
[jira] [Created] (SPARK-45114) Adjust the `versionadded` and `versionchanged` information to the parameters
Ruifeng Zheng created SPARK-45114: - Summary: Adjust the `versionadded` and `versionchanged` information to the parameters Key: SPARK-45114 URL: https://issues.apache.org/jira/browse/SPARK-45114 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-45114) Adjust the `versionadded` and `versionchanged` information to the parameters
[ https://issues.apache.org/jira/browse/SPARK-45114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45114: --- Labels: pull-request-available (was: ) > Adjust the `versionadded` and `versionchanged` information to the parameters > > > Key: SPARK-45114 > URL: https://issues.apache.org/jira/browse/SPARK-45114 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
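For context on SPARK-45114: PySpark docstrings use Sphinx `.. versionadded::` and `.. versionchanged::` directives, and the ticket is about placing them next to the parameters they describe rather than only at the function level. A hypothetical example of the target layout (the function and version numbers are made up, not taken from PySpark):

```python
def sample_function(col, seed=None):
    """Hypothetical function showing parameter-level version directives.

    .. versionadded:: 3.0.0

    Parameters
    ----------
    col : str
        Name of the input column.
    seed : int, optional
        Random seed.

        .. versionadded:: 3.4.0
            The ``seed`` parameter was added in this version.
    """
    return col, seed
```

Sphinx renders the nested directive under the specific parameter, so a reader can see in which release each parameter appeared or changed.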
[jira] [Commented] (SPARK-38215) InsertIntoHiveDir support convert metadata
[ https://issues.apache.org/jira/browse/SPARK-38215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763577#comment-17763577 ] Penglei Shi commented on SPARK-38215: - [~angerszhuuu] Hi, I found that when an insert-into-directory command selects from a table whose location is the same path as the target directory, the table's files are deleted in advance and the directory ends up empty. Because DDLUtils.verifyNotReadPath only collects LogicalRelation nodes rather than HiveTableRelation, InsertIntoDir is converted to InsertIntoDataSourceDirCommand even though the HiveTableRelation's location is the same as the target directory. > InsertIntoHiveDir support convert metadata > -- > > Key: SPARK-38215 > URL: https://issues.apache.org/jira/browse/SPARK-38215 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > The current InsertIntoHiveDir command uses the Hive SerDe to write data and doesn't > support conversion, so such SQL can't write Parquet with zstd.
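The check the comment describes can be sketched as follows. This is an illustrative Python model under stated assumptions (Spark's real implementation is Scala, and plan nodes are objects, not dicts); the point is that read-path collection must cover Hive relations as well as data-source relations:

```python
def collect_read_paths(plan_nodes):
    """Collect source locations from both data-source and Hive relations.
    The issue reported above is that only LogicalRelation was collected,
    so HiveTableRelation locations were missed."""
    return {
        node["location"]
        for node in plan_nodes
        if node["type"] in ("LogicalRelation", "HiveTableRelation")
    }

def verify_not_read_path(plan_nodes, output_path):
    """Refuse to overwrite a directory that is also being read from."""
    if output_path in collect_read_paths(plan_nodes):
        raise ValueError(
            f"Cannot overwrite a path that is also being read from: {output_path}"
        )
```

With HiveTableRelation included in the collection, an insert whose target directory equals the Hive table's location is rejected instead of silently emptying the table.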
[jira] [Updated] (SPARK-45020) org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'default' not found (state=08S01,code=0)
[ https://issues.apache.org/jira/browse/SPARK-45020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45020: --- Labels: pull-request-available (was: ) > org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database > 'default' not found (state=08S01,code=0) > - > > Key: SPARK-45020 > URL: https://issues.apache.org/jira/browse/SPARK-45020 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Sruthi Mooriyathvariam >Priority: Minor > Labels: pull-request-available > > An alert fires when a Spark 3.1 cluster is created using a metastore shared > with Spark 2.4. The alert says the default database does not exist. This is > misleading, so we need to suppress it. > In the class SessionCatalog.scala, the method requireDbExists() does not handle the > case where db is the default database. Handling that case would suppress this > misleading alert.
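The proposed change amounts to a guard that treats the default database as always present. A hypothetical Python model of the described requireDbExists() behavior (the actual method lives in SessionCatalog.scala; names and exception types here are illustrative):

```python
DEFAULT_DATABASE = "default"

def require_db_exists(db, existing_dbs):
    """Skip the existence check for the built-in default database so the
    misleading "Database 'default' not found" alert is never raised."""
    if db == DEFAULT_DATABASE:
        return  # assumed to always exist, even with a shared older metastore
    if db not in existing_dbs:
        raise LookupError(f"Database '{db}' not found")
```

Under this sketch, a lookup of `default` succeeds even when the shared metastore has not materialized it, while lookups of genuinely missing databases still fail.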