[jira] [Updated] (SPARK-47793) Implement SimpleDataSourceStreamReader for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoqin Li updated SPARK-47793: --- Epic Link: SPARK-46866 > Implement SimpleDataSourceStreamReader for python streaming data source > --- > > Key: SPARK-47793 > URL: https://issues.apache.org/jira/browse/SPARK-47793 > Project: Spark > Issue Type: New Feature > Components: PySpark, SS >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Priority: Major > Labels: pull-request-available > > SimpleDataSourceStreamReader is a simplified version of the DataStreamReader > interface. > # It doesn't require developers to reason about data partitioning. > # It doesn't require getting the latest offset before reading data. > There are 3 functions that need to be defined: > 1. Read data and return the end offset. > _def read(self, start: Offset) -> (Iterator[Tuple], Offset)_ > 2. Read data between start and end offset; this is required for exactly-once reads. > _def read2(self, start: Offset, end: Offset) -> Iterator[Tuple]_ > 3. Return the initial start offset of the streaming query. > _def initialOffset() -> dict_ > Implementation: Wrap the SimpleDataSourceStreamReader instance in a > DataSourceStreamReader internally and make the prefetching and caching > transparent to the data source developer. Records prefetched in the Python process will be sent to the JVM as Arrow record batches. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
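The three functions the ticket lists can be sketched as follows. This is a hedged, self-contained illustration in plain Python (no pyspark dependency); the class name `CounterSimpleStreamReader`, the dict-based offsets, and the tuple rows are assumptions for illustration, not the actual PySpark API surface.

```python
from typing import Iterator, Tuple


class CounterSimpleStreamReader:
    """Toy reader emitting a fixed batch of integers per microbatch."""

    BATCH_SIZE = 3

    def initialOffset(self) -> dict:
        # 3. The initial start offset of the streaming query.
        return {"offset": 0}

    def read(self, start: dict) -> Tuple[Iterator[tuple], dict]:
        # 1. Read forward from `start`; return the rows and the end offset.
        begin = start["offset"]
        end = begin + self.BATCH_SIZE
        return iter([(i,) for i in range(begin, end)]), {"offset": end}

    def read2(self, start: dict, end: dict) -> Iterator[tuple]:
        # 2. Deterministically re-read between two offsets (exactly-once replay).
        return iter([(i,) for i in range(start["offset"], end["offset"])])


reader = CounterSimpleStreamReader()
first = reader.initialOffset()
rows, nxt = reader.read(first)
```

Note there is no partition-planning step and no "get latest offset" call: `read` both fetches data and decides where the batch ends, which is exactly the simplification the ticket describes.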
[jira] [Updated] (SPARK-47920) Add documentation for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoqin Li updated SPARK-47920: --- Epic Link: SPARK-46866 > Add documentation for python streaming data source > -- > > Key: SPARK-47920 > URL: https://issues.apache.org/jira/browse/SPARK-47920 > Project: Spark > Issue Type: New Feature > Components: PySpark, SS >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Priority: Major > Labels: pull-request-available > > Add documentation (a user guide) for the Python data source API. > The documentation should explain how to develop and use DataSourceStreamReader and > DataSourceStreamWriter
[jira] [Updated] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoqin Li updated SPARK-47777: --- Epic Link: SPARK-46866 > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS, Tests >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Make the python streaming data source pyspark test also run on spark connect.
[jira] [Updated] (SPARK-47273) Implement python stream writer interface
[ https://issues.apache.org/jira/browse/SPARK-47273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoqin Li updated SPARK-47273: --- Epic Link: SPARK-46866 > Implement python stream writer interface > > > Key: SPARK-47273 > URL: https://issues.apache.org/jira/browse/SPARK-47273 > Project: Spark > Issue Type: Improvement > Components: PySpark, SS >Affects Versions: 4.0.0 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In order to support developing Spark streaming sinks in Python, we need to > implement the Python stream writer interface. > Reuse PythonPartitionWriter to implement the serialization and execution of > write callbacks in the executor. > Implement a Python worker process to run the Python streaming data sink committer > and communicate with the JVM through a socket in the Spark driver. For each Python > streaming data sink instance there will be a long-lived Python worker process > created. Inside the Python process, the Python write committer will receive > abort or commit function calls and send back results through the socket.
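The commit/abort protocol the committer process implements can be sketched in plain Python. This is a hedged mock, not Spark code: the class name `InMemoryStreamCommitter` and the shape of the commit messages are assumptions for illustration; in the ticket's design these calls would arrive over a socket from the JVM driver.

```python
class InMemoryStreamCommitter:
    """Mock of a streaming data sink committer: receives one commit or
    abort call per microbatch, as the long-lived Python worker would."""

    def __init__(self):
        # batch_id -> results reported by the partition writers
        self.committed = {}

    def commit(self, batch_id, messages):
        # All partition writers for this batch succeeded: record their results.
        self.committed[batch_id] = list(messages)
        return {"batch": batch_id, "status": "committed"}

    def abort(self, batch_id, messages):
        # Some partition writer failed: discard any partial state for the batch.
        self.committed.pop(batch_id, None)
        return {"batch": batch_id, "status": "aborted"}


committer = InMemoryStreamCommitter()
ok = committer.commit(0, ["part-0", "part-1"])
bad = committer.abort(1, [])
```

Keeping the committer in one long-lived process (rather than spawning one per batch) is what lets it hold state like `committed` across microbatches.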
[jira] [Updated] (SPARK-47107) Implement partition reader for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoqin Li updated SPARK-47107: --- Epic Link: SPARK-46866 > Implement partition reader for python streaming data source > --- > > Key: SPARK-47107 > URL: https://issues.apache.org/jira/browse/SPARK-47107 > Project: Spark > Issue Type: Improvement > Components: PySpark, SS >Affects Versions: 4.0.0 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Piggyback on the PythonPartitionReaderFactory to implement reading a data > partition for the Python streaming data source. Add a test case to verify that > the Python streaming data source can read and process data end to end.
[jira] [Updated] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47933: --- Labels: pull-request-available (was: ) > Parent Column class for Spark Connect and Spark Classic > --- > > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47932) Avoid using legacy commons-lang
[ https://issues.apache.org/jira/browse/SPARK-47932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47932: --- Labels: pull-request-available (was: ) > Avoid using legacy commons-lang > --- > > Key: SPARK-47932 > URL: https://issues.apache.org/jira/browse/SPARK-47932 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Minor > Labels: pull-request-available
[jira] [Created] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
Hyukjin Kwon created SPARK-47933: Summary: Parent Column class for Spark Connect and Spark Classic Key: SPARK-47933 URL: https://issues.apache.org/jira/browse/SPARK-47933 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-47902) Compute Current Time* expressions should be foldable
[ https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47902. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46120 [https://github.com/apache/spark/pull/46120] > Compute Current Time* expressions should be foldable > > > Key: SPARK-47902 > URL: https://issues.apache.org/jira/browse/SPARK-47902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following PR - https://github.com/apache/spark/pull/44261 changed the "compute > current time" family of expressions to be unevaluable, given that these > expressions are supposed to be replaced with literals by the query optimizer. Unevaluable > implies that these expressions are not foldable, even though they will be > replaced by literals. > If these expressions were used in places that require constant folding (e.g. > RAND()), the new behavior would be to raise an error, which is a regression > compared to behavior prior to Spark 4.0.
[jira] [Assigned] (SPARK-47902) Compute Current Time* expressions should be foldable
[ https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47902: --- Assignee: Aleksandar Tomic > Compute Current Time* expressions should be foldable > > > Key: SPARK-47902 > URL: https://issues.apache.org/jira/browse/SPARK-47902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > > The following PR - https://github.com/apache/spark/pull/44261 changed the "compute > current time" family of expressions to be unevaluable, given that these > expressions are supposed to be replaced with literals by the query optimizer. Unevaluable > implies that these expressions are not foldable, even though they will be > replaced by literals. > If these expressions were used in places that require constant folding (e.g. > RAND()), the new behavior would be to raise an error, which is a regression > compared to behavior prior to Spark 4.0.
[jira] [Commented] (SPARK-33826) InsertIntoHiveTable generate HDFS file with invalid user
[ https://issues.apache.org/jira/browse/SPARK-33826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839449#comment-17839449 ] angerszhu commented on SPARK-33826: --- What is RIK? Not sure what you mean. > InsertIntoHiveTable generate HDFS file with invalid user > > > Key: SPARK-33826 > URL: https://issues.apache.org/jira/browse/SPARK-33826 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 3.0.0 >Reporter: Zhang Jianguo >Priority: Minor > > *Arch:* Hive on Spark. > > *Version:* Spark 2.3.2 > > *Conf:* > Enable user impersonation > hive.server2.enable.doAs=true > > *Scenario:* > Thriftserver is running as login user A, and tasks run as user A too. > Clients execute SQL as user B. > > Data generated by the SQL "insert into TABLE \[tbl\] select XXX from ." is > written to HDFS on the executor, and the executor doesn't know B. > > *{color:#de350b}So the owner of the file written to HDFS will be user A when it should > be B.{color}* > > I also checked the implementation of Spark 3.0.0; it could have the same issue.
[jira] [Resolved] (SPARK-47845) Support column type in split function in scala and python
[ https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47845. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46045 [https://github.com/apache/spark/pull/46045] > Support column type in split function in scala and python > - > > Key: SPARK-47845 > URL: https://issues.apache.org/jira/browse/SPARK-47845 > Project: Spark > Issue Type: New Feature > Components: Connect, Spark Core >Affects Versions: 3.5.1 >Reporter: Liu Cao >Assignee: Liu Cao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > I have a use case to split a String-typed column with different delimiters > defined in other columns of the dataframe. SQL already supports this, but the > Scala / Python functions currently don't. > > A hypothetical example to illustrate: > {code:java} > import org.apache.spark.sql.functions.{col, split} > val example = spark.createDataFrame( > Seq( > ("Doe, John", ", ", 2), > ("Smith,Jane", ",", 2), > ("Johnson", ",", 1) > ) > ) > .toDF("name", "delim", "expected_parts_count") > example.createOrReplaceTempView("test_data") > // works for SQL > spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM > test_data").show() > // currently doesn't compile for scala, but easy to support > example.withColumn("name_parts", split(col("name"), col("delim"), > col("expected_parts_count"))).show() {code} > > It's a pretty simple patch; I can make a PR soon.
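The per-row semantics the ticket asks for (delimiter and limit taken from other columns of the same row) can be illustrated in plain Python, without a Spark session. This is a hedged sketch: `str.split` with a literal delimiter is a simplification of Spark's `split`, which treats the delimiter as a regular expression.

```python
# Rows mirroring the ticket's Scala example: each row carries its own
# delimiter and expected number of parts.
rows = [
    {"name": "Doe, John", "delim": ", ", "parts": 2},
    {"name": "Smith,Jane", "delim": ",", "parts": 2},
    {"name": "Johnson", "delim": ",", "parts": 1},
]


def split_row(row):
    # str.split with maxsplit = parts - 1 yields at most `parts` pieces,
    # roughly matching Spark's positive-limit semantics for split().
    return row["name"].split(row["delim"], row["parts"] - 1)


name_parts = [split_row(r) for r in rows]
```

The point of the ticket is that this row-wise behavior was already expressible in SQL, so exposing `Column`-typed delimiter and limit arguments in the Scala/Python `split` function only widens the existing API rather than adding new semantics.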
[jira] [Assigned] (SPARK-47845) Support column type in split function in scala and python
[ https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-47845: - Assignee: Liu Cao > Support column type in split function in scala and python > - > > Key: SPARK-47845 > URL: https://issues.apache.org/jira/browse/SPARK-47845 > Project: Spark > Issue Type: New Feature > Components: Connect, Spark Core >Affects Versions: 3.5.1 >Reporter: Liu Cao >Assignee: Liu Cao >Priority: Major > Labels: pull-request-available > > I have a use case to split a String-typed column with different delimiters > defined in other columns of the dataframe. SQL already supports this, but the > Scala / Python functions currently don't. > > A hypothetical example to illustrate: > {code:java} > import org.apache.spark.sql.functions.{col, split} > val example = spark.createDataFrame( > Seq( > ("Doe, John", ", ", 2), > ("Smith,Jane", ",", 2), > ("Johnson", ",", 1) > ) > ) > .toDF("name", "delim", "expected_parts_count") > example.createOrReplaceTempView("test_data") > // works for SQL > spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM > test_data").show() > // currently doesn't compile for scala, but easy to support > example.withColumn("name_parts", split(col("name"), col("delim"), > col("expected_parts_count"))).show() {code} > > It's a pretty simple patch; I can make a PR soon.
[jira] [Commented] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839436#comment-17839436 ] Hyukjin Kwon commented on SPARK-47909: -- Yes, I am working on it today :-). > Parent DataFrame class for Spark Connect and Spark Classic > -- > > Key: SPARK-47909 > URL: https://issues.apache.org/jira/browse/SPARK-47909 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47909. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46129 [https://github.com/apache/spark/pull/46129] > Parent DataFrame class for Spark Connect and Spark Classic > -- > > Key: SPARK-47909 > URL: https://issues.apache.org/jira/browse/SPARK-47909 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47909: Assignee: Hyukjin Kwon > Parent DataFrame class for Spark Connect and Spark Classic > -- > > Key: SPARK-47909 > URL: https://issues.apache.org/jira/browse/SPARK-47909 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47600) MLLib: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47600: --- Labels: pull-request-available (was: ) > MLLib: Migrate logInfo with variables to structured logging framework > - > > Key: SPARK-47600 > URL: https://issues.apache.org/jira/browse/SPARK-47600 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"
[ https://issues.apache.org/jira/browse/SPARK-47928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Pan updated SPARK-47928: -- Affects Version/s: 3.2.0 (was: 4.0.0) > Speed up test "Add jar support Ivy URI in SQL" > -- > > Key: SPARK-47928 > URL: https://issues.apache.org/jira/browse/SPARK-47928 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47351) StringToMap (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47351: - Summary: StringToMap (all collations) (was: StringToMap) > StringToMap (all collations) > > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47354) ParseJson & VariantExplode (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47354: - Summary: ParseJson & VariantExplode (all collations) (was: ParseJson (all collations)) > ParseJson & VariantExplode (all collations) > --- > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47421) Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47421: - Summary: Mask (all collations) (was: Mask) > Mask (all collations) > - > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47353) Mode (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47353: - Summary: Mode (all collations) (was: Mode) > Mode (all collations) > - > > Key: SPARK-47353 > URL: https://issues.apache.org/jira/browse/SPARK-47353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47354) ParseJson (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47354: - Summary: ParseJson (all collations) (was: TBD) > ParseJson (all collations) > -- > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"
[ https://issues.apache.org/jira/browse/SPARK-47928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47928: --- Labels: pull-request-available (was: ) > Speed up test "Add jar support Ivy URI in SQL" > -- > > Key: SPARK-47928 > URL: https://issues.apache.org/jira/browse/SPARK-47928 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"
Cheng Pan created SPARK-47928: - Summary: Speed up test "Add jar support Ivy URI in SQL" Key: SPARK-47928 URL: https://issues.apache.org/jira/browse/SPARK-47928 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Cheng Pan
[jira] [Updated] (SPARK-41469) Task rerun on decommissioned executor can be avoided if shuffle data has migrated
[ https://issues.apache.org/jira/browse/SPARK-41469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41469: --- Labels: pull-request-available (was: ) > Task rerun on decommissioned executor can be avoided if shuffle data has > migrated > - > > Key: SPARK-41469 > URL: https://issues.apache.org/jira/browse/SPARK-41469 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.3, 3.2.2, 3.3.1 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Currently, we will always rerun a finished shuffle map task if it once ran on > the lost executor. However, in the case where the executor loss is caused by > decommission, the shuffle data might have been migrated, so the task doesn't need to > rerun.
[jira] [Updated] (SPARK-47692) Fix default StringType meaning in implicit casting
[ https://issues.apache.org/jira/browse/SPARK-47692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47692: -- Summary: Fix default StringType meaning in implicit casting (was: Addition of priority flag to StringType) > Fix default StringType meaning in implicit casting > -- > > Key: SPARK-47692 > URL: https://issues.apache.org/jira/browse/SPARK-47692 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47421) Mask
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47421: - Summary: Mask (was: TBD) > Mask > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47351) StringToMap
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47351: - Summary: StringToMap (was: TBD) > StringToMap > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels
[ https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47730: --- Labels: pull-request-available (was: ) > Support APP_ID and EXECUTOR_ID placeholder in labels > > > Key: SPARK-47730 > URL: https://issues.apache.org/jira/browse/SPARK-47730 > Project: Spark > Issue Type: Improvement > Components: k8s >Affects Versions: 3.5.1 >Reporter: Xi Chen >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-47927) Nullability after join not respected in UDF
Emil Ejbyfeldt created SPARK-47927: -- Summary: Nullability after join not respected in UDF Key: SPARK-47927 URL: https://issues.apache.org/jira/browse/SPARK-47927 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.3, 3.5.1, 4.0.0 Reporter: Emil Ejbyfeldt {code:java}
val ds1 = Seq(1).toDS()
val ds2 = Seq[Int]().toDS()
val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(f(struct(ds1("value"), ds2("value")))).show()
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(struct(ds1("value"), ds2("value"))).show()
{code}
outputs
{code:java}
+---------------------------------------+
|UDF(struct(value, value, value, value))|
+---------------------------------------+
|                                 {1, 0}|
+---------------------------------------+

+--------------------+
|struct(value, value)|
+--------------------+
|           {1, NULL}|
+--------------------+
{code}
So when the result is passed to a UDF, the nullability after the join is not respected, and we incorrectly end up with a 0 value instead of a null/None value.