[GitHub] [spark] pralabhkumar commented on pull request #37009: [SPARK-38292][PYTHON]na_filter added to csv

2022-06-29 Thread GitBox
pralabhkumar commented on PR #37009: URL: https://github.com/apache/spark/pull/37009#issuecomment-1170794852 @HyukjinKwon Please review the PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] beliefer commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
beliefer commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910627460 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -482,6 +482,17 @@ object JdbcUtils extends Logging with SQLConfHelper

[GitHub] [spark] cloud-fan commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910624766 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -594,16 +605,14 @@ object JdbcUtils extends Logging with

[GitHub] [spark] cloud-fan commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910624692 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -482,6 +482,17 @@ object JdbcUtils extends Logging with

[GitHub] [spark] beliefer commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
beliefer commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910612969 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -599,10 +610,13 @@ object JdbcUtils extends Logging with

[GitHub] [spark] huaxingao commented on a diff in pull request #37025: [SPARK-39633][SQL] Fix timetravel via dataframe using timestampAsOf

2022-06-29 Thread GitBox
huaxingao commented on code in PR #37025: URL: https://github.com/apache/spark/pull/37025#discussion_r910600263 ## sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsCatalogOptionsSuite.scala: ## @@ -322,6 +323,12 @@ class SupportsCatalogOptionsSuite extends

[GitHub] [spark] LuciferYang opened a new pull request, #37030: [SPARK-39231][SQL][FOLLOWUP] Move `ColumnVectorUtils.allocateColumns` to `VectorizedParquetRecordReader`

2022-06-29 Thread GitBox
LuciferYang opened a new pull request, #37030: URL: https://github.com/apache/spark/pull/37030 ### What changes were proposed in this pull request? This PR moves `ColumnVectorUtils.allocateColumns` to `VectorizedParquetRecordReader`. ### Why are the changes needed? Code

[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

2022-06-29 Thread GitBox
schuermannator commented on code in PR #36968: URL: https://github.com/apache/spark/pull/36968#discussion_r910594525 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -296,7 +300,36 @@ class CatalogImpl(sparkSession: SparkSession) extends

[GitHub] [spark] huaxingao commented on a diff in pull request #37025: [SPARK-39633][SQL] Fix timetravel via dataframe using timestampAsOf

2022-06-29 Thread GitBox
huaxingao commented on code in PR #37025: URL: https://github.com/apache/spark/pull/37025#discussion_r910592623 ## sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsCatalogOptionsSuite.scala: ## @@ -322,6 +323,12 @@ class SupportsCatalogOptionsSuite extends

[GitHub] [spark] ulysses-you commented on a diff in pull request #37021: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
ulysses-you commented on code in PR #37021: URL: https://github.com/apache/spark/pull/37021#discussion_r910587272 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala: ## @@ -17,16 +17,22 @@ package org.apache.spark.sql.catalyst +import

[GitHub] [spark] ulysses-you commented on a diff in pull request #37021: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
ulysses-you commented on code in PR #37021: URL: https://github.com/apache/spark/pull/37021#discussion_r910587033 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -512,7 +513,9 @@ class SessionCatalog( val table =

[GitHub] [spark] LuciferYang opened a new pull request, #37029: [SPARK-39638][SQL] Change to use `ConstantColumnVector` to store partition columns in `OrcColumnarBatchReader`

2022-06-29 Thread GitBox
LuciferYang opened a new pull request, #37029: URL: https://github.com/apache/spark/pull/37029 ### What changes were proposed in this pull request? Similar to SPARK-39231, this PR switches to using `ConstantColumnVector` to store partition columns in `OrcColumnarBatchReader`. ###

[GitHub] [spark] LuciferYang commented on a diff in pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParq

2022-06-29 Thread GitBox
LuciferYang commented on code in PR #36616: URL: https://github.com/apache/spark/pull/36616#discussion_r910580016 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -235,4 +303,37 @@ public static ColumnarBatch toBatch(

[GitHub] [spark] singhpk234 commented on a diff in pull request #37025: [SPARK-39633][SQL] Fix timetravel via dataframe using timestampAsOf

2022-06-29 Thread GitBox
singhpk234 commented on code in PR #37025: URL: https://github.com/apache/spark/pull/37025#discussion_r910579873 ## sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsCatalogOptionsSuite.scala: ## @@ -322,6 +323,12 @@ class SupportsCatalogOptionsSuite extends

[GitHub] [spark] amaliujia commented on a diff in pull request #36983: [SPARK-39583][SQL] Make RefreshTable be compatible with 3 layer namespace

2022-06-29 Thread GitBox
amaliujia commented on code in PR #36983: URL: https://github.com/apache/spark/pull/36983#discussion_r910572886 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableAddPartitionSuite.scala: ## @@ -36,11 +36,11 @@ class AlterTableAddPartitionSuite

[GitHub] [spark] wangyum opened a new pull request, #37028: [SPARK-39637][SQL][TESTS] Involve excludedTpcdsQueries in plan related tests

2022-06-29 Thread GitBox
wangyum opened a new pull request, #37028: URL: https://github.com/apache/spark/pull/37028 ### What changes were proposed in this pull request? We excluded some TPC-DS queries in [SPARK-35327](https://issues.apache.org/jira/browse/SPARK-35327) ### Why are the changes needed?

[GitHub] [spark] beliefer commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
beliefer commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910567220 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -599,10 +610,13 @@ object JdbcUtils extends Logging with

[GitHub] [spark] gengliangwang commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
gengliangwang commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910567076 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -599,10 +610,13 @@ object JdbcUtils extends Logging with

[GitHub] [spark] beliefer commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
beliefer commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910565880 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -599,10 +610,13 @@ object JdbcUtils extends Logging with

[GitHub] [spark] zhengruifeng commented on pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

2022-06-29 Thread GitBox
zhengruifeng commented on PR #36968: URL: https://github.com/apache/spark/pull/36968#issuecomment-1170687902 @schuermannator Would you mind adding two subtasks under the [umbrella](https://issues.apache.org/jira/browse/SPARK-39235) and linking the two PRs (this one and

[GitHub] [spark] LuciferYang commented on pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordR

2022-06-29 Thread GitBox
LuciferYang commented on PR #36616: URL: https://github.com/apache/spark/pull/36616#issuecomment-1170680344 Thanks @sunchao @sadikovi @dongjoon-hyun ~ I will submit the ORC-related PR later ~

[GitHub] [spark] LuciferYang commented on pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
LuciferYang commented on PR #37024: URL: https://github.com/apache/spark/pull/37024#issuecomment-1170679119 Thanks @srowen @mridulm ~

[GitHub] [spark] LuciferYang commented on pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
LuciferYang commented on PR #37024: URL: https://github.com/apache/spark/pull/37024#issuecomment-1170679006 > We have a similar pattern in SortShuffleManager - but there we wrap result in Option before foreach (given no asScala), and so is fine … did not find any other cases where this is

[GitHub] [spark] bzhaoopenstack commented on a diff in pull request #37023: [TYPO-FIX] Make the 'sep' description better in read_csv of pyspark p…

2022-06-29 Thread GitBox
bzhaoopenstack commented on code in PR #37023: URL: https://github.com/apache/spark/pull/37023#discussion_r910545473 ## python/pyspark/pandas/namespace.py: ## @@ -238,7 +238,8 @@ def read_csv( path : str The path string storing the CSV file to be read. sep :

[GitHub] [spark] bzhaoopenstack commented on a diff in pull request #37023: [TYPO-FIX] Make the 'sep' description better in read_csv of pyspark p…

2022-06-29 Thread GitBox
bzhaoopenstack commented on code in PR #37023: URL: https://github.com/apache/spark/pull/37023#discussion_r910545293 ## python/pyspark/pandas/namespace.py: ## @@ -238,7 +238,8 @@ def read_csv( path : str The path string storing the CSV file to be read. sep :

[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

2022-06-29 Thread GitBox
tianshuang commented on PR #36741: URL: https://github.com/apache/spark/pull/36741#issuecomment-1170651968 @marmbrus, can you give some advice? The `IsolatedClassLoader` introduced by [SPARK-6907](https://github.com/apache/spark/commit/daa70bf135f23381f5f410aa95a1c0e5a2888568) seven

[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #36968: URL: https://github.com/apache/spark/pull/36968#discussion_r910528924 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -296,7 +300,36 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #36968: URL: https://github.com/apache/spark/pull/36968#discussion_r910528646 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -296,7 +300,36 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] ulysses-you commented on a diff in pull request #37021: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
ulysses-you commented on code in PR #37021: URL: https://github.com/apache/spark/pull/37021#discussion_r910520404 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala: ## @@ -132,15 +132,21 @@ private[sql] object CatalogV2Implicits {

[GitHub] [spark] JoshRosen commented on a diff in pull request #37027: [SPARK-39636][CORE][UI] Fix multiple bugs in JsonProtocol, impacting off heap StorageLevels and Task/Executor ResourceRequests

2022-06-29 Thread GitBox
JoshRosen commented on code in PR #37027: URL: https://github.com/apache/spark/pull/37027#discussion_r910520108 ## core/src/main/scala/org/apache/spark/util/JsonProtocol.scala: ## @@ -512,6 +512,7 @@ private[spark] object JsonProtocol { def storageLevelToJson(storageLevel:

[GitHub] [spark] cloud-fan commented on pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
cloud-fan commented on PR #37013: URL: https://github.com/apache/spark/pull/37013#issuecomment-1170637111 > I am also curious why it fails on 1500-01-20T00:00:00.123456. Is it because of the calendar that Spark uses? I believe so, `toJavaTimestamp` takes care of the legacy calendar,
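The calendar question above can be illustrated with a small sketch. It assumes that "legacy calendar" refers to the hybrid Julian/Gregorian calendar used by `java.sql.Timestamp`, as opposed to the proleptic Gregorian calendar; the two helper functions are hypothetical names written for this illustration and are not Spark code. They use the standard Julian Day Number formulas to show that the label 1500-01-20 denotes different days in the two calendars.

```python
def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a proleptic Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# The same label, 1500-01-20, read in each calendar.
shift_days = julian_to_jdn(1500, 1, 20) - gregorian_to_jdn(1500, 1, 20)
print(f"Julian 1500-01-20 is {shift_days} days after Gregorian 1500-01-20")
```

A conversion that ignores this shift would move a timestamp from before the 1582 cutover by several days, which is one plausible reason a value such as 1500-01-20T00:00:00.123456 behaves differently from modern dates.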

[GitHub] [spark] cloud-fan commented on a diff in pull request #37025: [SPARK-39633][SQL] Fix timetravel via dataframe using timestampAsOf

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #37025: URL: https://github.com/apache/spark/pull/37025#discussion_r910517268 ## sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsCatalogOptionsSuite.scala: ## @@ -322,6 +323,12 @@ class SupportsCatalogOptionsSuite extends

[GitHub] [spark] JoshRosen commented on a diff in pull request #37027: [SPARK-39636][CORE][UI] Fix multiple bugs in JsonProtocol, impacting off heap StorageLevels and Task/Executor ResourceRequests

2022-06-29 Thread GitBox
JoshRosen commented on code in PR #37027: URL: https://github.com/apache/spark/pull/37027#discussion_r910516312 ## core/src/main/scala/org/apache/spark/util/JsonProtocol.scala: ## @@ -750,15 +751,15 @@ private[spark] object JsonProtocol { def

[GitHub] [spark] JoshRosen commented on a diff in pull request #37027: [SPARK-39636][CORE][UI] Fix multiple bugs in JsonProtocol, impacting off heap StorageLevels and Task/Executor ResourceRequests

2022-06-29 Thread GitBox
JoshRosen commented on code in PR #37027: URL: https://github.com/apache/spark/pull/37027#discussion_r910516086 ## core/src/main/scala/org/apache/spark/util/JsonProtocol.scala: ## @@ -750,15 +751,15 @@ private[spark] object JsonProtocol { def

[GitHub] [spark] JoshRosen commented on pull request #36885: [WIP][SPARK-39489][CORE] Improve event logging JsonProtocol performance by using Jackson instead of Json4s

2022-06-29 Thread GitBox
JoshRosen commented on PR #36885: URL: https://github.com/apache/spark/pull/36885#issuecomment-1170630700 I submitted https://github.com/apache/spark/pull/37027 to fix the [pre-existing JsonProtocol bugs that I found during my

[GitHub] [spark] JoshRosen opened a new pull request, #37027: [SPARK-39636][CORE][UI] Fix multiple bugs in JsonProtocol, impacting off heap StorageLevels and Task/Executor ResourceRequests

2022-06-29 Thread GitBox
JoshRosen opened a new pull request, #37027: URL: https://github.com/apache/spark/pull/37027 ### What changes were proposed in this pull request? This PR fixes three longstanding bugs in Spark's `JsonProtocol`: - `TaskResourceRequest` loses precision for `amount` < 0.5. The

[GitHub] [spark] github-actions[bot] closed pull request #35902: [SPARK-2489][SQL] Support Parquet's optional fixed_len_byte_array

2022-06-29 Thread GitBox
github-actions[bot] closed pull request #35902: [SPARK-2489][SQL] Support Parquet's optional fixed_len_byte_array URL: https://github.com/apache/spark/pull/35902

[GitHub] [spark] gengliangwang commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
gengliangwang commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910491806 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: ## @@ -1943,13 +1943,17 @@ class JDBCSuite extends QueryTest .option("dbtable",

[GitHub] [spark] gengliangwang commented on pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
gengliangwang commented on PR #37013: URL: https://github.com/apache/spark/pull/37013#issuecomment-1170609100 > This PR just modifies a test case, and it fails! The test case failure output is shown below. @beliefer could you provide the test case itself in the PR description?

[GitHub] [spark] gengliangwang commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
gengliangwang commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910500442 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -599,10 +610,13 @@ object JdbcUtils extends Logging with

[GitHub] [spark] srowen commented on pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
srowen commented on PR #37024: URL: https://github.com/apache/spark/pull/37024#issuecomment-1170580673 Merged to master/3.3/3.2

[GitHub] [spark] srowen closed pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
srowen closed pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13 URL: https://github.com/apache/spark/pull/37024

[GitHub] [spark] gengliangwang commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
gengliangwang commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r910474903 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -599,10 +610,13 @@ object JdbcUtils extends Logging with

[GitHub] [spark] sadikovi commented on a diff in pull request #36632: [SPARK-35378][SQL][FOLLOW-UP] Fix incorrect return type in CommandResultExec.executeCollect()

2022-06-29 Thread GitBox
sadikovi commented on code in PR #36632: URL: https://github.com/apache/spark/pull/36632#discussion_r910426378 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -262,6 +263,14 @@ class QueryExecutionSuite extends SharedSparkSession {

[GitHub] [spark] mridulm commented on pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
mridulm commented on PR #37024: URL: https://github.com/apache/spark/pull/37024#issuecomment-1170452706 We have a similar pattern in SortShuffleManager - there we wrap result in Option before foreach …

[GitHub] [spark] mridulm commented on pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-06-29 Thread GitBox
mridulm commented on PR #36162: URL: https://github.com/apache/spark/pull/36162#issuecomment-1170429727 Thanks for working on this @weixiuli ! This should really help with speculative execution. Thanks for merging it @Ngone51 :-) Traveling and don’t have access to my desktop to help

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #37023: [TYPO-FIX] Make the 'sep' description better in read_csv of pyspark p…

2022-06-29 Thread GitBox
bjornjorgensen commented on code in PR #37023: URL: https://github.com/apache/spark/pull/37023#discussion_r910245306 ## python/pyspark/pandas/namespace.py: ## @@ -238,7 +238,8 @@ def read_csv( path : str The path string storing the CSV file to be read. sep :

[GitHub] [spark] cloud-fan commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 supports push down DS V2 UDF

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r910236213 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -39,6 +46,75 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] cloud-fan commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 supports push down DS V2 UDF

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r910232604 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -917,6 +994,49 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] cloud-fan commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 supports push down DS V2 UDF

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r910232200 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -917,6 +994,49 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] tardunge commented on pull request #36717: [SPARK-33274][SS] Stop query in cp mode when total cores less than total kafka partition

2022-06-29 Thread GitBox
tardunge commented on PR #36717: URL: https://github.com/apache/spark/pull/36717#issuecomment-1170267023 sorry, didn't mean to approve.

[GitHub] [spark] holdenk commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2022-06-29 Thread GitBox
holdenk commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1170257079 Interesting, can you send me the executor log? I want to see if the signal handler is failing to register or something.

[GitHub] [spark] cloud-fan commented on a diff in pull request #37021: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #37021: URL: https://github.com/apache/spark/pull/37021#discussion_r910211300 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala: ## @@ -132,15 +132,21 @@ private[sql] object CatalogV2Implicits {

[GitHub] [spark] cloud-fan commented on a diff in pull request #37021: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #37021: URL: https://github.com/apache/spark/pull/37021#discussion_r910210221 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1450,6 +1453,7 @@ class SessionCatalog( * Constructs a

[GitHub] [spark] cloud-fan commented on a diff in pull request #37021: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #37021: URL: https://github.com/apache/spark/pull/37021#discussion_r910209536 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -512,7 +513,9 @@ class SessionCatalog( val table =

[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

2022-06-29 Thread GitBox
tianshuang commented on PR #36741: URL: https://github.com/apache/spark/pull/36741#issuecomment-1170246184 As described at the beginning of this PR, I tried other fixes and none of them worked, and finally I came up with the current fix, which does seem hacky since I also didn't come up

[GitHub] [spark] cloud-fan commented on a diff in pull request #37021: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #37021: URL: https://github.com/apache/spark/pull/37021#discussion_r910205203 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala: ## @@ -17,16 +17,22 @@ package org.apache.spark.sql.catalyst +import

[GitHub] [spark] cloud-fan commented on pull request #36983: [SPARK-39583][SQL] Make RefreshTable be compatible with 3 layer namespace

2022-06-29 Thread GitBox
cloud-fan commented on PR #36983: URL: https://github.com/apache/spark/pull/36983#issuecomment-1170238800 can you fix the failed tests? ``` [error] Failed tests: [error] org.apache.spark.sql.hive.execution.command.AlterTableRenamePartitionSuite [error]

[GitHub] [spark] cloud-fan commented on a diff in pull request #36632: [SPARK-35378][SQL][FOLLOW-UP] Fix incorrect return type in CommandResultExec.executeCollect()

2022-06-29 Thread GitBox
cloud-fan commented on code in PR #36632: URL: https://github.com/apache/spark/pull/36632#discussion_r910195841 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -262,6 +263,14 @@ class QueryExecutionSuite extends SharedSparkSession {

[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

2022-06-29 Thread GitBox
tianshuang commented on PR #36741: URL: https://github.com/apache/spark/pull/36741#issuecomment-1170231007 > This seems super hacky. Is there any way to do better cleanup? Yes, the current fix ensures that we always manipulate `threadLocalMS` instances in `HMSHandler`(loaded by

[GitHub] [spark] srowen commented on a diff in pull request #37023: [TYPO-FIX] Make the 'sep' description better in read_csv of pyspark p…

2022-06-29 Thread GitBox
srowen commented on code in PR #37023: URL: https://github.com/apache/spark/pull/37023#discussion_r910181717 ## python/pyspark/pandas/namespace.py: ## @@ -238,7 +238,8 @@ def read_csv( path : str The path string storing the CSV file to be read. sep : str,

[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

2022-06-29 Thread GitBox
tianshuang commented on PR #36741: URL: https://github.com/apache/spark/pull/36741#issuecomment-1170214998 > Can't we just use a SoftReferenceMap or something ? is the leak just holding onto entries in that Map? In fact, the leak

[GitHub] [spark] sunchao commented on pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordReade

2022-06-29 Thread GitBox
sunchao commented on PR #36616: URL: https://github.com/apache/spark/pull/36616#issuecomment-1170203151 Committed to master, thanks!

[GitHub] [spark] sunchao closed pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordReader`

2022-06-29 Thread GitBox
sunchao closed pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordReader` URL: https://github.com/apache/spark/pull/36616

[GitHub] [spark] bjornjorgensen commented on pull request #37023: [TYPO-FIX] Make the 'sep' description better in read_csv of pyspark p…

2022-06-29 Thread GitBox
bjornjorgensen commented on PR #37023: URL: https://github.com/apache/spark/pull/37023#issuecomment-1170199790 _You're sure a multi-character string works here?_ @srowen it does. DurationtestPulsetestMaxpulsetestCalories 60test110test130test409.1 60test117test145test479.0
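As a rough illustration of the multi-character separator shown in the sample above, the following sketch splits that data on the string `test` using plain Python rather than `read_csv`. The row boundaries are an assumption, since the archive flattened the sample onto one line, and `split_rows` is a hypothetical helper, not part of pyspark.

```python
SEP = "test"  # the multi-character separator used in the sample

# Sample rows from the comment; the line breaks are assumed.
RAW = (
    "DurationtestPulsetestMaxpulsetestCalories\n"
    "60test110test130test409.1\n"
    "60test117test145test479.0"
)


def split_rows(text: str, sep: str) -> list[dict[str, str]]:
    """Split each line on `sep` and pair the fields with the header names."""
    lines = text.splitlines()
    header = lines[0].split(sep)
    return [dict(zip(header, line.split(sep))) for line in lines[1:]]


rows = split_rows(RAW, SEP)
for row in rows:
    print(row)
```

This only shows that the sample is unambiguous when `test` is treated as a whole-string delimiter; it says nothing about how `read_csv` itself parses the separator.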

[GitHub] [spark] srowen commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

2022-06-29 Thread GitBox
srowen commented on PR #36741: URL: https://github.com/apache/spark/pull/36741#issuecomment-1170187617 Can't we just use a SoftReferenceMap or something? Is the leak just holding onto entries in that Map?

[GitHub] [spark] matar993 opened a new pull request, #37026: [SPARK-39632][SQL][Tests] Add state utils to StreamTest to check states during streaming queries

2022-06-29 Thread GitBox
matar993 opened a new pull request, #37026: URL: https://github.com/apache/spark/pull/37026 ### What changes were proposed in this pull request? This PR aims to allow the user to check the state during a streaming query execution. It is possible to check how the state is updated on

[GitHub] [spark] singhpk234 commented on pull request #37025: [SPARK-39633][SQL] Fix timetravel via dataframe using timestampAsOf

2022-06-29 Thread GitBox
singhpk234 commented on PR #37025: URL: https://github.com/apache/spark/pull/37025#issuecomment-1170101448 cc @cloud-fan @huaxingao

[GitHub] [spark] singhpk234 opened a new pull request, #37025: [SPARK-39633][SQL] Fix timetravel via dataframe using timestampAsOf

2022-06-29 Thread GitBox
singhpk234 opened a new pull request, #37025: URL: https://github.com/apache/spark/pull/37025 ### What changes were proposed in this pull request? When specifying the expressions to TimeTravelSpec, we should cast only the integer-format timestamp, i.e. `1656505650`, to long

[GitHub] [spark] tgravescs commented on pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster

2022-06-29 Thread GitBox
tgravescs commented on PR #36716: URL: https://github.com/apache/spark/pull/36716#issuecomment-1170021683 No lgtm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] MaxGekk commented on a diff in pull request #36632: [SPARK-35378][SQL][FOLLOW-UP] Fix incorrect return type in CommandResultExec.executeCollect()

2022-06-29 Thread GitBox
MaxGekk commented on code in PR #36632: URL: https://github.com/apache/spark/pull/36632#discussion_r909637351 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -262,6 +263,14 @@ class QueryExecutionSuite extends SharedSparkSession {

[GitHub] [spark] ulysses-you commented on a diff in pull request #36936: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
ulysses-you commented on code in PR #36936: URL: https://github.com/apache/spark/pull/36936#discussion_r909617245 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala: ## @@ -33,15 +36,34 @@ sealed trait IdentifierWithDatabase { */ private def

[GitHub] [spark] ulysses-you commented on a diff in pull request #36936: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
ulysses-you commented on code in PR #36936: URL: https://github.com/apache/spark/pull/36936#discussion_r909614603 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala: ## @@ -895,7 +905,7 @@ case class ShowTablesCommand( val normalizedSpec =

[GitHub] [spark] LuciferYang commented on pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordR

2022-06-29 Thread GitBox
LuciferYang commented on PR #36616: URL: https://github.com/apache/spark/pull/36616#issuecomment-1169962497 > LGTM, pending CI GA passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] ulysses-you commented on pull request #36936: [SPARK-39503][SQL] Add session catalog name for v1 database table and function

2022-06-29 Thread GitBox
ulysses-you commented on PR #36936: URL: https://github.com/apache/spark/pull/36936#issuecomment-1169961468 I created a new PR https://github.com/apache/spark/pull/37021 for this issue, but with a new approach: - add catalog field in identifier, so identifier just prints catalog if defined -

[GitHub] [spark] srowen commented on a diff in pull request #36937: [SPARK-39539][SQL] millisToMicros overflow

2022-06-29 Thread GitBox
srowen commented on code in PR #36937: URL: https://github.com/apache/spark/pull/36937#discussion_r909605318 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala: ## @@ -237,7 +237,13 @@ object DateTimeUtils { * Converts milliseconds since

[GitHub] [spark] srowen commented on a diff in pull request #37023: [TYPO-FIX] Make the 'sep' description better in read_csv of pyspark p…

2022-06-29 Thread GitBox
srowen commented on code in PR #37023: URL: https://github.com/apache/spark/pull/37023#discussion_r909603755 ## python/pyspark/pandas/namespace.py: ## @@ -238,7 +238,8 @@ def read_csv( path : str The path string storing the CSV file to be read. sep : str,

[GitHub] [spark] srowen commented on pull request #37016: Driver cores mult be a positive number fix

2022-06-29 Thread GitBox
srowen commented on PR #37016: URL: https://github.com/apache/spark/pull/37016#issuecomment-1169939595 Also, you opened this vs 3.2, but needs to be vs master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] srowen commented on pull request #37016: Driver cores mult be a positive number fix

2022-06-29 Thread GitBox
srowen commented on PR #37016: URL: https://github.com/apache/spark/pull/37016#issuecomment-1169938900 This needs to be connected by putting [SPARK-39617] in the title along with component -- see the contributing guide -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] beliefer commented on pull request #37001: [WIP][SPARK-39148][SQL] DS V2 aggregate push down can work with OFFSET or LIMIT

2022-06-29 Thread GitBox
beliefer commented on PR #37001: URL: https://github.com/apache/spark/pull/37001#issuecomment-1169899554 ping @huaxingao cc @cloud-fan This PR is a temporary implementation. Do you have a better idea? -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] LuciferYang commented on pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
LuciferYang commented on PR #37024: URL: https://github.com/apache/spark/pull/37024#issuecomment-1169785962 Scala 2.13.9 should resolve this issue, but the release time of 2.13.9 is uncertain: https://github.com/scala/scala/milestones

[GitHub] [spark] LuciferYang commented on pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
LuciferYang commented on PR #37024: URL: https://github.com/apache/spark/pull/37024#issuecomment-1169779618 Is it necessary to do this protection work. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] LuciferYang commented on pull request #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
LuciferYang commented on PR #37024: URL: https://github.com/apache/spark/pull/37024#issuecomment-1169772725 cc @HyukjinKwon @srowen @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] LuciferYang opened a new pull request, #37024: [SPARK-39553][CORE] Multi-thread unregister shuffle shouldn't throw NPE when using Scala 2.13

2022-06-29 Thread GitBox
LuciferYang opened a new pull request, #37024: URL: https://github.com/apache/spark/pull/37024 ### What changes were proposed in this pull request? This PR adds a `shuffleStatus != null` condition to the `o.a.s.MapOutputTrackerMaster#unregisterShuffle` method to avoid throwing NPE when using

[GitHub] [spark] zhengruifeng commented on pull request #37019: [SPARK-39446][MLLIB][FOLLOWUP] Modify ranking metrics for java and python

2022-06-29 Thread GitBox
zhengruifeng commented on PR #37019: URL: https://github.com/apache/spark/pull/37019#issuecomment-1169761009 merged to master, thank you @uch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] zhengruifeng closed pull request #37019: [SPARK-39446][MLLIB][FOLLOWUP] Modify ranking metrics for java and python

2022-06-29 Thread GitBox
zhengruifeng closed pull request #37019: [SPARK-39446][MLLIB][FOLLOWUP] Modify ranking metrics for java and python URL: https://github.com/apache/spark/pull/37019 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] yeachan153 commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2022-06-29 Thread GitBox
yeachan153 commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1169757864 > @yeachan153 interesting what platform are you running on? We're running the 3.2.0 dist on Kubernetes based on the openjdk:11 image -- This is an automated message from the

[GitHub] [spark] yeachan153 commented on a diff in pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2022-06-29 Thread GitBox
yeachan153 commented on code in PR #36434: URL: https://github.com/apache/spark/pull/36434#discussion_r909413403 ## resource-managers/kubernetes/docker/src/main/dockerfiles/spark/decom.sh: ## @@ -18,17 +18,24 @@ # -set -ex +set +e +set -x echo "Asked to decommission" #

[GitHub] [spark] weixiuli commented on pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-06-29 Thread GitBox
weixiuli commented on PR #36162: URL: https://github.com/apache/spark/pull/36162#issuecomment-1169713010 Thanks for your review @Ngone51 @mridulm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] bzhaoopenstack opened a new pull request, #37023: [TYPO-FIX] Make the 'sep' description better in read_csv of pyspark p…

2022-06-29 Thread GitBox
bzhaoopenstack opened a new pull request, #37023: URL: https://github.com/apache/spark/pull/37023 The 'sep' parameter supports a separator string whose length is larger than 1. So it supports not only a single character, but also a string containing multiple characters. -- This is

[GitHub] [spark] Ngone51 commented on pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster

2022-06-29 Thread GitBox
Ngone51 commented on PR #36716: URL: https://github.com/apache/spark/pull/36716#issuecomment-1169628455 @tgravescs do you have more concerns? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] mihaivinaga commented on pull request #37016: Driver cores mult be a positive number fix

2022-06-29 Thread GitBox
mihaivinaga commented on PR #37016: URL: https://github.com/apache/spark/pull/37016#issuecomment-1169622311 @srowen the issue is created and posted in the initial post: https://issues.apache.org/jira/browse/SPARK-39617 -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] LuciferYang commented on pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordR

2022-06-29 Thread GitBox
LuciferYang commented on PR #36616: URL: https://github.com/apache/spark/pull/36616#issuecomment-1169587117 > LGTM, pending CI Thanks !!! @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] LuciferYang commented on a diff in pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParq

2022-06-29 Thread GitBox
LuciferYang commented on code in PR #36616: URL: https://github.com/apache/spark/pull/36616#discussion_r909245615 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -235,4 +291,37 @@ public static ColumnarBatch toBatch(

[GitHub] [spark] sunchao commented on a diff in pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetR

2022-06-29 Thread GitBox
sunchao commented on code in PR #36616: URL: https://github.com/apache/spark/pull/36616#discussion_r909243646 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -235,4 +291,37 @@ public static ColumnarBatch toBatch(

[GitHub] [spark] sadikovi commented on pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
sadikovi commented on PR #37013: URL: https://github.com/apache/spark/pull/37013#issuecomment-1169579384 Please update the PR description with the clear explanation of the bug and how the solution fixes the problem. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] sadikovi commented on a diff in pull request #37013: [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect

2022-06-29 Thread GitBox
sadikovi commented on code in PR #37013: URL: https://github.com/apache/spark/pull/37013#discussion_r909238043 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -599,10 +610,13 @@ object JdbcUtils extends Logging with
