[jira] [Commented] (SPARK-39386) Flaky Test: BloomFilterAggregateQuerySuite

2022-06-05 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550308#comment-17550308
 ] 

Dongjoon Hyun commented on SPARK-39386:
---

For now, I have filed this as a normal flaky test issue rather than a `Blocker`.

> Flaky Test: BloomFilterAggregateQuerySuite
> --
>
> Key: SPARK-39386
> URL: https://issues.apache.org/jira/browse/SPARK-39386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> During Apache Spark 3.3.0 RC5 tests, I found that this test case is very 
> flaky in my environment.
> {code:java}
>  [info] - Test bloom_filter_agg and might_contain *** FAILED *** (20 
> seconds, 370 milliseconds)
>  [info]   Results do not match for query:
>  [info]   Timezone: 
> sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-2880,dstSavings=360,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-2880,dstSavings=360,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=720,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=720,endTimeMode=0]]
>  [info]   Timezone Env: 
> ...
>   == Results ==
>  [info]   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
>  [info]   !struct<>   
> struct
>  [info]   ![true,false]   [true,true] (QueryTest.scala:244)
>  [info]   org.scalatest.exceptions.TestFailedException:
>  [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>  [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>  [info]   at 
> org.apache.spark.sql.QueryTest$.newAssertionFailedException(QueryTest.scala:234)
>  [info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
>  [info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
>  [info]   at org.apache.spark.sql.QueryTest$.fail(QueryTest.scala:234)
>  [info]   at 
> org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:244)
>  [info]   at 
> org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:151)
>  [info]   at 
> org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:155)
>  [info]   at 
> org.apache.spark.sql.BloomFilterAggregateQuerySuite.$anonfun$new$4(BloomFilterAggregateQuerySuite.scala:98)
> {code}
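
For readers without the suite at hand, here is a minimal sketch of the query shape the failing assertion exercises. This is an assumed shape, not the suite's exact SQL: both bloom_filter_agg and might_contain are internal expressions that the suite registers itself, and the table and probe values below are illustrative.

{code:scala}
// Build a Bloom filter over a long column with bloom_filter_agg, then probe
// it for a value that was inserted (expected true) and one that was not
// (expected false). See BloomFilterAggregateQuerySuite.scala for the real
// queries; this is only the shape under test.
val result = spark.sql(
  """SELECT might_contain(filter, 1L)  AS hit,
    |       might_contain(filter, -1L) AS miss
    |FROM (SELECT bloom_filter_agg(id) AS filter FROM range(1000))
    |""".stripMargin)
{code}

The diff in the log (expected [true,false], actual [true,true]) means the negative probe returned true, which is consistent with an unlucky Bloom-filter false positive rather than a corrupted filter.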



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39387) Upgrade hive-storage-api to 2.7.3

2022-06-05 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-39387:
---
Description: 
HIVE-25190: Fix many small allocations in BytesColumnVector

 
{code:java}
Caused by: java.lang.RuntimeException: Overflow of newLength. 
smallBuffer.length=1073741824, nextElemLength=408101
at 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(BytesColumnVector.java:311)
at 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:182)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.setColumn(WriterImpl.java:179)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.setColumn(WriterImpl.java:268)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.setColumn(WriterImpl.java:223)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:294)
at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:105)
at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:157)
at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:176)
at 
org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:86)
at 
org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:93)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:312)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1534)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:319)
 {code}

  was:[HIVE-25190|https://issues.apache.org/jira/browse/HIVE-25190]: Fix many 
small allocations in BytesColumnVector


> Upgrade hive-storage-api to 2.7.3
> -
>
> Key: SPARK-39387
> URL: https://issues.apache.org/jira/browse/SPARK-39387
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> HIVE-25190: Fix many small allocations in BytesColumnVector
>  
> {code:java}
> Caused by: java.lang.RuntimeException: Overflow of newLength. 
> smallBuffer.length=1073741824, nextElemLength=408101
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(BytesColumnVector.java:311)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:182)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.setColumn(WriterImpl.java:179)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.setColumn(WriterImpl.java:268)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.setColumn(WriterImpl.java:223)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:294)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:157)
>   at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:176)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:86)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:93)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:312)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1534)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:319)
>  {code}
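
For context on the stack trace above, here is a minimal arithmetic sketch, not Hive's actual code, of the 32-bit overflow that HIVE-25190 addresses; the two lengths are taken from the exception message.

{code:scala}
// Doubling a byte buffer that has already grown to 1 GiB overflows a Java
// Int, so the computed newLength can never fit the next element.
val smallBufferLength = 1073741824            // from the trace: 1 GiB
val nextElemLength    = 408101                // from the trace
val newLength         = smallBufferLength * 2 // Int overflow: -2147483648
assert(newLength < 0)                         // the guard that throws above
{code}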



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39387) Upgrade hive-storage-api to 2.7.3

2022-06-05 Thread dzcxzl (Jira)
dzcxzl created SPARK-39387:
--

 Summary: Upgrade hive-storage-api to 2.7.3
 Key: SPARK-39387
 URL: https://issues.apache.org/jira/browse/SPARK-39387
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.1
Reporter: dzcxzl


[HIVE-25190|https://issues.apache.org/jira/browse/HIVE-25190]: Fix many small 
allocations in BytesColumnVector



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39386) Flaky Test: BloomFilterAggregateQuerySuite

2022-06-05 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39386:
--
Description: 
During Apache Spark 3.3.0 RC5 tests, I found that this test case is very flaky 
in my environment.
{code:java}
 [info] - Test bloom_filter_agg and might_contain *** FAILED *** (20 
seconds, 370 milliseconds)
 [info]   Results do not match for query:
 [info]   Timezone: 
sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-2880,dstSavings=360,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-2880,dstSavings=360,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=720,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=720,endTimeMode=0]]
 [info]   Timezone Env: 
...

  == Results ==
 [info]   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
 [info]   !struct<>   
struct
 [info]   ![true,false]   [true,true] (QueryTest.scala:244)
 [info]   org.scalatest.exceptions.TestFailedException:
 [info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
 [info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
 [info]   at 
org.apache.spark.sql.QueryTest$.newAssertionFailedException(QueryTest.scala:234)
 [info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
 [info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
 [info]   at org.apache.spark.sql.QueryTest$.fail(QueryTest.scala:234)
 [info]   at 
org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:244)
 [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:151)
 [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:155)
 [info]   at 
org.apache.spark.sql.BloomFilterAggregateQuerySuite.$anonfun$new$4(BloomFilterAggregateQuerySuite.scala:98)
{code}

  was:
{code}
 [info] - Test bloom_filter_agg and might_contain *** FAILED *** (20 
seconds, 370 milliseconds)
 [info]   Results do not match for query:
 [info]   Timezone: 
sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-2880,dstSavings=360,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-2880,dstSavings=360,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=720,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=720,endTimeMode=0]]
 [info]   Timezone Env: 
...

  == Results ==
 [info]   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
 [info]   !struct<>   
struct
 [info]   ![true,false]   [true,true] (QueryTest.scala:244)
 [info]   org.scalatest.exceptions.TestFailedException:
 [info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
 [info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
 [info]   at 
org.apache.spark.sql.QueryTest$.newAssertionFailedException(QueryTest.scala:234)
 [info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
 [info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
 [info]   at org.apache.spark.sql.QueryTest$.fail(QueryTest.scala:234)
 [info]   at 
org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:244)
 [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:151)
 [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:155)
 [info]   at 
org.apache.spark.sql.BloomFilterAggregateQuerySuite.$anonfun$new$4(BloomFilterAggregateQuerySuite.scala:98)
{code}


> Flaky Test: BloomFilterAggregateQuerySuite
> --
>
> Key: SPARK-39386
> URL: https://issues.apache.org/jira/browse/SPARK-39386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> During Apache Spark 3.3.0 RC5 tests, I found that this test case is very 
> flaky in my environment.
> {code:java}
>  [info] - Test bloom_filter_agg and might_contain *** FAILED *** (20 
> seconds, 370 milliseconds)
>  [info]   Results do not match for query:
>  [info]   Timezone: 
> sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-2880,dstSavings=360,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-2880,dstSavings=360,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=720,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=720,endTimeMode=0]]
>  [info]   Timezone Env: 
> ...
>   

[jira] [Created] (SPARK-39386) Flaky Test: BloomFilterAggregateQuerySuite

2022-06-05 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-39386:
-

 Summary: Flaky Test: BloomFilterAggregateQuerySuite
 Key: SPARK-39386
 URL: https://issues.apache.org/jira/browse/SPARK-39386
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun


{code}
 [info] - Test bloom_filter_agg and might_contain *** FAILED *** (20 
seconds, 370 milliseconds)
 [info]   Results do not match for query:
 [info]   Timezone: 
sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-2880,dstSavings=360,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-2880,dstSavings=360,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=720,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=720,endTimeMode=0]]
 [info]   Timezone Env: 
...

  == Results ==
 [info]   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
 [info]   !struct<>   
struct
 [info]   ![true,false]   [true,true] (QueryTest.scala:244)
 [info]   org.scalatest.exceptions.TestFailedException:
 [info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
 [info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
 [info]   at 
org.apache.spark.sql.QueryTest$.newAssertionFailedException(QueryTest.scala:234)
 [info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
 [info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
 [info]   at org.apache.spark.sql.QueryTest$.fail(QueryTest.scala:234)
 [info]   at 
org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:244)
 [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:151)
 [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:155)
 [info]   at 
org.apache.spark.sql.BloomFilterAggregateQuerySuite.$anonfun$new$4(BloomFilterAggregateQuerySuite.scala:98)
{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34631) Caught Hive MetaException when query by partition (partition col start with ‘$’)

2022-06-05 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550298#comment-17550298
 ] 

Yuming Wang commented on SPARK-34631:
-

Please backport SPARK-36137 to work around this issue.

> Caught Hive MetaException when query by partition (partition col start with 
> ‘$’)
> 
>
> Key: SPARK-34631
> URL: https://issues.apache.org/jira/browse/SPARK-34631
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, Java API
>Affects Versions: 2.4.4
>Reporter: zhouyuan
>Priority: Critical
>
> Create a table whose location holds Parquet data and run MSCK REPAIR TABLE to 
> pick up the partitions. Querying by the partition column then fails, and 
> adding backticks does not help:
> {code:java}
> select count from some_table where `$partition_date` = '2015-01-01'
> {code}
>  
> {panel:title=error:}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive. You can set the Spark configuration 
> setting spark.sql.hive.manageFilesourcePartitions to false to work around 
> this problem, however this will result in degraded performance. Please report 
> a bug: https://issues.apache.org/jira/browse/SPARK
>  at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:677)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:677)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1221)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1214)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1214)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:254)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:962)
>  at 
> org.apache.spark.sql.hive.execution.HiveTableScanExec.rawPartitions$lzycompute(HiveTableScanExec.scala:174)
>  at 
> org.apache.spark.sql.hive.execution.HiveTableScanExec.rawPartitions(HiveTableScanExec.scala:166)
>  at 
> org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$11.apply(HiveTableScanExec.scala:192)
>  at 
> org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$11.apply(HiveTableScanExec.scala:192)
>  at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2470)
>  at 
> org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:191)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
>  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
>  at 
> org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
>  at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)
>  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
>  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
>  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> 
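
Independent of the suggested backport, the error text above names its own stopgap. A minimal sketch, with the trade-off the message itself warns about:

{code:scala}
import org.apache.spark.sql.SparkSession

// Disable Hive-managed file-source partition handling to avoid the
// getPartitionsByFilter call; partition pruning performance degrades.
val spark = SparkSession.builder()
  .config("spark.sql.hive.manageFilesourcePartitions", "false")
  .enableHiveSupport()
  .getOrCreate()
{code}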

[jira] [Assigned] (SPARK-39377) Normalize expr ids in ListQuery and Exists expressions

2022-06-05 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-39377:
---

Assignee: Yuming Wang

> Normalize expr ids in ListQuery and Exists expressions
> --
>
> Key: SPARK-39377
> URL: https://issues.apache.org/jira/browse/SPARK-39377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39377) Normalize expr ids in ListQuery and Exists expressions

2022-06-05 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-39377.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36764
[https://github.com/apache/spark/pull/36764]

> Normalize expr ids in ListQuery and Exists expressions
> --
>
> Key: SPARK-39377
> URL: https://issues.apache.org/jira/browse/SPARK-39377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39385) Translate linear regression aggregate functions for pushdown

2022-06-05 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-39385:
--

 Summary: Translate linear regression aggregate functions for 
pushdown
 Key: SPARK-39385
 URL: https://issues.apache.org/jira/browse/SPARK-39385
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng
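
The description is empty; for orientation, here is a hedged sketch of the standard linear-regression aggregates the title presumably covers. The authoritative function list is in the eventual pull request, and the table and columns below are illustrative.

{code:scala}
// Translating aggregates like these into DS V2 pushdown expressions is what
// the sub-task title describes. "points", y, and x are made-up names.
spark.sql(
  "SELECT REGR_SLOPE(y, x), REGR_INTERCEPT(y, x), REGR_R2(y, x) FROM points")
{code}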






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39384) Compile linear regression aggregate functions for built-in JDBC dialects

2022-06-05 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-39384:
--

 Summary: Compile linear regression aggregate functions for built-in JDBC dialects
 Key: SPARK-39384
 URL: https://issues.apache.org/jira/browse/SPARK-39384
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39065) DS V2 Limit push-down should avoid out of memory

2022-06-05 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng resolved SPARK-39065.

Resolution: Won't Fix

We can't avoid OOM.

> DS V2 Limit push-down should avoid out of memory
> 
>
> Key: SPARK-39065
> URL: https://issues.apache.org/jira/browse/SPARK-39065
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark DS V2 supports pushing the Limit operator down to the data source.
> But this behavior is controlled only by the pushDownList option.
> If the limit is very large, the executor will still pull the entire result set
> from the data source, which can cause out-of-memory errors.
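
For orientation, a hypothetical illustration of the failure mode. The endpoint, table, and limit are made up, and recent Spark spells the JDBC limit-pushdown switch pushDownLimit, which the issue text appears to call pushDownList:

{code:scala}
// The LIMIT is pushed to the source, but the (huge) result set still has to
// stream back into Spark, so a large limit can exhaust executor memory.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db:5432/shop")  // hypothetical endpoint
  .option("dbtable", "orders")                      // hypothetical table
  .option("pushDownLimit", "true")
  .load()
  .limit(2000000000)
{code}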



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39179) Improve the test coverage for pyspark/shuffle.py

2022-06-05 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39179.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36701
[https://github.com/apache/spark/pull/36701]

> Improve the test coverage for pyspark/shuffle.py
> 
>
> Key: SPARK-39179
> URL: https://issues.apache.org/jira/browse/SPARK-39179
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: pralabhkumar
>Assignee: pralabhkumar
>Priority: Minor
> Fix For: 3.4.0
>
>
> Improve the test coverage of shuffle.py



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39383) Support V2 data sources with DEFAULT values

2022-06-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39383:


Assignee: (was: Apache Spark)

> Support V2 data sources with DEFAULT values
> ---
>
> Key: SPARK-39383
> URL: https://issues.apache.org/jira/browse/SPARK-39383
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4
>Reporter: Daniel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39383) Support V2 data sources with DEFAULT values

2022-06-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550252#comment-17550252
 ] 

Apache Spark commented on SPARK-39383:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/36771

> Support V2 data sources with DEFAULT values
> ---
>
> Key: SPARK-39383
> URL: https://issues.apache.org/jira/browse/SPARK-39383
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4
>Reporter: Daniel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39383) Support V2 data sources with DEFAULT values

2022-06-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39383:


Assignee: Apache Spark

> Support V2 data sources with DEFAULT values
> ---
>
> Key: SPARK-39383
> URL: https://issues.apache.org/jira/browse/SPARK-39383
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39383) Support V2 data sources with DEFAULT values

2022-06-05 Thread Daniel (Jira)
Daniel created SPARK-39383:
--

 Summary: Support V2 data sources with DEFAULT values
 Key: SPARK-39383
 URL: https://issues.apache.org/jira/browse/SPARK-39383
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4
Reporter: Daniel






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39382) UI should show the duration of failed tasks when the executor is lost

2022-06-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550183#comment-17550183
 ] 

Apache Spark commented on SPARK-39382:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/36770

> UI should show the duration of failed tasks when the executor is lost
> --
>
> Key: SPARK-39382
> URL: https://issues.apache.org/jira/browse/SPARK-39382
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Trivial
>
> When an executor is lost due to OOM or other reasons, the metrics of its 
> failed tasks have no executorRunTime, so their duration cannot be displayed 
> in the UI.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39382) UI should show the duration of failed tasks when the executor is lost

2022-06-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550182#comment-17550182
 ] 

Apache Spark commented on SPARK-39382:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/36770

> UI should show the duration of failed tasks when the executor is lost
> --
>
> Key: SPARK-39382
> URL: https://issues.apache.org/jira/browse/SPARK-39382
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Trivial
>
> When an executor is lost due to OOM or other reasons, the metrics of its 
> failed tasks have no executorRunTime, so their duration cannot be displayed 
> in the UI.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39382) UI should show the duration of failed tasks when the executor is lost

2022-06-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39382:


Assignee: Apache Spark

> UI should show the duration of failed tasks when the executor is lost
> --
>
> Key: SPARK-39382
> URL: https://issues.apache.org/jira/browse/SPARK-39382
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Trivial
>
> When an executor is lost due to OOM or other reasons, the metrics of its 
> failed tasks have no executorRunTime, so their duration cannot be displayed 
> in the UI.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39382) UI should show the duration of failed tasks when the executor is lost

2022-06-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39382:


Assignee: (was: Apache Spark)

> UI should show the duration of failed tasks when the executor is lost
> --
>
> Key: SPARK-39382
> URL: https://issues.apache.org/jira/browse/SPARK-39382
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Trivial
>
> When an executor is lost due to OOM or other reasons, the metrics of its 
> failed tasks have no executorRunTime, so their duration cannot be displayed 
> in the UI.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39382) UI should show the duration of failed tasks when the executor is lost

2022-06-05 Thread dzcxzl (Jira)
dzcxzl created SPARK-39382:
--

 Summary: UI should show the duration of failed tasks when the executor is lost
 Key: SPARK-39382
 URL: https://issues.apache.org/jira/browse/SPARK-39382
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: dzcxzl


When an executor is lost due to OOM or other reasons, the metrics of its 
failed tasks have no executorRunTime, so their duration cannot be displayed 
in the UI.
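
A hedged sketch of the fallback the description implies; displayDuration is a made-up helper, not the code in the pull request, though TaskInfo and TaskMetrics are real Spark classes:

{code:scala}
import org.apache.spark.executor.TaskMetrics
import org.apache.spark.scheduler.TaskInfo

// When the executor is lost, executorRunTime is missing or zero, so fall
// back to the task's own wall-clock timestamps for a displayable duration.
def displayDuration(info: TaskInfo, metrics: Option[TaskMetrics]): Long =
  metrics.map(_.executorRunTime).filter(_ > 0)
    .getOrElse(info.finishTime - info.launchTime)
{code}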



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39374) Improve error message for user specified column list

2022-06-05 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-39374.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36760
[https://github.com/apache/spark/pull/36760]

> Improve error message for user specified column list
> 
>
> Key: SPARK-39374
> URL: https://issues.apache.org/jira/browse/SPARK-39374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> {noformat}
> scala> spark.sql("create table t1(id int, name string) using parquet")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("insert into t1(id, nn) values(1, 'name')")
> org.apache.spark.sql.AnalysisException: Cannot resolve column name nn; line 1 
> pos 0
>   at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.cannotResolveUserSpecifiedColumnsError(QueryCompilationErrors.scala:406)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns$.$anonfun$resolveUserSpecifiedColumns$2(Analyzer.scala:3406)
>   at scala.Option.getOrElse(Option.scala:189)
> {noformat}
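
A hypothetical sketch of the friendlier message the title asks for, naming the resolvable columns; the actual wording is whatever the linked pull request settled on:

{code:scala}
// Hypothetical improved error: include the candidate columns, not just the
// unresolved name.
throw new org.apache.spark.sql.AnalysisException(
  """Cannot resolve column name "nn" among (id, name)""")
{code}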



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39374) Improve error message for user specified column list

2022-06-05 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-39374:


Assignee: Yuming Wang

> Improve error message for user specified column list
> 
>
> Key: SPARK-39374
> URL: https://issues.apache.org/jira/browse/SPARK-39374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> {noformat}
> scala> spark.sql("create table t1(id int, name string) using parquet")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("insert into t1(id, nn) values(1, 'name')")
> org.apache.spark.sql.AnalysisException: Cannot resolve column name nn; line 1 
> pos 0
>   at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.cannotResolveUserSpecifiedColumnsError(QueryCompilationErrors.scala:406)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns$.$anonfun$resolveUserSpecifiedColumns$2(Analyzer.scala:3406)
>   at scala.Option.getOrElse(Option.scala:189)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39381) Make vectorized ORC columnar writer batch size configurable

2022-06-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39381:


Assignee: Apache Spark

> Make vectorized ORC columnar writer batch size configurable
> --
>
> Key: SPARK-39381
> URL: https://issues.apache.org/jira/browse/SPARK-39381
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, the vectorized columnar ORC writer batch size defaults to 1024.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39381) Make vectorized ORC columnar writer batch size configurable

2022-06-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39381:


Assignee: (was: Apache Spark)

> Make vectorized ORC columnar writer batch size configurable
> --
>
> Key: SPARK-39381
> URL: https://issues.apache.org/jira/browse/SPARK-39381
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> Currently, the vectorized columnar ORC writer batch size defaults to 1024.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39381) Make vectorized ORC columnar writer batch size configurable

2022-06-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550176#comment-17550176
 ] 

Apache Spark commented on SPARK-39381:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/36769

> Make vectorized ORC columnar writer batch size configurable
> --
>
> Key: SPARK-39381
> URL: https://issues.apache.org/jira/browse/SPARK-39381
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> Currently, the vectorized columnar ORC writer batch size defaults to 1024.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39381) Make vectorized ORC columnar writer batch size configurable

2022-06-05 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-39381:
---
Description: Currently, the vectorized columnar ORC writer batch size defaults to 1024.  
(was: Currently, the vectorized columnar ORC writer batch size defaults to 1024)

> Make vectorized ORC columnar writer batch size configurable
> --
>
> Key: SPARK-39381
> URL: https://issues.apache.org/jira/browse/SPARK-39381
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: dzcxzl
>Priority: Minor
>
> Currently, the vectorized columnar ORC writer batch size defaults to 1024.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39381) Make vectorized ORC columnar writer batch size configurable

2022-06-05 Thread dzcxzl (Jira)
dzcxzl created SPARK-39381:
--

 Summary: Make vectorized ORC columnar writer batch size configurable
 Key: SPARK-39381
 URL: https://issues.apache.org/jira/browse/SPARK-39381
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.1
Reporter: dzcxzl


Currently, the vectorized columnar ORC writer batch size defaults to 1024
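
A hypothetical sketch of the requested knob; the config key below is illustrative, not necessarily the one the pull request adds:

{code:scala}
// Read the writer batch size from a session config, falling back to the
// current hard-coded default of 1024. The key name is made up.
val batchSize = spark.conf
  .getOption("spark.sql.orc.columnarWriterBatchSize")
  .map(_.toInt)
  .getOrElse(1024)
{code}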



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org