[jira] [Updated] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33894:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Assignee: koert kuipers
>Priority: Major
> Fix For: 3.1.1
>
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Workaround
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
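> A minimal sketch of that workaround (the top-level class name below is an 
> assumption, not the actual patch): define the case class outside 
> Word2VecModel so the SQL encoder no longer has to deal with a nested class 
> under Scala 2.13.
> {code:scala}
> // Hypothetical sketch only, not the actual fix.
> // Top-level case class instead of one nested in Word2VecModel:
> case class Word2VecData(word: String, vector: Array[Float])
>
> // The model writer would then build its DataFrame from Word2VecData, e.g.
> // spark.createDataFrame(wordVectors.toSeq.map { case (w, v) => Word2VecData(w, v) })
> {code}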
> h2. Attempts to git bisect
> git bisect on the master branch:
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> 

[jira] [Updated] (SPARK-33980) invalidate char/varchar in spark.readStream.schema

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33980:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> invalidate char/varchar in spark.readStream.schema
> --
>
> Key: SPARK-33980
> URL: https://issues.apache.org/jira/browse/SPARK-33980
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.1
>
>
> invalidate char/varchar in spark.readStream.schema just like what we do for 
> spark.read.schema
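> A hedged illustration of the intended behavior (the format, path, and schema 
> string below are assumptions): a char/varchar column declared in a streaming 
> schema should be rejected up front, just as the batch reader does.
> {code:scala}
> // Illustration only; the exact error is defined by the fix itself.
> val df = spark.readStream
>   .schema("c1 CHAR(5), c2 INT") // should now be invalidated like spark.read.schema
>   .format("parquet")
>   .load("/tmp/stream-input")
> {code}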






[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34000:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
> Fix For: 3.0.2, 3.1.1
>
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
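> For context, a minimal sketch (the map and key below are stand-ins, not 
> Spark's actual fields) of why {{HashMap.apply}} throws here and how a guarded 
> lookup tolerates a missing entry:
> {code:scala}
> import scala.collection.mutable
>
> // Hypothetical stand-in for the listener's per-stage bookkeeping.
> val tasksPerStageAttempt = mutable.HashMap[(Int, Int), Int]()
>
> val key = (600, 0) // stage 600, attempt 0
> // tasksPerStageAttempt(key) throws NoSuchElementException once the stage
> // entry has been cleaned up; a guarded lookup handles late onTaskEnd events.
> val running = tasksPerStageAttempt.getOrElse(key, 0)
> {code}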






[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258705#comment-17258705
 ] 

L. C. Hsieh commented on SPARK-33833:
-

I think SS allows users to specify a custom group id, doesn't it?
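
For reference, a minimal sketch with assumed broker/topic values: since Spark 
3.0 the Kafka source accepts a caller-supplied group id via the 
"kafka.group.id" option (or a "groupIdPrefix").

{code:scala}
// Values are illustrative only.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .option("kafka.group.id", "ss-lag-tracking") // custom consumer group id
  .load()
{code}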

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, it is 
> not possible to track total Kafka lag using Burrow as is possible with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  






[jira] [Assigned] (SPARK-33979) Filter predicate reorder

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33979:


Assignee: (was: Apache Spark)

> Filter predicate reorder
> 
>
> Key: SPARK-33979
> URL: https://issues.apache.org/jira/browse/SPARK-33979
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Reorder filter predicates to improve query performance:
> {noformat}
> others < In < Like < UDF/CaseWhen/If < InSet < LikeAny/LikeAll
> {noformat}
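> A minimal sketch of the idea (the predicate types and cost ranks below are 
> illustrative, not Spark's implementation): rank predicates by assumed relative 
> cost and sort them so cheaper ones run first.
> {code:scala}
> // Toy model of the proposed ordering; names are placeholders.
> sealed trait Pred { def sql: String }
> case class Other(sql: String)   extends Pred
> case class In(sql: String)      extends Pred
> case class Like(sql: String)    extends Pred
> case class UdfPred(sql: String) extends Pred
> case class InSet(sql: String)   extends Pred
> case class LikeAny(sql: String) extends Pred
>
> def cost(p: Pred): Int = p match {
>   case _: Other   => 0
>   case _: In      => 1
>   case _: Like    => 2
>   case _: UdfPred => 3
>   case _: InSet   => 4
>   case _: LikeAny => 5
> }
>
> // Cheaper predicates are evaluated first, so expensive ones see fewer rows.
> def reorder(preds: Seq[Pred]): Seq[Pred] = preds.sortBy(cost)
> {code}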






[jira] [Assigned] (SPARK-33979) Filter predicate reorder

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33979:


Assignee: Apache Spark

> Filter predicate reorder
> 
>
> Key: SPARK-33979
> URL: https://issues.apache.org/jira/browse/SPARK-33979
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Reorder filter predicates to improve query performance:
> {noformat}
> others < In < Like < UDF/CaseWhen/If < InSet < LikeAny/LikeAll
> {noformat}






[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Lantao Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-34000:
---
Affects Version/s: (was: 3.0.1)

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)






[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258695#comment-17258695
 ] 

Jungtaek Lim commented on SPARK-33833:
--

For SS, the consumer group is randomly generated by design, which is the actual 
obstacle to leveraging the offset information with the Kafka ecosystem.

SPARK-27549 was meant to address this, but it was unfortunately soft-rejected 
for inclusion in the Spark repository. Instead of pushing it further, I've just 
crafted the project in my own repository - 
https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, it is 
> not possible to track total Kafka lag using Burrow as is possible with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  






[jira] [Resolved] (SPARK-33100) Support parse the sql statements with c-style comments

2021-01-04 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-33100.
--
Fix Version/s: 3.2.0
   3.1.0
 Assignee: feiwang  (was: Apache Spark)
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29982

> Support parse the sql statements with c-style comments
> --
>
> Key: SPARK-33100
> URL: https://issues.apache.org/jira/browse/SPARK-33100
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: feiwang
>Assignee: feiwang
>Priority: Minor
> Fix For: 3.1.0, 3.2.0
>
>
> Currently, spark-sql does not support parsing SQL statements that contain 
> C-style comments.
> For example, the statements:
> {code:java}
> /* SELECT 'test'; */
> SELECT 'test';
> {code}
> would be split into two statements:
> The first: "/* SELECT 'test'"
> The second: "*/ SELECT 'test'"
> It then throws an exception because the first one is illegal.






[jira] [Created] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions

2021-01-04 Thread Kent Yao (Jira)
Kent Yao created SPARK-34003:


 Summary: Rule conflicts between 
PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions
 Key: SPARK-34003
 URL: https://issues.apache.org/jira/browse/SPARK-34003
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Kent Yao


ResolveAggregateFunctions is a hacky rule: it calls `executeSameContext` to 
generate a `resolved agg` that determines which unresolved sort attributes 
should be pushed into the agg. However, after we add the 
PaddingAndLengthCheckForCharVarchar rule, which rewrites the query output, the 
`resolved agg` can no longer match the original attributes.

It causes dissociated sort attributes to be pushed in and fails the query:


{code:java}
[info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
expression 'testcat.t1.`v`' is neither present in the group by, nor is it an 
aggregate function. Add to group by or wrap in first() (or first_value) if you 
don't care which value you get.;
[info]   Project [v#14, sum(i)#11L]
[info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
[info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
sum(i)#11L, v#13 AS aggOrder#12]
[info] +- SubqueryAlias testcat.t1
[info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length 
, cast(length(v#6) as string),  exceeds varchar type length limitation: 3)) as 
string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
[info]   +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1
[info]
[info]   Project [v#14, sum(i)#11L]
[info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
[info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
sum(i)#11L, v#13 AS aggOrder#12]
[info] +- SubqueryAlias testcat.t1
[info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length 
, cast(length(v#6) as string),  exceeds varchar type length limitation: 3)) as 
string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
[info]   +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1
{code}
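
A hypothetical reproduction sketch inferred from the plan above (the catalog, 
table, and column names are assumptions): grouping by a VARCHAR(3) column while 
also sorting by it is the query shape that trips the rule interaction.

{code:scala}
// Names and DDL are illustrative only.
spark.sql("CREATE TABLE testcat.t1 (v VARCHAR(3), i INT) USING parquet")
spark.sql("SELECT v, sum(i) FROM testcat.t1 GROUP BY v ORDER BY v").show()
{code}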







[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258716#comment-17258716
 ] 

L. C. Hsieh commented on SPARK-33833:
-

I read through the comments in the previous PR. The approach is pretty similar 
to what I did locally, so I guess that, if nothing changes, it won't be 
considered for the Spark codebase either.



> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, it is 
> not possible to track total Kafka lag using Burrow as is possible with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  






[jira] [Updated] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34007:
-
Target Version/s: 3.1.0

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 at SPARK-33512, the docker 
> release script fails as below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}






[jira] [Commented] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258640#comment-17258640
 ] 

Apache Spark commented on SPARK-34000:
--

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/31025

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)






[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34000:


Assignee: Apache Spark

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Apache Spark
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)






[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33950:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.2.0, 3.1.1
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  






[jira] [Comment Edited] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258670#comment-17258670
 ] 

Dongjoon Hyun edited comment on SPARK-31786 at 1/5/21, 5:24 AM:


Yes, you are correct.
 # `export` is only required for your machine.
 # `--conf` should be used for `driverEnv`.

Yes, Spark 3.0 is better for K8s environment and Spark 3.1 is much better 
because of SPARK-33005 (`Kubernetes GA Preparation`). FYI, Apache Spark 3.1.0 
RC1 is already created.
 - [https://github.com/apache/spark/tree/v3.1.0-rc1]

Apache Spark 3.1.0 will arrive this month.


was (Author: dongjoon):
Yes, you are correct.
 # `export` is only required for your machine.
 # `--conf` should be used for `driverEnv`.

Yes, Spark 3.0 is better for K8s environment and Spark 3.1 is much better 
because of SPARK-33005 . FYI, Apache Spark 3.1.0 RC1 is already created.

- https://github.com/apache/spark/tree/v3.1.0-rc1

Apache Spark 3.1.0 will arrive this month.

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting an exception when submitting the Spark-Pi app to a Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at 

[jira] [Updated] (SPARK-32085) Migrate to NumPy documentation style

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32085:
-
Fix Version/s: 3.1.0

> Migrate to NumPy documentation style
> 
>
> Key: SPARK-32085
> URL: https://issues.apache.org/jira/browse/SPARK-32085
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.0
>
>
> https://github.com/numpy/numpydoc
> For example,
> Before: 
> https://github.com/apache/spark/blob/f0e6d0ec13d9cdadf341d1b976623345bcdb1028/python/pyspark/sql/dataframe.py#L276-L318
>  After: 
> https://github.com/databricks/koalas/blob/6711e9c0f50c79dd57eeedb530da6c4ea3298de2/databricks/koalas/frame.py#L1122-L1176
> We can start to switch incrementally.
> NOTE that this JIRA only targets switching the style; it does not aim to add 
> extra information or fixes at the same time.






[jira] [Resolved] (SPARK-32085) Migrate to NumPy documentation style

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32085.
--
Resolution: Done

> Migrate to NumPy documentation style
> 
>
> Key: SPARK-32085
> URL: https://issues.apache.org/jira/browse/SPARK-32085
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.0
>
>
> https://github.com/numpy/numpydoc
> For example,
> Before: 
> https://github.com/apache/spark/blob/f0e6d0ec13d9cdadf341d1b976623345bcdb1028/python/pyspark/sql/dataframe.py#L276-L318
>  After: 
> https://github.com/databricks/koalas/blob/6711e9c0f50c79dd57eeedb530da6c4ea3298de2/databricks/koalas/frame.py#L1122-L1176
> We can start to switch incrementally.
> NOTE that this JIRA only targets switching the style; it does not aim to add 
> extra information or fixes at the same time.






[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258692#comment-17258692
 ] 

L. C. Hsieh commented on SPARK-33833:
-

Hmm, I did a few tests locally. Does Burrow work only if Spark commits offset 
progress back to Kafka?

I added some code to commit offset progress to Kafka. After checking the 
"__consumer_offsets" topic of Kafka, I found that whether or not Spark commits 
the progress to Kafka, the record of the consumer group of the Spark SS query 
is always in "__consumer_offsets".

Based on https://github.com/linkedin/Burrow/wiki, Burrow reads consumer group 
info from this "__consumer_offsets" topic. So whether Spark commits or not, 
there will be a record for the consumer group; does that mean Burrow still 
works without Spark committing offset progress to Kafka?



> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, it is 
> not possible to track total Kafka lag using Burrow as is possible with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  






[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258693#comment-17258693
 ] 

L. C. Hsieh commented on SPARK-33833:
-

[~samdvr] Can you help elaborate on the question above? Thanks.

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, it is 
> not possible to track total Kafka lag using Burrow as is possible with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  






[jira] [Updated] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-34005:
---
Issue Type: Improvement  (was: Bug)

> Update peak memory metrics for each Executor on task end.
> -
>
> Key: SPARK-34005
> URL: https://issues.apache.org/jira/browse/SPARK-34005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like other peak memory metrics (e.g., stage, executors in a stage), it's 
> better to update the peak memory metrics for each Executor.






[jira] [Created] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-34005:
--

 Summary: Update peak memory metrics for each Executor on task end.
 Key: SPARK-34005
 URL: https://issues.apache.org/jira/browse/SPARK-34005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.0, 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


Like other peak memory metrics (e.g., stage, executors in a stage), it's better 
to update the peak memory metrics for each Executor.






[jira] [Assigned] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33919:
---

Assignee: Maxim Gekk

> Unify v1 and v2 SHOW NAMESPACES tests
> -
>
> Key: SPARK-33919
> URL: https://issues.apache.org/jira/browse/SPARK-33919
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run 
> for v1 and v2 catalogs.






[jira] [Resolved] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33919.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30937
[https://github.com/apache/spark/pull/30937]

> Unify v1 and v2 SHOW NAMESPACES tests
> -
>
> Key: SPARK-33919
> URL: https://issues.apache.org/jira/browse/SPARK-33919
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run 
> for v1 and v2 catalogs.






[jira] [Commented] (SPARK-33980) invalidate char/varchar in spark.readStream.schema

2021-01-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258724#comment-17258724
 ] 

Hyukjin Kwon commented on SPARK-33980:
--

I need to recreate the rc1 tag. I failed to create an RC due to a dependency 
issue, SPARK-34007.

> invalidate char/varchar in spark.readStream.schema
> --
>
> Key: SPARK-33980
> URL: https://issues.apache.org/jira/browse/SPARK-33980
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.1
>
>
> invalidate char/varchar in spark.readStream.schema just like what we do for 
> spark.read.schema






[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33950:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  






[jira] [Comment Edited] (SPARK-33980) invalidate char/varchar in spark.readStream.schema

2021-01-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258724#comment-17258724
 ] 

Hyukjin Kwon edited comment on SPARK-33980 at 1/5/21, 7:43 AM:
---

I need to recreate the rc1 tag. I failed to create an RC due to a dependency 
issue, SPARK-34007. I am correcting the fix version to 3.1.0.


was (Author: hyukjin.kwon):
I need to recreate the rc1 tag. I failed to create an RC due to a dependency 
issue, SPARK-34007.

> invalidate char/varchar in spark.readStream.schema
> --
>
> Key: SPARK-33980
> URL: https://issues.apache.org/jira/browse/SPARK-33980
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.1
>
>
> invalidate char/varchar in spark.readStream.schema just like what we do for 
> spark.read.schema






[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34000:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)






[jira] [Commented] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258725#comment-17258725
 ] 

Hyukjin Kwon commented on SPARK-33950:
--

I need to recreate the rc1 tag. I failed to create an RC due to a dependency 
issue, SPARK-34007. I am correcting the fix version to 3.1.0.

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  






[jira] [Updated] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33894:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Assignee: koert kuipers
>Priority: Major
> Fix For: 3.1.0
>
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Work Around Fix
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
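> A minimal sketch of the suggested workaround, with hypothetical class names rather 
> than the real Word2VecModel source. Moving the case class out of the enclosing 
> class means instances no longer capture an outer reference, which is the kind of 
> thing that can trip up encoder/schema derivation under Scala 2.13:
> {code:java}
> // Before (problematic): Data is nested inside the model class, so every Data
> // instance carries a hidden reference to the enclosing MyModel instance.
> class MyModelBefore {
>   case class Data(word: String, vector: Array[Float])
> }
> 
> // After (workaround): Data lives in the companion object, so it is an ordinary
> // product type that can be encoded without the outer reference.
> class MyModelAfter
> object MyModelAfter {
>   case class Data(word: String, vector: Array[Float])
> }
> {code}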
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> 

[jira] [Created] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread hao (Jira)
hao created SPARK-34006:
---

 Summary: [spark.sql.hive.convertMetastoreOrc]This parameter can 
solve orc format table insert overwrite read table, it should be stated in the 
document
 Key: SPARK-34006
 URL: https://issues.apache.org/jira/browse/SPARK-34006
 Project: Spark
  Issue Type: Bug
  Components: docs
Affects Versions: 3.0.1
Reporter: hao


This parameter can work around the problem of running INSERT OVERWRITE on an 
ORC-format table while reading from the same table; it should be stated in the 
documentation.
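
For reference, a hedged sketch of how the setting would be toggled, assuming an 
existing SparkSession named spark (the behavioral effect for the INSERT OVERWRITE 
case is as described above, not verified here; depending on the Spark version the 
setting may need to be supplied at session start instead):

{code:java}
// Disable the built-in ORC conversion for Hive metastore ORC tables.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
// or at submit time:
//   spark-submit --conf spark.sql.hive.convertMetastoreOrc=false ...
{code}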



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-34007:


 Summary: Downgrade scala-maven-plugin to 4.3.0
 Key: SPARK-34007
 URL: https://issues.apache.org/jira/browse/SPARK-34007
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1.0
Reporter: Hyukjin Kwon


After we upgraded scala-maven-plugin to 4.4.0 at SPARK-33512, the docker 
release script fails as below:

{code}
[INFO] Compiling 21 Scala sources and 3 Java sources to 
/opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
 ...
[ERROR] ## Exception when compiling 24 sources to 
/opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s signer 
information does not match signer information of other classes in the same 
package
java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
java.lang.ClassLoader.defineClass(ClassLoader.java:754)
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
java.net.URLClassLoader.access$100(URLClassLoader.java:74)
java.net.URLClassLoader$1.run(URLClassLoader.java:369)
java.net.URLClassLoader$1.run(URLClassLoader.java:363)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:362)
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
java.lang.Class.getDeclaredMethods0(Native Method)
java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
java.lang.Class.privateGetPublicMethods(Class.java:2902)
java.lang.Class.getMethods(Class.java:1615)
sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34000:


Assignee: (was: Apache Spark)

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
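> The stack trace shows a direct map lookup for a stage attempt that has already 
> been cleaned up (the task was killed because another attempt succeeded). A generic 
> sketch of the defensive pattern, not Spark's actual ExecutorAllocationManager code:
> {code:java}
> import scala.collection.mutable
> 
> // Keyed by (stageId, stageAttemptId); entries are removed when a stage finishes.
> val tasksPerStageAttempt = mutable.HashMap.empty[(Int, Int), Int]
> 
> def onTaskEndDefensive(stageId: Int, attemptId: Int): Unit = {
>   tasksPerStageAttempt.get((stageId, attemptId)) match {
>     case Some(n) => tasksPerStageAttempt((stageId, attemptId)) = n - 1
>     case None    => // attempt already removed; ignore the late event
>   }
> }
> {code}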



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258667#comment-17258667
 ] 

Dongjoon Hyun commented on SPARK-25075:
---

[~smarter]. Sorry, but unfortunately, from my assessment, the current status is 
a little different.

1. Apache Spark community is not able to publish Scala 2.13-based Maven 
artifacts yet.

2. Apache Spark community is not able to provide Scala 2.13-based binary 
distribution yet.

3. As you can see in this JIRA, the target version is 3.2.0, not 3.1.0.

4. For Apache Spark 3.1.0, we already created RC1 without SPARK-33894, and 
SPARK-33894 is marked for Spark 3.1.1. 
 * [https://github.com/apache/spark/releases/tag/v3.1.0-rc1]

Due to (1)~(4), Apache Spark 3.1.0 RC1 will have only Scala 2.12 libraries and 
binaries during the vote period.

Of course, I guess we will roll more RCs with more improvements; at least 
SPARK-33894 will be a part of 3.1.0. However, I don't think we can say Scala 
2.13 is supported without the official Scala 2.13 binaries and Scala 2.13 Maven 
artifacts. I guess you also agree that those are mandatory.

 

cc [~hyukjin.kwon] and [~srowen]

> Build and test Spark against Scala 2.13
> ---
>
> Key: SPARK-25075
> URL: https://issues.apache.org/jira/browse/SPARK-25075
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, MLlib, Project Infra, Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Guillaume Massé
>Priority: Major
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.13 milestone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34002) Broken UDF Encoding

2021-01-04 Thread Mark Hamilton (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamilton updated SPARK-34002:
--
Description: 
UDFs can behave differently depending on whether a dataframe is cached, despite the 
dataframe being otherwise identical.

 

Repro:

 
{code:java}
case class Bar(a: Int)
 
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
 None
}

def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Commenting in the cache will make this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
 

Error:
{code:java}
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/05 00:52:57 INFO SparkContext: Running Spark version 3.0.1
21/01/05 00:52:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/01/05 00:52:57 INFO ResourceUtils: ==
21/01/05 00:52:57 INFO ResourceUtils: Resources for spark.driver:
21/01/05 00:52:57 INFO ResourceUtils: ==
21/01/05 00:52:57 INFO SparkContext: Submitted application: JsonOutputParserSuite
21/01/05 00:52:57 INFO SparkContext: Spark configuration:
spark.app.name=JsonOutputParserSuite
spark.driver.maxResultSize=6g
spark.logConf=true
spark.master=local[*]
spark.sql.crossJoin.enabled=true
spark.sql.shuffle.partitions=20
spark.sql.warehouse.dir=file:/code/mmlspark/spark-warehouse
21/01/05 00:52:58 INFO SecurityManager: Changing view acls to: marhamil
21/01/05 00:52:58 INFO SecurityManager: Changing modify acls to: marhamil
21/01/05 00:52:58 INFO SecurityManager: Changing view acls groups to: 
21/01/05 00:52:58 INFO SecurityManager: Changing modify acls groups to: 
21/01/05 00:52:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(marhamil); groups with view permissions: Set(); users  with modify permissions: Set(marhamil); groups with modify permissions: Set()
21/01/05 00:52:58 INFO Utils: Successfully started service 'sparkDriver' on port 52315.
21/01/05 00:52:58 INFO SparkEnv: Registering MapOutputTracker
21/01/05 00:52:58 INFO SparkEnv: Registering BlockManagerMaster
21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/01/05 00:52:58 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/01/05 00:52:58 INFO DiskBlockManager: Created local directory at C:\Users\marhamil\AppData\Local\Temp\blockmgr-9a5c80ef-ade6-41ac-9933-a26f6c291719
21/01/05 00:52:58 INFO MemoryStore: MemoryStore started with capacity 4.0 GiB
21/01/05 00:52:59 INFO SparkEnv: Registering OutputCommitCoordinator
21/01/05 00:52:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/01/05 00:52:59 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://host.docker.internal:4040
21/01/05 00:52:59 INFO Executor: Starting executor ID driver on host host.docker.internal
21/01/05 00:52:59 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52359.
21/01/05 00:52:59 INFO NettyBlockTransferService: Server created on host.docker.internal:52359
21/01/05 00:52:59 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/01/05 00:52:59 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, host.docker.internal, 52359, None)
21/01/05 00:52:59 INFO BlockManagerMasterEndpoint: Registering block manager host.docker.internal:52359 with 4.0 GiB RAM, BlockManagerId(driver, host.docker.internal, 52359, None)
21/01/05 00:52:59 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, host.docker.internal, 52359, None)
21/01/05 00:52:59 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, host.docker.internal, 52359, None)
21/01/05 00:53:00 WARN SharedState: Not allowing to set spark.sql.warehouse.dir or hive.metastore.warehouse.dir in SparkSession's options, it should be set statically for cross-session usages
Failed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct)
org.apache.spark.SparkException: Failed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct)
 at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1130)
 at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:156)
 at 

[jira] [Created] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class

2021-01-04 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-34004:
--

 Summary: Change FrameLessOffsetWindowFunction as sealed abstract 
class
 Key: SPARK-34004
 URL: https://issues.apache.org/jira/browse/SPARK-34004
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: jiaan.geng


Change FrameLessOffsetWindowFunction to a sealed abstract class to simplify 
pattern matching.
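
A generic sketch of the benefit (illustrative names only, not Spark's actual 
expression hierarchy): with a sealed abstract parent, the compiler can check that 
pattern matches over the subclasses are exhaustive, so no defensive default case is 
needed.

{code:java}
sealed abstract class OffsetFunction { def offset: Int }
case class LeadLike(offset: Int) extends OffsetFunction
case class LagLike(offset: Int) extends OffsetFunction

def describe(f: OffsetFunction): String = f match {
  case LeadLike(n) => s"lead by $n"
  case LagLike(n)  => s"lag by $n" // no default case needed: the hierarchy is sealed
}
{code}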



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258707#comment-17258707
 ] 

L. C. Hsieh commented on SPARK-33833:
-

Btw, thanks for providing the useful link to previous ticket/PR. 
[~kabhwan]

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258723#comment-17258723
 ] 

Apache Spark commented on SPARK-34003:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/31027

> Rule conflicts between PaddingAndLengthCheckForCharVarchar and 
> ResolveAggregateFunctions
> 
>
> Key: SPARK-34003
> URL: https://issues.apache.org/jira/browse/SPARK-34003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Critical
>
> ResolveAggregateFunctions is a hacky rule: it calls `executeSameContext` to 
> generate a `resolved agg` and uses it to determine which unresolved sort 
> attribute should be pushed into the agg. However, after we add the 
> PaddingAndLengthCheckForCharVarchar rule, which rewrites the query output, the 
> `resolved agg` no longer matches the original attributes. 
> This causes a dissociated sort attribute to be pushed in and fails the query:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> expression 'testcat.t1.`v`' is neither present in the group by, nor is it an 
> aggregate function. Add to group by or wrap in first() (or first_value) if 
> you don't care which value you get.;
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> [info]
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34003:


Assignee: Apache Spark

> Rule conflicts between PaddingAndLengthCheckForCharVarchar and 
> ResolveAggregateFunctions
> 
>
> Key: SPARK-34003
> URL: https://issues.apache.org/jira/browse/SPARK-34003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Critical
>
> ResolveAggregateFunctions is a hacky rule: it calls `executeSameContext` to 
> generate a `resolved agg` and uses it to determine which unresolved sort 
> attribute should be pushed into the agg. However, after we add the 
> PaddingAndLengthCheckForCharVarchar rule, which rewrites the query output, the 
> `resolved agg` no longer matches the original attributes. 
> This causes a dissociated sort attribute to be pushed in and fails the query:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> expression 'testcat.t1.`v`' is neither present in the group by, nor is it an 
> aggregate function. Add to group by or wrap in first() (or first_value) if 
> you don't care which value you get.;
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> [info]
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34004:


Assignee: Apache Spark

> Change FrameLessOffsetWindowFunction as sealed abstract class
> -
>
> Key: SPARK-34004
> URL: https://issues.apache.org/jira/browse/SPARK-34004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Change FrameLessOffsetWindowFunction to a sealed abstract class to simplify 
> pattern matching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258722#comment-17258722
 ] 

Apache Spark commented on SPARK-34004:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/31026

> Change FrameLessOffsetWindowFunction as sealed abstract class
> -
>
> Key: SPARK-34004
> URL: https://issues.apache.org/jira/browse/SPARK-34004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Change FrameLessOffsetWindowFunction to a sealed abstract class to simplify 
> pattern matching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34004:


Assignee: (was: Apache Spark)

> Change FrameLessOffsetWindowFunction as sealed abstract class
> -
>
> Key: SPARK-34004
> URL: https://issues.apache.org/jira/browse/SPARK-34004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Change FrameLessOffsetWindowFunction to a sealed abstract class to simplify 
> pattern matching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34003:


Assignee: (was: Apache Spark)

> Rule conflicts between PaddingAndLengthCheckForCharVarchar and 
> ResolveAggregateFunctions
> 
>
> Key: SPARK-34003
> URL: https://issues.apache.org/jira/browse/SPARK-34003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Critical
>
> ResolveAggregateFunctions is a hacky rule: it calls `executeSameContext` to 
> generate a `resolved agg` and uses it to determine which unresolved sort 
> attribute should be pushed into the agg. However, after we add the 
> PaddingAndLengthCheckForCharVarchar rule, which rewrites the query output, the 
> `resolved agg` no longer matches the original attributes. 
> This causes a dissociated sort attribute to be pushed in and fails the query:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> expression 'testcat.t1.`v`' is neither present in the group by, nor is it an 
> aggregate function. Add to group by or wrap in first() (or first_value) if 
> you don't care which value you get.;
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> [info]
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32017) Make Pyspark Hadoop 3.2+ Variant available in PyPI

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258721#comment-17258721
 ] 

Apache Spark commented on SPARK-32017:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31028

> Make Pyspark Hadoop 3.2+ Variant available in PyPI
> --
>
> Key: SPARK-32017
> URL: https://issues.apache.org/jira/browse/SPARK-32017
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: George Pongracz
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.1.0
>
>
> The version of PySpark 3.0.0 currently available in PyPI uses Hadoop 2.7.4.
> Could a variant (or the default) have its version of Hadoop aligned to 3.2.0, 
> as per the downloadable Spark binaries?
> This would enable the PyPI version to be compatible with session token 
> authorisations and assist in accessing data residing in object stores with 
> stronger encryption methods.
> If not PyPI, then at least as a tar file in the Apache download archives, 
> please.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33992:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.1.0
>
>
> PaddingAndLengthCheckForCharVarchar could fail query when 
> resolveOperatorsUpWithNewOutput
> with 
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}
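> A purely illustrative sketch of the guard idea in the title (hypothetical Guard 
> object, not Spark's actual AnalysisHelper API): the rewrite is wrapped so that the 
> "should not be called in the analyzer" assertion is relaxed only for that scope.
> {code:java}
> import scala.util.DynamicVariable
> 
> object Guard {
>   private val allowTransforms = new DynamicVariable[Boolean](false)
> 
>   // Run body with transforms allowed inside the analyzer.
>   def allowInvokingTransformsInAnalyzer[T](body: => T): T =
>     allowTransforms.withValue(true)(body)
> 
>   // Mirrors the assertion seen in the stack trace above.
>   def assertAllowed(): Unit =
>     require(allowTransforms.value, "This method should not be called in the analyzer")
> }
> 
> // Usage: Guard.allowInvokingTransformsInAnalyzer { /* run the plan rewrite here */ }
> {code}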



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34006:


Assignee: (was: Apache Spark)

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table 
> insert overwrite read table, it should be stated in the document
> --
>
> Key: SPARK-34006
> URL: https://issues.apache.org/jira/browse/SPARK-34006
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> This parameter can work around the problem of running INSERT OVERWRITE on an 
> ORC-format table while reading from the same table; it should be stated in the 
> documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258733#comment-17258733
 ] 

Apache Spark commented on SPARK-34006:
--

User 'dh20' has created a pull request for this issue:
https://github.com/apache/spark/pull/31030

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table 
> insert overwrite read table, it should be stated in the document
> --
>
> Key: SPARK-34006
> URL: https://issues.apache.org/jira/browse/SPARK-34006
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> This parameter can work around the problem of running INSERT OVERWRITE on an 
> ORC-format table while reading from the same table; it should be stated in the 
> documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34006:


Assignee: Apache Spark

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table 
> insert overwrite read table, it should be stated in the document
> --
>
> Key: SPARK-34006
> URL: https://issues.apache.org/jira/browse/SPARK-34006
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.1
>Reporter: hao
>Assignee: Apache Spark
>Priority: Major
>
> This parameter can work around the problem of running INSERT OVERWRITE on an 
> ORC-format table while reading from the same table; it should be stated in the 
> documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258732#comment-17258732
 ] 

Apache Spark commented on SPARK-34006:
--

User 'dh20' has created a pull request for this issue:
https://github.com/apache/spark/pull/31030

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table 
> insert overwrite read table, it should be stated in the document
> --
>
> Key: SPARK-34006
> URL: https://issues.apache.org/jira/browse/SPARK-34006
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> This parameter can work around the problem of running INSERT OVERWRITE on an 
> ORC-format table while reading from the same table; it should be stated in the 
> documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34002) Broken UDF behavior

2021-01-04 Thread Mark Hamilton (Jira)
Mark Hamilton created SPARK-34002:
-

 Summary: Broken UDF behavior
 Key: SPARK-34002
 URL: https://issues.apache.org/jira/browse/SPARK-34002
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Mark Hamilton


UDFs can behave differently depending on whether a dataframe is cached, despite the 
dataframe being otherwise identical.

 

Repro:

 
{code:java}
case class Bar(a: Int)
 
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
 None
}

def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Commenting in the cache will make this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33242) Install numpydoc in Jenkins machines

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33242:
-
Parent: (was: SPARK-32085)
Issue Type: Test  (was: Sub-task)

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> To switch from reST style to numpydoc style, we should install numpydoc as 
> well. This is being used by Sphinx. See the parent JIRA as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258717#comment-17258717
 ] 

Jungtaek Lim commented on SPARK-33833:
--

That’s available, but with serious caution. Spark has to have full control of 
offset management, and it shouldn’t be touched from outside in any way. Creating a 
unique group ID is a defensive approach to this, preventing end users from messing 
things up by accident. Once end users set a static group ID, that guard is no 
longer valid.

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33989) Strip auto-generated cast when using Cast.sql

2021-01-04 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-33989:

Summary: Strip auto-generated cast when using Cast.sql  (was: Strip 
auto-generated cast when resolving UnresolvedAlias)

> Strip auto-generated cast when using Cast.sql
> -
>
> Key: SPARK-33989
> URL: https://issues.apache.org/jira/browse/SPARK-33989
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> During analysis we may implicitly introduce a Cast when a type cast is needed. 
> That makes the assigned name unclear.
> For example, given the SQL `select id == null` where id is int type, the 
> output field name will be `(id = CAST(null as int))`.
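> A small way to observe the generated name, assuming an existing SparkSession 
> named spark (the exact casing of the generated name may vary by version):
> {code:java}
> import spark.implicits._
> 
> val df = Seq(1, 2).toDF("id")
> df.selectExpr("id == null").columns.foreach(println)
> // prints something like: (id = CAST(NULL AS INT))
> {code}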



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2021-01-04 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258734#comment-17258734
 ] 

Wenchen Fan commented on SPARK-33948:
-

SPARK-33619 improved the codegen test coverage of Spark expression tests; this 
might be the reason for these test failures.

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.returnDifferentClientsForDifferentServers|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/returnDifferentClientsForDifferentServers/]
>  
> 

[jira] [Resolved] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33992.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 31013
[https://github.com/apache/spark/pull/31013]

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.1.0
>
>
> PaddingAndLengthCheckForCharVarchar could fail query when 
> resolveOperatorsUpWithNewOutput
> with 
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258675#comment-17258675
 ] 

Apache Spark commented on SPARK-34001:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31022

> Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
> --
>
> Key: SPARK-34001
> URL: https://issues.apache.org/jira/browse/SPARK-34001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33992:
---

Assignee: Kent Yao

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> PaddingAndLengthCheckForCharVarchar could fail query when 
> resolveOperatorsUpWithNewOutput
> with 
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258676#comment-17258676
 ] 

Apache Spark commented on SPARK-34001:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31022

> Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
> --
>
> Key: SPARK-34001
> URL: https://issues.apache.org/jira/browse/SPARK-34001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33935) Fix CBOs cost function

2021-01-04 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-33935.
--
Fix Version/s: 3.2.0
   3.1.0
 Assignee: Tanel Kiis
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/30965

> Fix CBOs cost function 
> ---
>
> Key: SPARK-33935
> URL: https://issues.apache.org/jira/browse/SPARK-33935
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
>
> The parameter spark.sql.cbo.joinReorder.card.weight is documented as:
> {code:title=spark.sql.cbo.joinReorder.card.weight}
> The weight of cardinality (number of rows) for plan cost comparison in join 
> reorder: rows * weight + size * (1 - weight).
> {code}
> But in the implementation the formula is a bit different:
> {code:title=Current implementation}
> def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
>   if (other.planCost.card == 0 || other.planCost.size == 0) {
> false
>   } else {
> val relativeRows = BigDecimal(this.planCost.card) / 
> BigDecimal(other.planCost.card)
> val relativeSize = BigDecimal(this.planCost.size) / 
> BigDecimal(other.planCost.size)
> relativeRows * conf.joinReorderCardWeight +
>   relativeSize * (1 - conf.joinReorderCardWeight) < 1
>   }
> }
> {code}
> This change has an unfortunate consequence: 
> given two plans A and B, both A betterThan B and B betterThan A can return 
> the same result. This happens when one plan has many rows with small sizes 
> and the other has few rows with large sizes.
> Example values that show this phenomenon with the default weight value (0.7):
> A.card = 500, B.card = 300
> A.size = 30, B.size = 80
> Both A betterThan B and B betterThan A would have a score above 1 and would 
> return false.
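> A quick check with these numbers (a sketch added for illustration, not part 
> of the original report):
> {code:title=Worked example with weight = 0.7}
> val weight = BigDecimal(0.7)
> // A betterThan B: relativeRows = 500/300, relativeSize = 30/80
> val aOverB = (BigDecimal(500) / 300) * weight + (BigDecimal(30) / 80) * (BigDecimal(1) - weight)  // ~1.279
> // B betterThan A: relativeRows = 300/500, relativeSize = 80/30
> val bOverA = (BigDecimal(300) / 500) * weight + (BigDecimal(80) / 30) * (BigDecimal(1) - weight)  // ~1.220
> // both scores fail the "< 1" test, so neither plan is considered better than the other
> {code}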
> A new implementation is proposed that matches the documentation:
> {code:title=Proposed implementation}
> def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
>   val oldCost = BigDecimal(this.planCost.card) * 
> conf.joinReorderCardWeight +
> BigDecimal(this.planCost.size) * (1 - conf.joinReorderCardWeight)
>   val newCost = BigDecimal(other.planCost.card) * 
> conf.joinReorderCardWeight +
> BigDecimal(other.planCost.size) * (1 - conf.joinReorderCardWeight)
>   newCost < oldCost
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258736#comment-17258736
 ] 

Apache Spark commented on SPARK-34007:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31031

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 at SPARK-33512, the docker 
> release script fails as below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34007:


Assignee: Apache Spark

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 at SPARK-33512, the docker 
> release script fails as below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33992:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.1.1
>
>
> PaddingAndLengthCheckForCharVarchar could fail a query when run through 
> resolveOperatorsUpWithNewOutput, 
> with 
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34002) Broken UDF Encoding

2021-01-04 Thread Mark Hamilton (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamilton updated SPARK-34002:
--
Description: 
UDFs can behave differently depending on whether a DataFrame is cached, even 
though the DataFrame is otherwise identical.

 

Repro:

 
{code:java}
import org.apache.spark.sql.expressions.UserDefinedFunction 
import org.apache.spark.sql.functions.{col, udf}

case class Bar(a: Int)
 
import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
 None
}

def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Enabling the commented-out .cache() on the next line makes this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
 

Error:

(only the IntelliJ test-runner JVM command line and its long local classpath 
were captured here before the message was truncated; the actual error log is 
in the updated description below)
[jira] [Updated] (SPARK-34002) Broken UDF Encoding

2021-01-04 Thread Mark Hamilton (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamilton updated SPARK-34002:
--
Description: 
UDFs can behave differently depending on whether a DataFrame is cached, even 
though the DataFrame is otherwise identical.

 

Repro:

 
{code:java}
import org.apache.spark.sql.expressions.UserDefinedFunction 
import org.apache.spark.sql.functions.{col, udf}

case class Bar(a: Int)
 
import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
 None
}

def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Enabling the commented-out .cache() on the next line makes this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
 

Error:
{code:java}
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/05 00:52:57 INFO SparkContext: Running Spark version 3.0.1
21/01/05 00:52:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/01/05 00:52:57 INFO ResourceUtils: ==
21/01/05 00:52:57 INFO ResourceUtils: Resources for spark.driver:
21/01/05 00:52:57 INFO ResourceUtils: ==
21/01/05 00:52:57 INFO SparkContext: Submitted application: JsonOutputParserSuite
21/01/05 00:52:57 INFO SparkContext: Spark configuration:
spark.app.name=JsonOutputParserSuite
spark.driver.maxResultSize=6g
spark.logConf=true
spark.master=local[*]
spark.sql.crossJoin.enabled=true
spark.sql.shuffle.partitions=20
spark.sql.warehouse.dir=file:/code/mmlspark/spark-warehouse
21/01/05 00:52:58 INFO SecurityManager: Changing view acls to: marhamil
21/01/05 00:52:58 INFO SecurityManager: Changing modify acls to: marhamil
21/01/05 00:52:58 INFO SecurityManager: Changing view acls groups to: 
21/01/05 00:52:58 INFO SecurityManager: Changing modify acls groups to: 
21/01/05 00:52:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(marhamil); groups with view permissions: Set(); users  with modify permissions: Set(marhamil); groups with modify permissions: Set()
21/01/05 00:52:58 INFO Utils: Successfully started service 'sparkDriver' on port 52315.
21/01/05 00:52:58 INFO SparkEnv: Registering MapOutputTracker
21/01/05 00:52:58 INFO SparkEnv: Registering BlockManagerMaster
21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/01/05 00:52:58 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/01/05 00:52:58 INFO DiskBlockManager: Created local directory at C:\Users\marhamil\AppData\Local\Temp\blockmgr-9a5c80ef-ade6-41ac-9933-a26f6c291719
21/01/05 00:52:58 INFO MemoryStore: MemoryStore started with capacity 4.0 GiB
21/01/05 00:52:59 INFO SparkEnv: Registering OutputCommitCoordinator
21/01/05 00:52:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/01/05 00:52:59 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://host.docker.internal:4040
21/01/05 00:52:59 INFO Executor: Starting executor ID driver on host host.docker.internal
21/01/05 00:52:59 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52359.
21/01/05 00:52:59 INFO NettyBlockTransferService: Server created on host.docker.internal:52359
21/01/05 00:52:59 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/01/05 00:52:59 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, host.docker.internal, 52359, None)
21/01/05 00:52:59 INFO BlockManagerMasterEndpoint: Registering block manager host.docker.internal:52359 with 4.0 GiB RAM, BlockManagerId(driver, host.docker.internal, 52359, None)
21/01/05 00:52:59 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, host.docker.internal, 52359, None)
21/01/05 00:52:59 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, host.docker.internal, 52359, None)
21/01/05 00:53:00 WARN SharedState: Not allowing to set spark.sql.warehouse.dir or hive.metastore.warehouse.dir in SparkSession's options, it should be set statically for cross-session usages
Failed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct)
org.apache.spark.SparkException: Failed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct)
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1130)
  at 

[jira] [Assigned] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34007:


Assignee: (was: Apache Spark)

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 at SPARK-33512, the docker 
> release script fails as below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33979) Filter predicate reorder

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258639#comment-17258639
 ] 

Apache Spark commented on SPARK-33979:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31024

> Filter predicate reorder
> 
>
> Key: SPARK-33979
> URL: https://issues.apache.org/jira/browse/SPARK-33979
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Reorder filter predicates to improve query performance:
> {noformat}
> others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
> {noformat}
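> For example (an illustrative sketch, not taken from the actual PR; assumes a 
> spark-shell session), a filter whose conjuncts are written with the expensive 
> check first:
> {code:java}
> import org.apache.spark.sql.functions.{col, lit, udf}
> 
> // slowCheck stands in for an arbitrary expensive predicate
> val slowCheck = udf((s: String) => s.startsWith("1"))
> val df = spark.range(100).selectExpr("cast(id as string) as c")
> 
> val q = df.filter(
>   slowCheck(col("c")) && col("c").like("1%") && col("c").isin("1", "2") && col("c") > lit("0"))
> 
> // with the proposed ordering the conjunction would be evaluated roughly as
> //   c > '0' AND c IN ('1', '2') AND c LIKE '1%' AND slowCheck(c)
> // so the cheap comparisons filter out most rows before the UDF ever runs
> {code}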



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258670#comment-17258670
 ] 

Dongjoon Hyun commented on SPARK-31786:
---

Yes, you are correct.
 # `export` is only required for your machine.
 # `--conf` should be used for `driverEnv`.
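For example (illustrative values only, mirroring the submit command quoted 
below; the API server address and environment variable are placeholders):
{code}
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driverEnv.HADOOP_USER_NAME=spark \
  ...
{code}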

Yes, Spark 3.0 is better for Kubernetes environments, and Spark 3.1 is much 
better because of SPARK-33005. FYI, Apache Spark 3.1.0 RC1 has already been created.

- https://github.com/apache/spark/tree/v3.1.0-rc1

Apache Spark 3.1.0 will arrive this month.

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting an exception when submitting the Spark-Pi app to a Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at 
> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> at okio.Okio$1.write(Okio.java:79)
> at 

[jira] [Created] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Terry Kim (Jira)
Terry Kim created SPARK-34001:
-

 Summary: Remove unused runShowTablesSql() in 
DataSourceV2SQLSuite.scala
 Key: SPARK-34001
 URL: https://issues.apache.org/jira/browse/SPARK-34001
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Terry Kim


runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34001.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31022
[https://github.com/apache/spark/pull/31022]

> Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
> --
>
> Key: SPARK-34001
> URL: https://issues.apache.org/jira/browse/SPARK-34001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34001:
-

Assignee: Terry Kim

> Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
> --
>
> Key: SPARK-34001
> URL: https://issues.apache.org/jira/browse/SPARK-34001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
>
> runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33998.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31020
[https://github.com/apache/spark/pull/31020]

> Refactor v2CommandExec to provide an API to create an InternalRow
> -
>
> Key: SPARK-33998
> URL: https://issues.apache.org/jira/browse/SPARK-33998
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that 
> require creating InternalRow. Creating InternalRow can be refactored into 
> v2CommandExec to remove duplicate code to create serializer, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33998:
---

Assignee: Terry Kim

> Refactor v2CommandExec to provide an API to create an InternalRow
> -
>
> Key: SPARK-33998
> URL: https://issues.apache.org/jira/browse/SPARK-33998
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
>
> There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that 
> require creating InternalRow. Creating InternalRow can be refactored into 
> v2CommandExec to remove duplicate code to create serializer, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258692#comment-17258692
 ] 

L. C. Hsieh edited comment on SPARK-33833 at 1/5/21, 6:27 AM:
--

Hmm, I did a few tests locally. Does Burrow work only if Spark commits offset 
progress back to Kafka?

I added some code to commit offset progress to Kafka. After I checked the 
"__consumer_offsets" topic of Kafka, I found that whether or not Spark commits 
the progress to Kafka, the record of the consumer group of the Spark SS query 
is always in "__consumer_offsets".

Based on https://github.com/linkedin/Burrow/wiki, Burrow reads consumer group 
info from this "__consumer_offsets" topic. So whether Spark commits or not, 
there will be a record for the consumer group; does that mean Burrow still 
works without Spark committing offset progress to Kafka?

If so, then Spark doesn't need any change for this ticket.




was (Author: viirya):
Hmm, I did a few tests locally. Does Burrow work only if Spark commits offset 
progress back to Kafka?

I added some code to commit offset progress to Kafka. After I checked the 
"__consumer_offsets" topic of Kafka, I found that whether or not Spark commits 
the progress to Kafka, the record of the consumer group of the Spark SS query 
is always in "__consumer_offsets".

Based on https://github.com/linkedin/Burrow/wiki, Burrow reads consumer group 
info from this "__consumer_offsets" topic. So whether Spark commits or not, 
there will be a record for the consumer group; does that mean Burrow still 
works without Spark committing offset progress to Kafka?



> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as is possible with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  
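> A minimal sketch of such a listener-based hook (illustrative only: it assumes 
> a single Kafka source, and the bootstrap servers and reporting group id are 
> placeholder values), which commits the processed offsets to a dedicated group 
> so that Burrow can read them from "__consumer_offsets":
> {code:java}
> import java.util.Properties
> 
> import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
> import org.apache.kafka.common.TopicPartition
> import org.apache.spark.sql.streaming.StreamingQueryListener
> import org.apache.spark.sql.streaming.StreamingQueryListener._
> import org.json4s._
> import org.json4s.jackson.JsonMethods._
> 
> class OffsetCommitListener extends StreamingQueryListener {
>   private val props = new Properties()
>   props.put("bootstrap.servers", "localhost:9092")   // placeholder
>   props.put("group.id", "spark-lag-reporter")        // placeholder reporting group
>   props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
>   props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
>   private lazy val consumer = new KafkaConsumer[String, String](props)
> 
>   override def onQueryStarted(event: QueryStartedEvent): Unit = ()
>   override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
> 
>   override def onQueryProgress(event: QueryProgressEvent): Unit = {
>     implicit val formats: Formats = DefaultFormats
>     // endOffset is a JSON string such as {"myTopic":{"0":42,"1":17}}
>     val offsets = parse(event.progress.sources.head.endOffset)
>       .extract[Map[String, Map[String, Long]]]
>     val toCommit = new java.util.HashMap[TopicPartition, OffsetAndMetadata]()
>     for ((topic, partitions) <- offsets; (partition, offset) <- partitions) {
>       toCommit.put(new TopicPartition(topic, partition.toInt), new OffsetAndMetadata(offset))
>     }
>     consumer.commitSync(toCommit)
>   }
> }
> 
> // registration: spark.streams.addListener(new OffsetCommitListener())
> {code}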



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33794) next_day function should throw runtime exception when receiving invalid input under ANSI mode

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33794:
---

Assignee: Chongguang LIU

> next_day function should throw runtime exception when receiving invalid input 
> under ANSI mode
> -
>
> Key: SPARK-33794
> URL: https://issues.apache.org/jira/browse/SPARK-33794
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Chongguang LIU
>Assignee: Chongguang LIU
>Priority: Major
>
> Hello all,
> According to [ANSI 
> compliance|https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html#ansi-compliance],
>  the [next_day 
> function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3095]
>  should throw a runtime exception when receiving an invalid value for 
> dayOfWeek, for example receiving "xx" instead of "SUNDAY".
>  
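> A minimal illustration of the proposed behaviour (a sketch; the exact error 
> type raised is left open here):
> {code:java}
> import org.apache.spark.sql.functions.{col, next_day}
> import spark.implicits._
> 
> spark.conf.set("spark.sql.ansi.enabled", true)
> // today this silently returns null for the invalid day-of-week "xx";
> // the proposal is for it to fail at runtime under ANSI mode instead
> Seq("2021-01-04").toDF("d").select(next_day(col("d"), "xx")).show()
> {code}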
> A similar improvement has been done on the element_at function: 
> https://issues.apache.org/jira/browse/SPARK-33386
>  
> If you agree with this proposition, I can submit a pull request with the 
> necessary change.
>  
> Kind regards,
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33794) next_day function should throw runtime exception when receiving invalid input under ANSI mode

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33794.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30807
[https://github.com/apache/spark/pull/30807]

> next_day function should throw runtime exception when receiving invalid input 
> under ANSI mode
> -
>
> Key: SPARK-33794
> URL: https://issues.apache.org/jira/browse/SPARK-33794
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Chongguang LIU
>Assignee: Chongguang LIU
>Priority: Major
> Fix For: 3.2.0
>
>
> Hello all,
> According to [ANSI 
> compliance|https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html#ansi-compliance],
>  the [next_day 
> function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3095]
>  should throw a runtime exception when receiving an invalid value for 
> dayOfWeek, for example receiving "xx" instead of "SUNDAY".
>  
> A similar improvement has been done on the element_at function: 
> https://issues.apache.org/jira/browse/SPARK-33386
>  
> If you agree with this proposition, I can submit a pull request with the 
> necessary change.
>  
> Kind regards,
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Lantao Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-34000:
---
Affects Version/s: 3.0.1

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34000:
-

Assignee: Lantao Jin

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34000.
---
Fix Version/s: 3.0.2
   3.1.0
   Resolution: Fixed

Issue resolved by pull request 31025
[https://github.com/apache/spark/pull/31025]

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33995) Make datetime addition easier for years, weeks, hours, minutes, and seconds

2021-01-04 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258718#comment-17258718
 ] 

Maxim Gekk commented on SPARK-33995:


> Option 1: Single make_interval function that takes 7 arguments

Small clarification: make_interval could have default values for all 7 
arguments, like Postgres has; see 
[https://www.postgresql.org/docs/9.4/functions-datetime.html]

> As a user, Option 3 would be my preference.  
>col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember 
>and type.

I like this approach too
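For reference, a sketch of what such a Column-based signature with 
Postgres-style defaults could look like (hypothetical; this is not an existing 
API, and the body is deliberately left out since only the call-site ergonomics 
are the point):
{code:java}
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit}

// every argument defaults to a zero literal, as in Postgres' make_interval
def make_interval(
    years: Column = lit(0), months: Column = lit(0), weeks: Column = lit(0),
    days: Column = lit(0), hours: Column = lit(0), mins: Column = lit(0),
    secs: Column = lit(0)): Column = ???  // implementation omitted in this sketch

// with defaults plus named arguments, Option 1 reads almost like Option 3:
// df.select(col("first_datetime") + make_interval(hours = lit(2), secs = lit(30)))
{code}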

> Make datetime addition easier for years, weeks, hours, minutes, and seconds
> ---
>
> Key: SPARK-33995
> URL: https://issues.apache.org/jira/browse/SPARK-33995
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Matthew Powers
>Priority: Minor
>
> There are add_months and date_add functions that make it easy to perform 
> datetime addition with months and days, but there isn't an easy way to 
> perform datetime addition with years, weeks, hours, minutes, or seconds with 
> the Scala/Python/R APIs.
> Users need to write code like expr("first_datetime + INTERVAL 2 hours") to 
> add two hours to a timestamp with the Scala API, which isn't desirable.  We 
> don't want to make Scala users manipulate SQL strings.
> We can expose the [make_interval SQL 
> function|https://github.com/apache/spark/pull/26446/files] to make any 
> combination of datetime addition possible.  That'll make tons of different 
> datetime addition operations possible and will be valuable for a wide array 
> of users.
> make_interval takes 7 arguments: years, months, weeks, days, hours, mins, and 
> secs.
> There are different ways to expose the make_interval functionality to 
> Scala/Python/R users:
>  * Option 1: Single make_interval function that takes 7 arguments
>  * Option 2: expose a few interval functions
>  ** make_date_interval function that takes years, months, days
>  ** make_time_interval function that takes hours, minutes, seconds
>  ** make_datetime_interval function that takes years, months, days, hours, 
> minutes, seconds
>  * Option 3: expose add_years, add_months, add_days, add_weeks, add_hours, 
> add_minutes, and add_seconds as Column methods.  
>  * Option 4: Expose the add_years, add_hours, etc. as column functions.  
> add_weeks and date_add have already been exposed in this manner.  
> Option 1 is nice from a maintenance perspective because it's a single function, 
> but it's not standard from a user perspective.  Most languages support 
> datetime instantiation with these arguments: years, months, days, hours, 
> minutes, seconds.  Mixing weeks into the equation is not standard.
> As a user, Option 3 would be my preference.  
> col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember 
> and type.  col("first_datetime") + make_time_interval(lit(2), lit(0), 
> lit(30)) isn't as nice.  col("first_datetime") + make_interval(lit(0), 
> lit(0), lit(0), lit(0), lit(2), lit(0), lit(30)) is harder still.
> Any of these options is an improvement to the status quo.  Let me know what 
> option you think is best and then I'll make a PR to implement it, building 
> off of Max's foundational work of course ;)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258726#comment-17258726
 ] 

L. C. Hsieh commented on SPARK-33833:
-

Yeah, but this can be easily overcome here. We just need a user-provided group 
id for the purpose of committing offsets. Since users have to specify it when 
they want to commit offsets and track progress, it would be used with caution. 
Even for committing with the static group ID currently given by users, I do 
not think that is really a reason to reject the offset-committing idea. Once 
users decide to commit offsets and track progress, they should be aware of the 
risk.

Anyway, this does not seem to be the reason the previous PR was closed.

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as is possible with DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34005:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Update peak memory metrics for each Executor on task end.
> -
>
> Key: SPARK-34005
> URL: https://issues.apache.org/jira/browse/SPARK-34005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like other peak memory metrics (e.g., stage, executors in a stage), it's 
> better to update the peak memory metrics for each Executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258727#comment-17258727
 ] 

Apache Spark commented on SPARK-34005:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/31029

> Update peak memory metrics for each Executor on task end.
> -
>
> Key: SPARK-34005
> URL: https://issues.apache.org/jira/browse/SPARK-34005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like other peak memory metrics (e.g., stage, executors in a stage), it's 
> better to update the peak memory metrics for each Executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33980) invalidate char/varchar in spark.readStream.schema

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33980:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> invalidate char/varchar in spark.readStream.schema
> --
>
> Key: SPARK-33980
> URL: https://issues.apache.org/jira/browse/SPARK-33980
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.0
>
>
> Invalidate char/varchar in spark.readStream.schema, just as we do for 
> spark.read.schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34005:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Update peak memory metrics for each Executor on task end.
> -
>
> Key: SPARK-34005
> URL: https://issues.apache.org/jira/browse/SPARK-34005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> Like other peak memory metrics (e.g., stage, executors in a stage), it's 
> better to update the peak memory metrics for each Executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33950:
--
Fix Version/s: 3.1.0
   3.0.2

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  






[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33950:
--
Labels: correctness  (was: )

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  






[jira] [Updated] (SPARK-33996) Upgrade checkstyle plugins

2021-01-04 Thread William Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Hyun updated SPARK-33996:
-
Parent: SPARK-33772
Issue Type: Sub-task  (was: Improvement)

> Upgrade checkstyle plugins
> --
>
> Key: SPARK-33996
> URL: https://issues.apache.org/jira/browse/SPARK-33996
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-33996) Upgrade checkstyle plugins

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33996:
-

Assignee: William Hyun

> Upgrade checkstyle plugins
> --
>
> Key: SPARK-33996
> URL: https://issues.apache.org/jira/browse/SPARK-33996
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-33996) Upgrade checkstyle plugins

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33996.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31019
[https://github.com/apache/spark/pull/31019]

> Upgrade checkstyle plugins
> --
>
> Key: SPARK-33996
> URL: https://issues.apache.org/jira/browse/SPARK-33996
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Resolved] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33987.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31017
[https://github.com/apache/spark/pull/31017]

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> The test below demonstrates the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  






[jira] [Assigned] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33987:
-

Assignee: Maxim Gekk

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The test below demonstrates the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  






[jira] [Assigned] (SPARK-33996) Upgrade checkstyle plugins

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33996:


Assignee: Apache Spark

> Upgrade checkstyle plugins
> --
>
> Key: SPARK-33996
> URL: https://issues.apache.org/jira/browse/SPARK-33996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-33996) Upgrade checkstyle plugins

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33996:


Assignee: (was: Apache Spark)

> Upgrade checkstyle plugins
> --
>
> Key: SPARK-33996
> URL: https://issues.apache.org/jira/browse/SPARK-33996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-33996) Upgrade checkstyle plugins

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258531#comment-17258531
 ] 

Apache Spark commented on SPARK-33996:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31019

> Upgrade checkstyle plugins
> --
>
> Key: SPARK-33996
> URL: https://issues.apache.org/jira/browse/SPARK-33996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-33031) scheduler with blacklisting doesn't appear to pick up new executor added

2021-01-04 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258498#comment-17258498
 ] 

Thomas Graves commented on SPARK-33031:
---

Ah, that could be the case, but if that is true we probably need to fix 
something in the UI to indicate that.

> scheduler with blacklisting doesn't appear to pick up new executor added
> 
>
> Key: SPARK-33031
> URL: https://issues.apache.org/jira/browse/SPARK-33031
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Thomas Graves
>Priority: Critical
>
> I was running a test with blacklisting in standalone mode and all the 
> executors were initially blacklisted. Then one of the executors died and we 
> got allocated another one. The scheduler did not appear to pick up the new 
> one and try to schedule on it, though.
> You can reproduce this by starting a master and a slave on a single node, 
> then launching a shell where you will get multiple executors (in this case I 
> got 3):
> $SPARK_HOME/bin/spark-shell --master spark://yourhost:7077 --executor-cores 4 
> --conf spark.blacklist.enabled=true
> From shell run:
> {code:java}
> import org.apache.spark.TaskContext
> val rdd = sc.makeRDD(1 to 1000, 5).mapPartitions { it =>
>  val context = TaskContext.get()
>  if (context.attemptNumber() < 2) {
>  throw new Exception("test attempt num")
>  }
>  it
> }
> rdd.collect(){code}
>  
> Note that I tried both with and without dynamic allocation enabled.
>  
> You can see screen shot related on 
> https://issues.apache.org/jira/browse/SPARK-33029






[jira] [Commented] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258516#comment-17258516
 ] 

Dongjoon Hyun commented on SPARK-33950:
---

This landed in branch-3.1 and branch-3.0 via 
[https://github.com/apache/spark/pull/31006].

I also added the `correctness` label to this issue.

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  






[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33950:
--
Affects Version/s: 3.1.0
   3.0.1

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  






[jira] [Created] (SPARK-33996) Upgrade checkstyle plugins

2021-01-04 Thread William Hyun (Jira)
William Hyun created SPARK-33996:


 Summary: Upgrade checkstyle plugins
 Key: SPARK-33996
 URL: https://issues.apache.org/jira/browse/SPARK-33996
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: William Hyun









[jira] [Resolved] (SPARK-33844) InsertIntoDir failed since query column name contains ',' cause column type and column names size not equal

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33844.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30850
[https://github.com/apache/spark/pull/30850]

> InsertIntoDir failed since query column name contains ',' cause column type 
> and column names size not equal
> ---
>
> Key: SPARK-33844
> URL: https://issues.apache.org/jira/browse/SPARK-33844
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
>  
> After hive-2.3, COLUMN_NAME_DELIMITER is set to a special char when a column 
> name contains ',', since the column list and the column types in the serde 
> are split by COLUMN_NAME_DELIMITER.
>  In spark-2.4.0 + hive-1.2.1, INSERT OVERWRITE DIR fails when a query result 
> schema column name contains ',', with:
> {code:java}
>  org.apache.hadoop.hive.serde2.SerDeException: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 14 elements 
> while columns.types has 11 elements! at 
> org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.extractColumnInfo(LazySerDeParameters.java:146)
>  at 
> org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.<init>(LazySerDeParameters.java:85)
>  at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
>  at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:119)
>  at 
> org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:103)
>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:108)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:287)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:219)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:218)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:121) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:461)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}
>  Since this problem has been solved on the Hive side by 
> [https://github.com/apache/hive/blob/6f4c35c9e904d226451c465effdc5bfd31d395a0/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L1044-L1075],
>  I think we can also handle it on the Spark side so that all versions work well.
>  
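
For illustration only: one hypothetical way to end up with a ',' in a result column name is an unaliased expression whose auto-generated name contains its argument list. The output directory below is made up and a Hive-enabled SparkSession is assumed, since INSERT OVERWRITE DIRECTORY ... STORED AS goes through the Hive SerDe path.

{code:scala}
// Sketch of the problematic shape: the unaliased named_struct(...) gets an
// auto-generated column name containing commas, which is exactly the case the
// serde's column-name delimiter handling has to cover.
spark.sql(
  """INSERT OVERWRITE DIRECTORY '/tmp/insert_dir_out'
    |STORED AS TEXTFILE
    |SELECT id, named_struct('a', id, 'b', id * 2) FROM range(3)
    |""".stripMargin)

// Aliasing the expression (e.g. "... AS s") avoids the comma in the generated
// column name.
{code}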






[jira] [Assigned] (SPARK-33844) InsertIntoDir failed since query column name contains ',' cause column type and column names size not equal

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33844:
---

Assignee: angerszhu

> InsertIntoDir failed since query column name contains ',' cause column type 
> and column names size not equal
> ---
>
> Key: SPARK-33844
> URL: https://issues.apache.org/jira/browse/SPARK-33844
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
>  
> After hive-2.3, COLUMN_NAME_DELIMITER is set to a special char when a column 
> name contains ',', since the column list and the column types in the serde 
> are split by COLUMN_NAME_DELIMITER.
>  In spark-2.4.0 + hive-1.2.1, INSERT OVERWRITE DIR fails when a query result 
> schema column name contains ',', with:
> {code:java}
>  org.apache.hadoop.hive.serde2.SerDeException: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 14 elements 
> while columns.types has 11 elements! at 
> org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.extractColumnInfo(LazySerDeParameters.java:146)
>  at 
> org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.<init>(LazySerDeParameters.java:85)
>  at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
>  at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:119)
>  at 
> org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:103)
>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:108)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:287)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:219)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:218)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:121) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:461)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}
>  Since this problem has been solved on the Hive side by 
> [https://github.com/apache/hive/blob/6f4c35c9e904d226451c465effdc5bfd31d395a0/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L1044-L1075],
>  I think we can also handle it on the Spark side so that all versions work well.
>  





