(spark) branch master updated (ee2a87b4642c -> 8fa794b13195)

2024-01-09 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from ee2a87b4642c [SPARK-40876][SQL][TESTS][FOLLOW-UP] Remove invalid 
decimal test case when ANSI mode is on
 add 8fa794b13195 [SPARK-46627][SS][UI] Fix timeline tooltip content on 
streaming ui

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/org/apache/spark/ui/static/streaming-page.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44490][WEBUI] Remove unused `TaskPagedTable` in StagePage

2023-08-01 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 546e39c5dab [SPARK-44490][WEBUI] Remove unused `TaskPagedTable` in 
StagePage
546e39c5dab is described below

commit 546e39c5dabc243ab81b6238dc893d9993e0
Author: sychen 
AuthorDate: Tue Aug 1 15:37:27 2023 +0900

[SPARK-44490][WEBUI] Remove unused `TaskPagedTable` in StagePage

### What changes were proposed in this pull request?
 Remove `TaskPagedTable`

### Why are the changes needed?
In [SPARK-21809](https://issues.apache.org/jira/browse/SPARK-21809), we 
introduced `stagespage-template.html` to show the running status of a stage.
`TaskPagedTable` is no longer effective, but PRs still keep updating the 
related code.
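
For context, the diff below replaces the paged table with a function-based lookup: `makeTimeline` now receives a `() => Seq[TaskData]` and only invokes it when the timeline is enabled. A minimal, hypothetical Scala sketch of that pattern (illustrative stand-ins only, not the actual StagePage code):
```
// Hypothetical stand-in for the UI's task row type.
case class TaskData(taskId: Long, status: String)

object TimelineSketch {
  val timelineEnabled = true

  // Taking a () => Seq[TaskData] instead of a pre-built Seq means the
  // (potentially expensive) task lookup only runs when the timeline is rendered.
  def makeTimeline(tasksFunc: () => Seq[TaskData]): Seq[String] = {
    if (!timelineEnabled) return Seq.empty
    tasksFunc().map(t => s"task ${t.taskId}: ${t.status}")
  }

  def main(args: Array[String]): Unit = {
    // In the real UI the thunk would call into AppStatusStore (store.taskList).
    makeTimeline(() => Seq(TaskData(0L, "SUCCESS"), TaskData(1L, "RUNNING")))
      .foreach(println)
  }
}
```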

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
local test

Closes #42085 from cxzl25/SPARK-44490.

Authored-by: sychen 
Signed-off-by: Kousuke Saruta 
---
 .../scala/org/apache/spark/ui/jobs/StagePage.scala | 301 +
 .../scala/org/apache/spark/ui/StagePageSuite.scala |  12 +-
 2 files changed, 13 insertions(+), 300 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index 02aece6e50a..d50ccdadff5 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -17,17 +17,12 @@
 
 package org.apache.spark.ui.jobs
 
-import java.net.URLEncoder
-import java.nio.charset.StandardCharsets.UTF_8
 import java.util.Date
-import java.util.concurrent.TimeUnit
 import javax.servlet.http.HttpServletRequest
 
-import scala.collection.mutable.{HashMap, HashSet}
+import scala.collection.mutable.HashSet
 import scala.xml.{Node, Unparsed}
 
-import org.apache.commons.text.StringEscapeUtils
-
 import org.apache.spark.internal.config.UI._
 import org.apache.spark.scheduler.TaskLocality
 import org.apache.spark.status._
@@ -209,32 +204,20 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 val dagViz = UIUtils.showDagVizForStage(stageId, stageGraph)
 
 val currentTime = System.currentTimeMillis()
-val taskTable = try {
-  val _taskTable = new TaskPagedTable(
-stageData,
-UIUtils.prependBaseUri(request, parent.basePath) +
-  s"/stages/stage/?id=${stageId}=${stageAttemptId}",
-pageSize = taskPageSize,
-sortColumn = taskSortColumn,
-desc = taskSortDesc,
-store = parent.store
-  )
-  _taskTable
-} catch {
-  case e @ (_ : IllegalArgumentException | _ : IndexOutOfBoundsException) 
=>
-null
-}
 
 val content =
   summary ++
   dagViz ++  ++
   makeTimeline(
 // Only show the tasks in the table
-Option(taskTable).map({ taskPagedTable =>
+() => {
   val from = (eventTimelineTaskPage - 1) * eventTimelineTaskPageSize
-  val to = taskPagedTable.dataSource.dataSize.min(
-eventTimelineTaskPage * eventTimelineTaskPageSize)
-  taskPagedTable.dataSource.sliceData(from, to)}).getOrElse(Nil), 
currentTime,
+  val dataSize = store.taskCount(stageData.stageId, 
stageData.attemptId).toInt
+  val to = dataSize.min(eventTimelineTaskPage * 
eventTimelineTaskPageSize)
+  val sliceData = store.taskList(stageData.stageId, 
stageData.attemptId, from, to - from,
+indexName(taskSortColumn), !taskSortDesc)
+  sliceData
+}, currentTime,
 eventTimelineTaskPage, eventTimelineTaskPageSize, 
eventTimelineTotalPages, stageId,
 stageAttemptId, totalTasks) ++
 
@@ -246,8 +229,8 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 
   }
 
-  def makeTimeline(
-  tasks: Seq[TaskData],
+  private def makeTimeline(
+  tasksFunc: () => Seq[TaskData],
   currentTime: Long,
   page: Int,
   pageSize: Int,
@@ -258,6 +241,8 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 
 if (!TIMELINE_ENABLED) return Seq.empty[Node]
 
+val tasks = tasksFunc()
+
 val executorsSet = new HashSet[(String, String)]
 var minLaunchTime = Long.MaxValue
 var maxFinishTime = Long.MinValue
@@ -453,268 +438,6 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 
 }
 
-private[ui] class TaskDataSource(
-stage: StageData,
-pageSize: Int,
-sortColumn: String,
-desc: Boolean,
-store: AppStatusStore) extends PagedDataSource[TaskData](pageSize) {
-  import ApiHelper._
-
-  // Keep an internal cache of executor log maps so that long task lists 
render 

[spark] branch master updated: [MINOR][UI] Simplify columnDefs in stagepage.js

2023-07-22 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 46440a4a542 [MINOR][UI] Simplify columnDefs in stagepage.js
46440a4a542 is described below

commit 46440a4a542148bc05b8c0f80d1860e6380efdb6
Author: Kent Yao 
AuthorDate: Sat Jul 22 17:12:07 2023 +0900

[MINOR][UI] Simplify columnDefs in stagepage.js

### What changes were proposed in this pull request?

Simplify `columnDefs` in stagepage.js

### Why are the changes needed?

Reduce hardcoding in stagepage.js and the potential for hide/show 
inconsistencies in future changes.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Locally verified.

Screenshot: https://github.com/apache/spark/assets/8326978/3b3595a4-7825-47d5-8c28-30ec916321e6

Closes #42101 from yaooqinn/m.

Authored-by: Kent Yao 
Signed-off-by: Kousuke Saruta 
---
 .../org/apache/spark/ui/static/stagepage.js| 35 ++
 1 file changed, 9 insertions(+), 26 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
index 50bf959d3aa..a8792593bf2 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
@@ -235,11 +235,7 @@ function 
createDataTableForTaskSummaryMetricsTable(taskSummaryMetricsTable) {
 }
   ],
   "columnDefs": [
-{ "type": "duration", "targets": 1 },
-{ "type": "duration", "targets": 2 },
-{ "type": "duration", "targets": 3 },
-{ "type": "duration", "targets": 4 },
-{ "type": "duration", "targets": 5 }
+{ "type": "duration", "targets": [1, 2, 3, 4, 5] }
   ],
   "paging": false,
   "info": false,
@@ -592,22 +588,16 @@ $(document).ready(function () {
 // The targets: $id represents column id which comes from 
stagespage-template.html
 // #summary-executor-table.If the relative position of the 
columns in the table
 // #summary-executor-table has changed,please be careful to 
adjust the column index here
-// Input Size / Records
-{"type": "size", "targets": 9},
-// Output Size / Records
-{"type": "size", "targets": 10},
-// Shuffle Read Size / Records
-{"type": "size", "targets": 11},
-// Shuffle Write Size / Records
-{"type": "size", "targets": 12},
+// Input Size / Records - 9
+// Output Size / Records - 10
+// Shuffle Read Size / Records - 11
+// Shuffle Write Size / Records - 12
+{"type": "size", "targets": [9, 10, 11, 12]},
 // Peak JVM Memory OnHeap / OffHeap
-{"visible": false, "targets": 15},
 // Peak Execution Memory OnHeap / OffHeap
-{"visible": false, "targets": 16},
 // Peak Storage Memory OnHeap / OffHeap
-{"visible": false, "targets": 17},
 // Peak Pool Memory Direct / Mapped
-{"visible": false, "targets": 18}
+{"visible": false, "targets": executorOptionalColumns},
   ],
   "deferRender": true,
   "order": [[0, "asc"]],
@@ -1079,15 +1069,8 @@ $(document).ready(function () {
 }
   ],
   "columnDefs": [
-{ "visible": false, "targets": 11 },
-{ "visible": false, "targets": 12 },
-{ "visible": false, "targets": 13 },
-{ "visible": false, "targets": 14 },
-{ "visible": false, "targets": 15 },
-{ "visible": false, "targets": 16 },
-{ "visible": false, "targets": 17 },
-{ "visible": false, "targets": 18 },
-{ "visible": false, "targets": 21 }
+{ "visible": false, "targets": optionalColumns },
+{ "visible": false, "targets": 18 }, // accumulators
   ],
   "deferRender": true
 };


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44279][BUILD] Upgrade `optionator` to ^0.9.3

2023-07-13 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d35fda69e49 [SPARK-44279][BUILD] Upgrade `optionator`  to ^0.9.3
d35fda69e49 is described below

commit d35fda69e49b06cda316ecd664acb22cb8c12266
Author: Bjørn Jørgensen 
AuthorDate: Fri Jul 14 03:26:56 2023 +0900

[SPARK-44279][BUILD] Upgrade `optionator`  to ^0.9.3

### What changes were proposed in this pull request?
This PR proposes a change in the package.json file to update the resolution 
for the `optionator` package to ^0.9.3.

I've added a resolutions field to package.json and specified the 
`optionator` package version as ^0.9.3.
This will ensure that our project uses `optionator` version 0.9.3 or the 
latest minor or patch version (due to the caret ^), regardless of any other 
version that may be specified in the dependencies or nested dependencies of our 
project.

### Why are the changes needed?
[CVE-2023-26115](https://nvd.nist.gov/vuln/detail/CVE-2023-26115)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA

Closes #41955 from bjornjorgensen/word-wrap.

Authored-by: Bjørn Jørgensen 
Signed-off-by: Kousuke Saruta 
---
 dev/package-lock.json | 774 ++
 dev/package.json  |   3 +
 2 files changed, 350 insertions(+), 427 deletions(-)

diff --git a/dev/package-lock.json b/dev/package-lock.json
index 104a3fb7854..f676b9cec07 100644
--- a/dev/package-lock.json
+++ b/dev/package-lock.json
@@ -10,6 +10,15 @@
 "minimatch": "^3.1.2"
   }
 },
+"node_modules/@aashutoshrathi/word-wrap": {
+  "version": "1.2.6",
+  "resolved": 
"https://registry.npmjs.org/@aashutoshrathi/word-wrap/-/word-wrap-1.2.6.tgz;,
+  "integrity": 
"sha512-1Yjs2SvM8TflER/OD3cOjhWWOZb58A2t7wpE2S9XfBYTiIl+XFhQG2bjy4Pu1I+EAlCNUzRDYDdFwFYUKvXcIA==",
+  "dev": true,
+  "engines": {
+"node": ">=0.10.0"
+  }
+},
 "node_modules/@babel/code-frame": {
   "version": "7.12.11",
   "resolved": 
"https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.12.11.tgz;,
@@ -20,20 +29,38 @@
   }
 },
 "node_modules/@babel/helper-validator-identifier": {
-  "version": "7.14.0",
-  "resolved": 
"https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.14.0.tgz;,
-  "integrity": 
"sha512-V3ts7zMSu5lfiwWDVWzRDGIN+lnCEUdaXgtVHJgLb1rGaA6jMrtB9EmE7L18foXJIE8Un/A/h6NJfGQp/e1J4A==",
-  "dev": true
+  "version": "7.22.5",
+  "resolved": 
"https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.22.5.tgz;,
+  "integrity": 
"sha512-aJXu+6lErq8ltp+JhkJUfk1MTGyuA4v7f3pA+BJ5HLfNC6nAQ0Cpi9uOquUj8Hehg0aUiHzWQbOVJGao6ztBAQ==",
+  "dev": true,
+  "engines": {
+"node": ">=6.9.0"
+  }
 },
 "node_modules/@babel/highlight": {
-  "version": "7.14.0",
-  "resolved": 
"https://registry.npmjs.org/@babel/highlight/-/highlight-7.14.0.tgz;,
-  "integrity": 
"sha512-YSCOwxvTYEIMSGaBQb5kDDsCopDdiUGsqpatp3fOlI4+2HQSkTmEVWnVuySdAC5EWCqSWWTv0ib63RjR7dTBdg==",
+  "version": "7.22.5",
+  "resolved": 
"https://registry.npmjs.org/@babel/highlight/-/highlight-7.22.5.tgz;,
+  "integrity": 
"sha512-BSKlD1hgnedS5XRnGOljZawtag7H1yPfQp0tdNJCHoH6AZ+Pcm9VvkrK59/Yy593Ypg0zMxH2BxD1VPYUQ7UIw==",
   "dev": true,
   "dependencies": {
-"@babel/helper-validator-identifier": "^7.14.0",
+"@babel/helper-validator-identifier": "^7.22.5",
 "chalk": "^2.0.0",
 "js-tokens": "^4.0.0"
+  },
+  "engines": {
+"node": ">=6.9.0"
+  }
+},
+"node_modules/@babel/highlight/node_modules/ansi-styles": {
+  "version": "3.2.1",
+  "resolved": 
"https://registry.npmjs.org/ansi-styles/-/ansi-styles-3.2.1.tgz;,
+  "integrity": 
"sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==",
+  "dev": true,
+  "dependencies": {
+"color-convert": "^1.9.0"

[spark] branch master updated: [SPARK-41634][BUILD] Upgrade `minimatch` to 3.1.2

2022-12-20 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4539260f4ac [SPARK-41634][BUILD] Upgrade `minimatch` to 3.1.2
4539260f4ac is described below

commit 4539260f4ac346f22ce1a47ca9e94e3181803490
Author: Bjørn 
AuthorDate: Wed Dec 21 13:49:45 2022 +0900

[SPARK-41634][BUILD] Upgrade `minimatch` to 3.1.2

### What changes were proposed in this pull request?
Upgrade `minimatch` to 3.1.2

$ npm -v
9.1.2

$ npm install

added 118 packages, and audited 119 packages in 2s

15 packages are looking for funding
 run `npm fund` for details

found 0 vulnerabilities

### Why are the changes needed?
[CVE-2022-3517](https://nvd.nist.gov/vuln/detail/CVE-2022-3517)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA

Closes #39143 from bjornjorgensen/upgrade-minimatch.

Authored-by: Bjørn 
Signed-off-by: Kousuke Saruta 
---
 dev/package-lock.json | 15 ---
 dev/package.json  |  3 ++-
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/dev/package-lock.json b/dev/package-lock.json
index c2a61b389ac..104a3fb7854 100644
--- a/dev/package-lock.json
+++ b/dev/package-lock.json
@@ -6,7 +6,8 @@
 "": {
   "devDependencies": {
 "ansi-regex": "^5.0.1",
-"eslint": "^7.25.0"
+"eslint": "^7.25.0",
+"minimatch": "^3.1.2"
   }
 },
 "node_modules/@babel/code-frame": {
@@ -853,9 +854,9 @@
   }
 },
 "node_modules/minimatch": {
-  "version": "3.0.4",
-  "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.0.4.tgz;,
-  "integrity": 
"sha512-yJHVQEhyqPLUTgt9B83PXu6W3rx4MvvHvSUvToogpwoGDOUQ+yDrR0HRot+yOCdCO7u4hX3pWft6kWBBcqh0UA==",
+  "version": "3.1.2",
+  "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.2.tgz;,
+  "integrity": 
"sha512-J7p63hRiAjw1NDEww1W7i37+ByIrOWO5XQQAzZ3VOcL0PNybwpfmV/N05zFAzwQ9USyEcX6t3UO+K5aqBQOIHw==",
   "dev": true,
   "dependencies": {
 "brace-expansion": "^1.1.7"
@@ -1931,9 +1932,9 @@
   }
 },
 "minimatch": {
-  "version": "3.0.4",
-  "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.0.4.tgz;,
-  "integrity": 
"sha512-yJHVQEhyqPLUTgt9B83PXu6W3rx4MvvHvSUvToogpwoGDOUQ+yDrR0HRot+yOCdCO7u4hX3pWft6kWBBcqh0UA==",
+  "version": "3.1.2",
+  "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.2.tgz;,
+  "integrity": 
"sha512-J7p63hRiAjw1NDEww1W7i37+ByIrOWO5XQQAzZ3VOcL0PNybwpfmV/N05zFAzwQ9USyEcX6t3UO+K5aqBQOIHw==",
   "dev": true,
   "requires": {
 "brace-expansion": "^1.1.7"
diff --git a/dev/package.json b/dev/package.json
index f975bdde831..4e4a4bf1bca 100644
--- a/dev/package.json
+++ b/dev/package.json
@@ -1,6 +1,7 @@
 {
   "devDependencies": {
 "eslint": "^7.25.0",
-"ansi-regex": "^5.0.1"
+"ansi-regex": "^5.0.1",
+"minimatch": "^3.1.2"
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-41587][BUILD] Upgrade `org.scalatestplus:selenium-4-4` to `org.scalatestplus:selenium-4-7`

2022-12-20 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ee2e582ff19 [SPARK-41587][BUILD] Upgrade 
`org.scalatestplus:selenium-4-4` to `org.scalatestplus:selenium-4-7`
ee2e582ff19 is described below

commit ee2e582ff195fa11047545f43d1cb0ebd20a7091
Author: yangjie01 
AuthorDate: Wed Dec 21 13:40:40 2022 +0900

[SPARK-41587][BUILD] Upgrade `org.scalatestplus:selenium-4-4` to 
`org.scalatestplus:selenium-4-7`

### What changes were proposed in this pull request?
This PR aims to upgrade `org.scalatestplus:selenium-4-4` to 
`org.scalatestplus:selenium-4-7`:

- `org.scalatestplus:selenium-4-4` -> `org.scalatestplus:selenium-4-7`
- `selenium-java`: 4.4.0 -> 4.7.1
- `htmlunit-driver`: 3.64.0 -> 4.7.0
- `htmlunit`: 2.64.0 -> 2.67.0

All upgraded dependency versions match.

### Why are the changes needed?
The release notes are as follows:

- 
https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7

### Does this PR introduce _any_ user-facing change?
No, just for tests.

### How was this patch tested?

- Pass Github Actions
- Manual test:
   - ChromeUISeleniumSuite

```
build/sbt -Dguava.version=31.1-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags="" -Phive -Phive-thriftserver "core/testOnly 
org.apache.spark.ui.ChromeUISeleniumSuite"
```

```
[info] ChromeUISeleniumSuite:
Starting ChromeDriver 108.0.5359.71 
(1e0e3868ee06e91ad636a874420e3ca3ae3756ac-refs/branch-heads/5359{#1016}) on 
port 13600
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for 
suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
[info] - SPARK-31534: text for tooltip should be escaped (2 seconds, 702 
milliseconds)
[info] - SPARK-31882: Link URL for Stage DAGs should not depend on paged 
table. (824 milliseconds)
[info] - SPARK-31886: Color barrier execution mode RDD correctly (313 
milliseconds)
[info] - Search text for paged tables should not be saved (1 second, 745 
milliseconds)
[info] Run completed in 10 seconds, 266 milliseconds.
[info] Total number of tests run: 4
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 23 s, completed 2022-12-19 19:41:26
```

   - RocksDBBackendChromeUIHistoryServerSuite

```
build/sbt -Dguava.version=31.1-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags="" -Phive -Phive-thriftserver "core/testOnly 
org.apache.spark.deploy.history.RocksDBBackendChromeUIHistoryServerSuite"
```

```
[info] RocksDBBackendChromeUIHistoryServerSuite:
Starting ChromeDriver 108.0.5359.71 
(1e0e3868ee06e91ad636a874420e3ca3ae3756ac-refs/branch-heads/5359{#1016}) on 
port 2201
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for 
suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
[info] - ajax rendered relative links are prefixed with uiRoot 
(spark.ui.proxyBase) (2 seconds, 362 milliseconds)
[info] Run completed in 10 seconds, 254 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 24 s, completed 2022-12-19 19:40:42
```

Closes #39129 from LuciferYang/selenium-47.

Authored-by: yangjie01 
Signed-off-by: Kousuke Saruta 
---
 pom.xml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/pom.xml b/pom.xml
index 5ae26570e2d..f09207c660f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -205,9 +205,9 @@
 
 4.9.3
 1.1
-4.4.0
-3.64.0
-2.64.0
+4.7.1
+4.7.0
+2.67.0
 1.8
 1.1.0
 1.5.0
@@ -416,7 +416,7 @@
 
 
   org.scalatestplus
-  selenium-4-4_${scala.binary.version}
+  selenium-4-7_${scala.binary.version}
   test
 
 
@@ -1144,7 +1144,7 @@
   
   
 org.scalatestplus
-selenium-4-4_${scala.binary.version}
+selenium-4-7_${scala.binary.version}
 3.2.14.0
 test
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (12e48527846 -> 40590e6d911)

2022-09-14 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 12e48527846 [SPARK-40423][K8S][TESTS] Add explicit YuniKorn queue 
submission test coverage
 add 40590e6d911 [SPARK-40397][BUILD] Upgrade `org.scalatestplus:selenium` 
to 3.12.13

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2-hive-2.3 |  2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 |  2 +-
 pom.xml   | 18 +++---
 3 files changed, 13 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][INFRA] Add ANTLR generated files to .gitignore

2022-03-17 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 46ccc22  [MINOR][INFRA] Add ANTLR generated files to .gitignore
46ccc22 is described below

commit 46ccc22ee40c780f6ae4a9af4562fb1ad10ccd9f
Author: Yuto Akutsu 
AuthorDate: Thu Mar 17 18:12:13 2022 +0900

[MINOR][INFRA] Add ANTLR generated files to .gitignore

### What changes were proposed in this pull request?

Add git ignore entries for files created by ANTLR.

### Why are the changes needed?

To prevent developers from accidentally adding those files when working on 
the parser/lexer.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Made sure those files are ignored by `git status` when they exist.

Closes #35838 from yutoacts/minor_gitignore.

Authored-by: Yuto Akutsu 
Signed-off-by: Kousuke Saruta 
---
 .gitignore | 5 +
 1 file changed, 5 insertions(+)

diff --git a/.gitignore b/.gitignore
index b758781..0e2f59f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -117,3 +117,8 @@ spark-warehouse/
 
 # For Node.js
 node_modules
+
+# For Antlr
+sql/catalyst/gen/
+sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.tokens
+sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/gen/

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 in /dev

2022-02-24 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 637a69f  [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 
in /dev
637a69f is described below

commit 637a69f349d01199db8af7331a22d2b9154cb50e
Author: bjornjorgensen 
AuthorDate: Fri Feb 25 11:43:36 2022 +0900

[SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 in /dev

### What changes were proposed in this pull request?
Upgrade ansi-regex from 5.0.0 to 5.0.1 in /dev

### Why are the changes needed?

[CVE-2021-3807](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3807)

[Release notes on GitHub](https://github.com/chalk/ansi-regex/releases)

By upgrading ansi-regex from 5.0.0 to 5.0.1 we will resolve this issue.

### Does this PR introduce _any_ user-facing change?
Some users run remote security scanners, and this is one of the issues that 
comes up. Whether this can do any damage to Spark is highly uncertain, but let's 
remove the uncertainty that any user may have.

### How was this patch tested?
All tests must pass.

Closes #35628 from bjornjorgensen/ansi-regex-from-5.0.0-to-5.0.1.

Authored-by: bjornjorgensen 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 9758d55918dfec236e8ac9f1655a9ff0acd7156e)
Signed-off-by: Kousuke Saruta 
---
 dev/package-lock.json | 3189 ++---
 dev/package.json  |3 +-
 2 files changed, 2229 insertions(+), 963 deletions(-)

diff --git a/dev/package-lock.json b/dev/package-lock.json
index a57f45b..c2a61b3 100644
--- a/dev/package-lock.json
+++ b/dev/package-lock.json
@@ -1,979 +1,2244 @@
 {
-"requires": true,
-"lockfileVersion": 1,
-"dependencies": {
-"@babel/code-frame": {
-"version": "7.12.11",
-"resolved": 
"https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.12.11.tgz;,
-"integrity": 
"sha512-Zt1yodBx1UcyiePMSkWnU4hPqhwq7hGi2nFL1LeA3EUl+q2LQx16MISgJ0+z7dnmgvP9QtIleuETGOiOH1RcIw==",
-"dev": true,
-"requires": {
-"@babel/highlight": "^7.10.4"
-}
-},
-"@babel/helper-validator-identifier": {
-"version": "7.14.0",
-"resolved": 
"https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.14.0.tgz;,
-"integrity": 
"sha512-V3ts7zMSu5lfiwWDVWzRDGIN+lnCEUdaXgtVHJgLb1rGaA6jMrtB9EmE7L18foXJIE8Un/A/h6NJfGQp/e1J4A==",
-"dev": true
-},
-"@babel/highlight": {
-"version": "7.14.0",
-"resolved": 
"https://registry.npmjs.org/@babel/highlight/-/highlight-7.14.0.tgz;,
-"integrity": 
"sha512-YSCOwxvTYEIMSGaBQb5kDDsCopDdiUGsqpatp3fOlI4+2HQSkTmEVWnVuySdAC5EWCqSWWTv0ib63RjR7dTBdg==",
-"dev": true,
-"requires": {
-"@babel/helper-validator-identifier": "^7.14.0",
-"chalk": "^2.0.0",
-"js-tokens": "^4.0.0"
-},
-"dependencies": {
-"chalk": {
-"version": "2.4.2",
-"resolved": 
"https://registry.npmjs.org/chalk/-/chalk-2.4.2.tgz;,
-"integrity": 
"sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==",
-"dev": true,
-"requires": {
-"ansi-styles": "^3.2.1",
-"escape-string-regexp": "^1.0.5",
-"supports-color": "^5.3.0"
-}
-}
-}
-},
-"@eslint/eslintrc": {
-"version": "0.4.0",
-"resolved": 
"https://registry.npmjs.org/@eslint/eslintrc/-/eslintrc-0.4.0.tgz;,
-"integrity": 
"sha512-2ZPCc+uNbjV5ERJr+aKSPRwZgKd2z11x0EgLvb1PURmUrn9QNRXFqje0Ldq454PfAVyaJYyrDvvIKSFP4NnBog==",
-"dev": true,
-"requires": {
-"ajv": "^6.12.4",
-"debug": "^4.1.1",
-"espree": "^7.3.0",
- 

[spark] branch master updated: [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 in /dev

2022-02-24 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9758d55  [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 
in /dev
9758d55 is described below

commit 9758d55918dfec236e8ac9f1655a9ff0acd7156e
Author: bjornjorgensen 
AuthorDate: Fri Feb 25 11:43:36 2022 +0900

[SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 in /dev

### What changes were proposed in this pull request?
Upgrade ansi-regex from 5.0.0 to 5.0.1 in /dev

### Why are the changes needed?

[CVE-2021-3807](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3807)

[Release notes on GitHub](https://github.com/chalk/ansi-regex/releases)

By upgrading ansi-regex from 5.0.0 to 5.0.1 we will resolve this issue.

### Does this PR introduce _any_ user-facing change?
Some users run remote security scanners, and this is one of the issues that 
comes up. Whether this can do any damage to Spark is highly uncertain, but let's 
remove the uncertainty that any user may have.

### How was this patch tested?
All tests must pass.

Closes #35628 from bjornjorgensen/ansi-regex-from-5.0.0-to-5.0.1.

Authored-by: bjornjorgensen 
Signed-off-by: Kousuke Saruta 
---
 dev/package-lock.json | 3189 ++---
 dev/package.json  |3 +-
 2 files changed, 2229 insertions(+), 963 deletions(-)

diff --git a/dev/package-lock.json b/dev/package-lock.json
index a57f45b..c2a61b3 100644
--- a/dev/package-lock.json
+++ b/dev/package-lock.json
@@ -1,979 +1,2244 @@
 {
-"requires": true,
-"lockfileVersion": 1,
-"dependencies": {
-"@babel/code-frame": {
-"version": "7.12.11",
-"resolved": 
"https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.12.11.tgz;,
-"integrity": 
"sha512-Zt1yodBx1UcyiePMSkWnU4hPqhwq7hGi2nFL1LeA3EUl+q2LQx16MISgJ0+z7dnmgvP9QtIleuETGOiOH1RcIw==",
-"dev": true,
-"requires": {
-"@babel/highlight": "^7.10.4"
-}
-},
-"@babel/helper-validator-identifier": {
-"version": "7.14.0",
-"resolved": 
"https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.14.0.tgz;,
-"integrity": 
"sha512-V3ts7zMSu5lfiwWDVWzRDGIN+lnCEUdaXgtVHJgLb1rGaA6jMrtB9EmE7L18foXJIE8Un/A/h6NJfGQp/e1J4A==",
-"dev": true
-},
-"@babel/highlight": {
-"version": "7.14.0",
-"resolved": 
"https://registry.npmjs.org/@babel/highlight/-/highlight-7.14.0.tgz;,
-"integrity": 
"sha512-YSCOwxvTYEIMSGaBQb5kDDsCopDdiUGsqpatp3fOlI4+2HQSkTmEVWnVuySdAC5EWCqSWWTv0ib63RjR7dTBdg==",
-"dev": true,
-"requires": {
-"@babel/helper-validator-identifier": "^7.14.0",
-"chalk": "^2.0.0",
-"js-tokens": "^4.0.0"
-},
-"dependencies": {
-"chalk": {
-"version": "2.4.2",
-"resolved": 
"https://registry.npmjs.org/chalk/-/chalk-2.4.2.tgz;,
-"integrity": 
"sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==",
-"dev": true,
-"requires": {
-"ansi-styles": "^3.2.1",
-"escape-string-regexp": "^1.0.5",
-"supports-color": "^5.3.0"
-}
-}
-}
-},
-"@eslint/eslintrc": {
-"version": "0.4.0",
-"resolved": 
"https://registry.npmjs.org/@eslint/eslintrc/-/eslintrc-0.4.0.tgz;,
-"integrity": 
"sha512-2ZPCc+uNbjV5ERJr+aKSPRwZgKd2z11x0EgLvb1PURmUrn9QNRXFqje0Ldq454PfAVyaJYyrDvvIKSFP4NnBog==",
-"dev": true,
-"requires": {
-"ajv": "^6.12.4",
-"debug": "^4.1.1",
-"espree": "^7.3.0",
-"globals": "^12.1.0&

[spark] branch master updated (a103a49 -> 48b56c0)

2022-02-22 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a103a49  [SPARK-38279][TESTS][3.2] Pin MarkupSafe to 2.0.1 fix linter 
failure
 add 48b56c0  [SPARK-38278][PYTHON] Add SparkContext.addArchive in PySpark

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.rst |  1 +
 python/pyspark/context.py| 44 
 2 files changed, 45 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated (3dea6c4 -> 0dde12f)

2022-02-15 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3dea6c4  [SPARK-38211][SQL][DOCS] Add SQL migration guide on restoring 
loose upcast from string to other types
 add 0dde12f  [SPARK-36808][BUILD][3.2] Upgrade Kafka to 2.8.1

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-37934][BUILD][3.2] Upgrade Jetty version to 9.4.44

2022-02-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new adba516  [SPARK-37934][BUILD][3.2] Upgrade Jetty version to 9.4.44
adba516 is described below

commit adba5165a56bd4e7a71fcad77c568c0cbc2e7f97
Author: Jack Richard Buggins 
AuthorDate: Wed Feb 9 02:28:03 2022 +0900

[SPARK-37934][BUILD][3.2] Upgrade Jetty version to 9.4.44

### What changes were proposed in this pull request?

This pull request provides a minor update to the Jetty version, from 
`9.4.43.v20210629` to `9.4.44.v20210927`, which is required on branch-3.2 
to fully resolve https://issues.apache.org/jira/browse/SPARK-37934.

### Why are the changes needed?

As discussed in https://github.com/apache/spark/pull/35338, the DoS vector is 
exploitable even within a private or restricted network. The result below is the 
output of a twistlock scan, which also detects this vulnerability.

```
Source: https://github.com/eclipse/jetty.project/issues/6973
CVE: PRISMA-2021-0182
Sev.: medium
Package Name: org.eclipse.jetty_jetty-server
Package Ver.: 9.4.43.v20210629
Status: fixed in 9.4.44
Description: org.eclipse.jetty_jetty-server package versions before 9.4.44 
are vulnerable to DoS (Denial of Service). Logback-access calls 
Request.getParameterNames() for request logging. That will force a request body 
read (if it hasn't been read before) per the servlet. This will now consume 
resources to read the request body content, which could easily be malicious (in 
size? in keys? etc), even though the application intentionally didn't read the 
request body.
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

* Core local
```
$ build/sbt
> project core
> test
```
* CI

Closes #35442 from JackBuggins/branch-3.2.

Authored-by: Jack Richard Buggins 
Signed-off-by: Kousuke Saruta 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index bc3f925..8af3d6a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -138,7 +138,7 @@
 10.14.2.0
 1.12.2
 1.6.13
-9.4.43.v20210629
+9.4.44.v20210927
 4.0.3
 0.10.0
 2.5.0

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3d736d9 -> 6115f58)

2022-02-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3d736d9  [SPARK-37412][PYTHON][ML] Inline typehints for pyspark.ml.stat
 add 6115f58  [MINOR][SQL] Remove redundant array creation in UnsafeRow

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-38021][BUILD] Upgrade dropwizard metrics from 4.2.2 to 4.2.7

2022-01-25 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a1b061d  [SPARK-38021][BUILD] Upgrade dropwizard metrics from 4.2.2 to 
4.2.7
a1b061d is described below

commit a1b061d7fc5427138bfaa9fe68d2748f8bf3907c
Author: yangjie01 
AuthorDate: Tue Jan 25 20:57:16 2022 +0900

[SPARK-38021][BUILD] Upgrade dropwizard metrics from 4.2.2 to 4.2.7

### What changes were proposed in this pull request?
This PR upgrades dropwizard metrics from 4.2.2 to 4.2.7.

### Why are the changes needed?
There are 5 releases after 4.2.2; the release notes are as follows:

- https://github.com/dropwizard/metrics/releases/tag/v4.2.3
- https://github.com/dropwizard/metrics/releases/tag/v4.2.4
- https://github.com/dropwizard/metrics/releases/tag/v4.2.5
- https://github.com/dropwizard/metrics/releases/tag/v4.2.6
- https://github.com/dropwizard/metrics/releases/tag/v4.2.7

Since 4.2.5, dropwizard metrics supports [building with JDK 
17](https://github.com/dropwizard/metrics/pull/2180).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA

Closes #35317 from LuciferYang/upgrade-metrics.

Authored-by: yangjie01 
Signed-off-by: Kousuke Saruta 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 10 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +-
 pom.xml   |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 5efdca9..8284237 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -195,11 +195,11 @@ 
logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 macro-compat_2.12/1.1.1//macro-compat_2.12-1.1.1.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.2//metrics-core-4.2.2.jar
-metrics-graphite/4.2.2//metrics-graphite-4.2.2.jar
-metrics-jmx/4.2.2//metrics-jmx-4.2.2.jar
-metrics-json/4.2.2//metrics-json-4.2.2.jar
-metrics-jvm/4.2.2//metrics-jvm-4.2.2.jar
+metrics-core/4.2.7//metrics-core-4.2.7.jar
+metrics-graphite/4.2.7//metrics-graphite-4.2.7.jar
+metrics-jmx/4.2.7//metrics-jmx-4.2.7.jar
+metrics-json/4.2.7//metrics-json-4.2.7.jar
+metrics-jvm/4.2.7//metrics-jvm-4.2.7.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.73.Final//netty-all-4.1.73.Final.jar
 netty-buffer/4.1.73.Final//netty-buffer-4.1.73.Final.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index a79a71b..f169277 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -181,11 +181,11 @@ 
logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 macro-compat_2.12/1.1.1//macro-compat_2.12-1.1.1.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.2//metrics-core-4.2.2.jar
-metrics-graphite/4.2.2//metrics-graphite-4.2.2.jar
-metrics-jmx/4.2.2//metrics-jmx-4.2.2.jar
-metrics-json/4.2.2//metrics-json-4.2.2.jar
-metrics-jvm/4.2.2//metrics-jvm-4.2.2.jar
+metrics-core/4.2.7//metrics-core-4.2.7.jar
+metrics-graphite/4.2.7//metrics-graphite-4.2.7.jar
+metrics-jmx/4.2.7//metrics-jmx-4.2.7.jar
+metrics-json/4.2.7//metrics-json-4.2.7.jar
+metrics-jvm/4.2.7//metrics-jvm-4.2.7.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.73.Final//netty-all-4.1.73.Final.jar
 netty-buffer/4.1.73.Final//netty-buffer-4.1.73.Final.jar
diff --git a/pom.xml b/pom.xml
index 5bae4d2..09577f2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -147,7 +147,7 @@
 If you changes codahale.metrics.version, you also need to change
 the link to metrics.dropwizard.io in docs/monitoring.md.
 -->
-4.2.2
+4.2.7
 1.11.0
 1.12.0
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-38017][SQL][DOCS] Fix the API doc for window to say it supports TimestampNTZType too as timeColumn

2022-01-25 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 263fe44  [SPARK-38017][SQL][DOCS] Fix the API doc for window to say it 
supports TimestampNTZType too as timeColumn
263fe44 is described below

commit 263fe44f8a9738fc8d7dcfbcc1c0c10c942146e3
Author: Kousuke Saruta 
AuthorDate: Tue Jan 25 20:44:06 2022 +0900

[SPARK-38017][SQL][DOCS] Fix the API doc for window to say it supports 
TimestampNTZType too as timeColumn

### What changes were proposed in this pull request?

This PR fixes the API docs for `window` to say it supports 
`TimestampNTZType` too as `timeColumn`.

### Why are the changes needed?

The `window` function supports not only `TimestampType` but also 
`TimestampNTZType`.

### Does this PR introduce _any_ user-facing change?

Yes, but I don't think this change affects existing users.

### How was this patch tested?

Built the docs with the following commands.
```
bundle install
SKIP_RDOC=1 SKIP_SQLDOC=1 bundle exec jekyll build
```
Then, confirmed the built doc.

![window_timestampntz](https://user-images.githubusercontent.com/4736016/150927548-2b1bec61-a165-410d-b8b2-5cd33ed13a50.png)

![window_timestmapntz_python](https://user-images.githubusercontent.com/4736016/150927564-450da33b-540f-4b97-a0e3-cae7897d9ea4.png)

Closes #35313 from sarutak/window-timestampntz-doc.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 76f685d26dc1f0f4d92293cd370e58ee2fa68452)
Signed-off-by: Kousuke Saruta 
---
 python/pyspark/sql/functions.py  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index c7bc581..acde817 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2304,7 +2304,7 @@ def window(timeColumn, windowDuration, 
slideDuration=None, startTime=None):
 --
 timeColumn : :class:`~pyspark.sql.Column`
 The column or the expression to use as the timestamp for windowing by 
time.
-The time column must be of TimestampType.
+The time column must be of TimestampType or TimestampNTZType.
 windowDuration : str
 A string specifying the width of the window, e.g. `10 minutes`,
 `1 second`. Check `org.apache.spark.unsafe.types.CalendarInterval` for
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index a4c77b2..f4801ee 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3517,7 +3517,7 @@ object functions {
* processing time.
*
* @param timeColumn The column or the expression to use as the timestamp 
for windowing by time.
-   *   The time column must be of TimestampType.
+   *   The time column must be of TimestampType or 
TimestampNTZType.
* @param windowDuration A string specifying the width of the window, e.g. 
`10 minutes`,
*   `1 second`. Check 
`org.apache.spark.unsafe.types.CalendarInterval` for
*   valid duration identifiers. Note that the duration 
is a fixed length of
@@ -3573,7 +3573,7 @@ object functions {
* processing time.
*
* @param timeColumn The column or the expression to use as the timestamp 
for windowing by time.
-   *   The time column must be of TimestampType.
+   *   The time column must be of TimestampType or 
TimestampNTZType.
* @param windowDuration A string specifying the width of the window, e.g. 
`10 minutes`,
*   `1 second`. Check 
`org.apache.spark.unsafe.types.CalendarInterval` for
*   valid duration identifiers. Note that the duration 
is a fixed length of
@@ -3618,7 +3618,7 @@ object functions {
* processing time.
*
* @param timeColumn The column or the expression to use as the timestamp 
for windowing by time.
-   *   The time column must be of TimestampType.
+   *   The time column must be of TimestampType or 
TimestampNTZType.
* @param windowDuration A string specifying the width of the window, e.g. 
`10 minutes`,
*   `1 second`. Check 
`org.apache.spark.unsafe.types.CalendarInterval` for
*   valid duration identifiers.

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-38017][SQL][DOCS] Fix the API doc for window to say it supports TimestampNTZType too as timeColumn

2022-01-25 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 76f685d  [SPARK-38017][SQL][DOCS] Fix the API doc for window to say it 
supports TimestampNTZType too as timeColumn
76f685d is described below

commit 76f685d26dc1f0f4d92293cd370e58ee2fa68452
Author: Kousuke Saruta 
AuthorDate: Tue Jan 25 20:44:06 2022 +0900

[SPARK-38017][SQL][DOCS] Fix the API doc for window to say it supports 
TimestampNTZType too as timeColumn

### What changes were proposed in this pull request?

This PR fixes the API docs for `window` to say it supports 
`TimestampNTZType` too as `timeColumn`.

### Why are the changes needed?

The `window` function supports not only `TimestampType` but also 
`TimestampNTZType`.
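
For illustration only (this patch touches documentation, not behavior), a minimal Scala sketch of windowing over a TIMESTAMP_NTZ column. The DataFrame and column names are hypothetical, and the `timestamp_ntz` cast assumes a Spark version in which that type is available:
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

val spark = SparkSession.builder().appName("window-ntz-sketch").getOrCreate()
import spark.implicits._

// Hypothetical event data; the cast yields a TIMESTAMP_NTZ (no time zone) column.
val events = Seq(("2022-01-25 20:44:06", 1L), ("2022-01-25 20:50:00", 3L))
  .toDF("ts", "value")
  .withColumn("event_time", col("ts").cast("timestamp_ntz"))

// window() accepts the TIMESTAMP_NTZ column just like a TIMESTAMP column.
events.groupBy(window(col("event_time"), "10 minutes"))
  .sum("value")
  .show(truncate = false)
```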

### Does this PR introduce _any_ user-facing change?

Yes, but I don't think this change affects existing users.

### How was this patch tested?

Built the docs with the following commands.
```
bundle install
SKIP_RDOC=1 SKIP_SQLDOC=1 bundle exec jekyll build
```
Then, confirmed the built doc.

![window_timestampntz](https://user-images.githubusercontent.com/4736016/150927548-2b1bec61-a165-410d-b8b2-5cd33ed13a50.png)

![window_timestmapntz_python](https://user-images.githubusercontent.com/4736016/150927564-450da33b-540f-4b97-a0e3-cae7897d9ea4.png)

Closes #35313 from sarutak/window-timestampntz-doc.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 python/pyspark/sql/functions.py  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index bfee994..2dfaec8 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2551,7 +2551,7 @@ def window(
 --
 timeColumn : :class:`~pyspark.sql.Column`
 The column or the expression to use as the timestamp for windowing by 
time.
-The time column must be of TimestampType.
+The time column must be of TimestampType or TimestampNTZType.
 windowDuration : str
 A string specifying the width of the window, e.g. `10 minutes`,
 `1 second`. Check `org.apache.spark.unsafe.types.CalendarInterval` for
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index f217dad..0db12a2 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3621,7 +3621,7 @@ object functions {
* processing time.
*
* @param timeColumn The column or the expression to use as the timestamp 
for windowing by time.
-   *   The time column must be of TimestampType.
+   *   The time column must be of TimestampType or 
TimestampNTZType.
* @param windowDuration A string specifying the width of the window, e.g. 
`10 minutes`,
*   `1 second`. Check 
`org.apache.spark.unsafe.types.CalendarInterval` for
*   valid duration identifiers. Note that the duration 
is a fixed length of
@@ -3677,7 +3677,7 @@ object functions {
* processing time.
*
* @param timeColumn The column or the expression to use as the timestamp 
for windowing by time.
-   *   The time column must be of TimestampType.
+   *   The time column must be of TimestampType or 
TimestampNTZType.
* @param windowDuration A string specifying the width of the window, e.g. 
`10 minutes`,
*   `1 second`. Check 
`org.apache.spark.unsafe.types.CalendarInterval` for
*   valid duration identifiers. Note that the duration 
is a fixed length of
@@ -3722,7 +3722,7 @@ object functions {
* processing time.
*
* @param timeColumn The column or the expression to use as the timestamp 
for windowing by time.
-   *   The time column must be of TimestampType.
+   *   The time column must be of TimestampType or 
TimestampNTZType.
* @param windowDuration A string specifying the width of the window, e.g. 
`10 minutes`,
*   `1 second`. Check 
`org.apache.spark.unsafe.types.CalendarInterval` for
*   valid duration identifiers.

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-38016][SQL][DOCS] Fix the API doc for session_window to say it supports TimestampNTZType too as timeColumn

2022-01-25 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 48a440f  [SPARK-38016][SQL][DOCS] Fix the API doc for session_window 
to say it supports TimestampNTZType too as timeColumn
48a440f is described below

commit 48a440fe1fc334134f42a726cc6fb3d98802e0fd
Author: Kousuke Saruta 
AuthorDate: Tue Jan 25 20:41:38 2022 +0900

[SPARK-38016][SQL][DOCS] Fix the API doc for session_window to say it 
supports TimestampNTZType too as timeColumn

### What changes were proposed in this pull request?

This PR fixes the API docs for `session_window` to say it supports 
`TimestampNTZType` too as `timeColumn`.

### Why are the changes needed?

As of Spark 3.3.0 (e858cd568a74123f7fd8fe4c3d2917a), `session_window` 
supports not only `TimestampType` but also `TimestampNTZType`.
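
For illustration only, a minimal Scala sketch of `session_window` over a TIMESTAMP_NTZ column. The data and column names are hypothetical, and the `timestamp_ntz` cast assumes a Spark version in which that type is available:
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, session_window}

val spark = SparkSession.builder().appName("session-window-ntz-sketch").getOrCreate()
import spark.implicits._

// Hypothetical click stream with a TIMESTAMP_NTZ event-time column.
val clicks = Seq(("2022-01-25 20:41:38", "userA"), ("2022-01-25 20:43:00", "userA"))
  .toDF("ts", "user")
  .withColumn("event_time", col("ts").cast("timestamp_ntz"))

// Sessions close after 5 minutes of inactivity; the time column may be of
// TimestampType or TimestampNTZType.
clicks.groupBy(col("user"), session_window(col("event_time"), "5 minutes"))
  .count()
  .show(truncate = false)
```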

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Built the docs with the following commands.
```
bundle install
SKIP_RDOC=1 SKIP_SQLDOC=1 bundle exec jekyll build
```
Then, confirmed the built doc.

![session_window_timestampntz](https://user-images.githubusercontent.com/4736016/150925544-7f9a2297-36c5-419a-b2b5-a8e43dfb50ff.png)

![session_window_timestampntz_python](https://user-images.githubusercontent.com/4736016/150925570-c8d59d1f-666a-49d9-a6e7-084d6e877871.png)

Closes #35312 from sarutak/sessionwindow-timestampntz-doc.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 python/pyspark/sql/functions.py  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index e69c37d..bfee994 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2623,7 +2623,7 @@ def session_window(timeColumn: "ColumnOrName", 
gapDuration: Union[Column, str])
 --
 timeColumn : :class:`~pyspark.sql.Column` or str
 The column name or column to use as the timestamp for windowing by 
time.
-The time column must be of TimestampType.
+The time column must be of TimestampType or TimestampNTZType.
 gapDuration : :class:`~pyspark.sql.Column` or str
 A Python string literal or column specifying the timeout of the 
session. It could be
 static value, e.g. `10 minutes`, `1 second`, or an expression/UDF that 
specifies gap
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index ec28d8d..f217dad 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3750,7 +3750,7 @@ object functions {
* processing time.
*
* @param timeColumn The column or the expression to use as the timestamp 
for windowing by time.
-   *   The time column must be of TimestampType.
+   *   The time column must be of TimestampType or 
TimestampNTZType.
* @param gapDuration A string specifying the timeout of the session, e.g. 
`10 minutes`,
*`1 second`. Check 
`org.apache.spark.unsafe.types.CalendarInterval` for
*valid duration identifiers.
@@ -3787,7 +3787,7 @@ object functions {
* processing time.
*
* @param timeColumn The column or the expression to use as the timestamp 
for windowing by time.
-   *   The time column must be of TimestampType.
+   *   The time column must be of TimestampType or 
TimestampNTZType.
* @param gapDuration A column specifying the timeout of the session. It 
could be static value,
*e.g. `10 minutes`, `1 second`, or an expression/UDF 
that specifies gap
*duration dynamically based on the input row.

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-37860][UI] Fix taskindex in the stage page task event timeline

2022-01-10 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 755d11d  [SPARK-37860][UI] Fix taskindex in the stage page task event 
timeline
755d11d is described below

commit 755d11d0d1479f5441c6ead2cc6142bab45d6e16
Author: stczwd 
AuthorDate: Tue Jan 11 15:23:12 2022 +0900

[SPARK-37860][UI] Fix taskindex in the stage page task event timeline

### What changes were proposed in this pull request?
This reverts commit 450b415028c3b00f3a002126cd11318d3932e28f.

### Why are the changes needed?
In #32888, shahidki31 changed taskInfo.index to taskInfo.taskId. However, we 
generally use `index.attempt` or `taskId` to distinguish tasks within a stage, 
not `taskId.attempt`.
Thus #32888 was a wrong fix, and we should revert it.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
origin test suites

Closes #35160 from stczwd/SPARK-37860.

Authored-by: stczwd 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 3d2fde5242c8989688c176b8ed5eb0bff5e1f17f)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index e9eb62e..ccaa70b 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -352,7 +352,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
|'content': '
+ |data-title="${s"Task " + index + " (attempt " + attempt + 
")"}
  |Status: ${taskInfo.status}
  |Launch Time: ${UIUtils.formatDate(new Date(launchTime))}
  |${

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-37860][UI] Fix taskindex in the stage page task event timeline

2022-01-10 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 830d5b6  [SPARK-37860][UI] Fix taskindex in the stage page task event 
timeline
830d5b6 is described below

commit 830d5b650ce9ac00f2a64bbf3e7fe9d31b02e51d
Author: stczwd 
AuthorDate: Tue Jan 11 15:23:12 2022 +0900

[SPARK-37860][UI] Fix taskindex in the stage page task event timeline

### What changes were proposed in this pull request?
This reverts commit 450b415028c3b00f3a002126cd11318d3932e28f.

### Why are the changes needed?
In #32888, shahidki31 changed taskInfo.index to taskInfo.taskId. However, we 
generally use `index.attempt` or `taskId` to distinguish tasks within a stage, 
not `taskId.attempt`.
Thus #32888 was a wrong fix, and we should revert it.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
origin test suites

Closes #35160 from stczwd/SPARK-37860.

Authored-by: stczwd 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 3d2fde5242c8989688c176b8ed5eb0bff5e1f17f)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index 459e09a..47ba951 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -355,7 +355,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
|'content': '
+ |data-title="${s"Task " + index + " (attempt " + attempt + 
")"}
  |Status: ${taskInfo.status}
  |Launch Time: ${UIUtils.formatDate(new Date(launchTime))}
  |${

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-37860][UI] Fix taskindex in the stage page task event timeline

2022-01-10 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new db1023c  [SPARK-37860][UI] Fix taskindex in the stage page task event 
timeline
db1023c is described below

commit db1023c728c5e0bdcd4ef457cf5f7ba4f13cb79d
Author: stczwd 
AuthorDate: Tue Jan 11 15:23:12 2022 +0900

[SPARK-37860][UI] Fix taskindex in the stage page task event timeline

### What changes were proposed in this pull request?
This reverts commit 450b415028c3b00f3a002126cd11318d3932e28f.

### Why are the changes needed?
In #32888, shahidki31 changed taskInfo.index to taskInfo.taskId. However, we 
generally use `index.attempt` or `taskId` to distinguish tasks within a stage, 
not `taskId.attempt`.
Thus #32888 was an incorrect fix, so we should revert it.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Existing test suites.

Closes #35160 from stczwd/SPARK-37860.

Authored-by: stczwd 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 3d2fde5242c8989688c176b8ed5eb0bff5e1f17f)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index 81dfe83..777a6b0 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -355,7 +355,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
|'content': '
+ |data-title="${s"Task " + index + " (attempt " + attempt + 
")"}
  |Status: ${taskInfo.status}
  |Launch Time: ${UIUtils.formatDate(new Date(launchTime))}
  |${

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7463564 -> 3d2fde5)

2022-01-10 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7463564  [SPARK-37847][CORE][SHUFFLE] PushBlockStreamCallback#isStale 
should check null to avoid NPE
 add 3d2fde5  [SPARK-37860][UI] Fix taskindex in the stage page task event 
timeline

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (98e1c77 -> 3b88bc8)

2022-01-04 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 98e1c77  [SPARK-37803][SQL] Add ORC read benchmarks for structs
 add 3b88bc8  [SPARK-37792][CORE] Fix the check of custom configuration in 
SparkShellLoggingFilter

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/Logging.scala  | 19 --
 .../org/apache/spark/internal/LoggingSuite.scala   | 23 +++---
 .../scala/org/apache/spark/repl/ReplSuite.scala| 19 --
 3 files changed, 42 insertions(+), 19 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-37302][BUILD][FOLLOWUP] Extract the versions of dependencies accurately

2021-12-23 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 0888622  [SPARK-37302][BUILD][FOLLOWUP] Extract the versions of 
dependencies accurately
0888622 is described below

commit 08886223c6373cc7c7e132bfb58f1536e70286ef
Author: Kousuke Saruta 
AuthorDate: Fri Dec 24 11:29:37 2021 +0900

[SPARK-37302][BUILD][FOLLOWUP] Extract the versions of dependencies 
accurately

### What changes were proposed in this pull request?

This PR changes `dev/test-dependencies.sh` to extract the versions of 
dependencies accurately.
In the current implementation, the versions are extracted as follows.

```
GUAVA_VERSION=`build/mvn help:evaluate -Dexpression=guava.version -q 
-DforceStdout`
```

But if the output of the `mvn` command includes not only the version but 
also other messages such as warnings, a subsequent command that refers to the 
version will fail.

```
build/mvn dependency:get -Dartifact=com.google.guava:guava:${GUAVA_VERSION} 
-q
...
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.1.1:get (default-cli) on 
project spark-parent_2.12: Couldn't download artifact: 
org.eclipse.aether.resolution.DependencyResolutionException: 
com.google.guava:guava:jar:Falling was not found in 
https://maven-central.storage-download.googleapis.com/maven2/ during a previous 
attempt. This failure was cached in the local repository and resolution is not 
reattempted until the update interval of gcs-maven-cent [...]
```

Actually, this causes the recent linter failure.
https://github.com/apache/spark/runs/4623297663?check_suite_focus=true
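
As an illustration (not part of the original commit message), the fix amounts 
to keeping only lines that look like a version number; a rough Scala analogue 
of the added `grep -E` filter, run against invented sample output:

```scala
// Illustration only: mimic the grep -E "^[0-9.]+$" guard against sample
// (invented) mvn output that mixes a warning with the actual version string.
val mvnOutput = Seq(
  "Falling back to the default repository",  // stray warning line
  "14.0.1"                                   // the real guava.version value
)

// Keep only lines that consist purely of digits and dots.
val guavaVersion = mvnOutput
  .find(_.matches("^[0-9.]+$"))
  .getOrElse(sys.error("Could not extract the version from Maven output"))

println(guavaVersion)  // prints 14.0.1
```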

### Why are the changes needed?

To recover the CI.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually run `dev/test-dependencies.sh`.

Closes #35006 from sarutak/followup-SPARK-37302.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit dd0decff5f1e95cedd8fe83de7e4449be57cb31c)
Signed-off-by: Kousuke Saruta 
---
 dev/test-dependencies.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/test-dependencies.sh b/dev/test-dependencies.sh
index 363ba1a..39a11e7 100755
--- a/dev/test-dependencies.sh
+++ b/dev/test-dependencies.sh
@@ -48,9 +48,9 @@ OLD_VERSION=$($MVN -q \
 --non-recursive \
 org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E 
'[0-9]+\.[0-9]+\.[0-9]+')
 # dependency:get for guava and jetty-io are workaround for SPARK-37302.
-GUAVA_VERSION=`build/mvn help:evaluate -Dexpression=guava.version -q 
-DforceStdout`
+GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q 
-DforceStdout | grep -E "^[0-9.]+$")
 build/mvn dependency:get -Dartifact=com.google.guava:guava:${GUAVA_VERSION} -q
-JETTY_VERSION=`build/mvn help:evaluate -Dexpression=jetty.version -q 
-DforceStdout`
+JETTY_VERSION=$(build/mvn help:evaluate -Dexpression=jetty.version -q 
-DforceStdout | grep -E "^[0-9.]+v[0-9]+")
 build/mvn dependency:get 
-Dartifact=org.eclipse.jetty:jetty-io:${JETTY_VERSION} -q
 if [ $? != 0 ]; then
 echo -e "Error while getting version string from Maven:\n$OLD_VERSION"

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37302][BUILD][FOLLOWUP] Extract the versions of dependencies accurately

2021-12-23 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dd0decf  [SPARK-37302][BUILD][FOLLOWUP] Extract the versions of 
dependencies accurately
dd0decf is described below

commit dd0decff5f1e95cedd8fe83de7e4449be57cb31c
Author: Kousuke Saruta 
AuthorDate: Fri Dec 24 11:29:37 2021 +0900

[SPARK-37302][BUILD][FOLLOWUP] Extract the versions of dependencies 
accurately

### What changes were proposed in this pull request?

This PR changes `dev/test-dependencies.sh` to extract the versions of 
dependencies accurately.
In the current implementation, the versions are extracted as follows.

```
GUAVA_VERSION=`build/mvn help:evaluate -Dexpression=guava.version -q 
-DforceStdout`
```

But if the output of the `mvn` command includes not only the version but 
also other messages such as warnings, a subsequent command that refers to the 
version will fail.

```
build/mvn dependency:get -Dartifact=com.google.guava:guava:${GUAVA_VERSION} 
-q
...
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.1.1:get (default-cli) on 
project spark-parent_2.12: Couldn't download artifact: 
org.eclipse.aether.resolution.DependencyResolutionException: 
com.google.guava:guava:jar:Falling was not found in 
https://maven-central.storage-download.googleapis.com/maven2/ during a previous 
attempt. This failure was cached in the local repository and resolution is not 
reattempted until the update interval of gcs-maven-cent [...]
```

Actually, this causes the recent linter failure.
https://github.com/apache/spark/runs/4623297663?check_suite_focus=true

### Why are the changes needed?

To recover the CI.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually run `dev/test-dependencies.sh`.

Closes #35006 from sarutak/followup-SPARK-37302.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 dev/test-dependencies.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/test-dependencies.sh b/dev/test-dependencies.sh
index cf05126..2268a26 100755
--- a/dev/test-dependencies.sh
+++ b/dev/test-dependencies.sh
@@ -50,9 +50,9 @@ OLD_VERSION=$($MVN -q \
 --non-recursive \
 org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E 
'[0-9]+\.[0-9]+\.[0-9]+')
 # dependency:get for guava and jetty-io are workaround for SPARK-37302.
-GUAVA_VERSION=`build/mvn help:evaluate -Dexpression=guava.version -q 
-DforceStdout`
+GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q 
-DforceStdout | grep -E "^[0-9.]+$")
 build/mvn dependency:get -Dartifact=com.google.guava:guava:${GUAVA_VERSION} -q
-JETTY_VERSION=`build/mvn help:evaluate -Dexpression=jetty.version -q 
-DforceStdout`
+JETTY_VERSION=$(build/mvn help:evaluate -Dexpression=jetty.version -q 
-DforceStdout | grep -E "^[0-9.]+v[0-9]+")
 build/mvn dependency:get 
-Dartifact=org.eclipse.jetty:jetty-io:${JETTY_VERSION} -q
 if [ $? != 0 ]; then
 echo -e "Error while getting version string from Maven:\n$OLD_VERSION"

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37391][SQL] JdbcConnectionProvider tells if it modifies security context

2021-12-23 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6cc4c90  [SPARK-37391][SQL] JdbcConnectionProvider tells if it 
modifies security context
6cc4c90 is described below

commit 6cc4c90cbc09a7729f9c40f440fcdda83e3d8648
Author: Danny Guinther 
AuthorDate: Fri Dec 24 10:07:16 2021 +0900

[SPARK-37391][SQL] JdbcConnectionProvider tells if it modifies security 
context

Augments the JdbcConnectionProvider API so that a provider can indicate 
that it will need to modify the global security configuration when establishing 
a connection and, accordingly, whether access to the global security 
configuration should be synchronized to prevent races.

### What changes were proposed in this pull request?
As suggested by gaborgsomogyi 
[here](https://github.com/apache/spark/pull/29024/files#r755788709), this PR augments 
the `JdbcConnectionProvider` API to include a `modifiesSecurityContext` method 
that can be used by `ConnectionProvider` to determine when 
`SecurityConfigurationLock.synchronized` is required to avoid race conditions 
when establishing a JDBC connection.

### Why are the changes needed?
Provides a path forward for working around a significant bottleneck 
introduced by synchronizing `SecurityConfigurationLock` every time a connection 
is established. The synchronization isn't always needed and it should be at the 
discretion of the `JdbcConnectionProvider` to determine when locking is 
necessary. See [SPARK-37391](https://issues.apache.org/jira/browse/SPARK-37391) 
or [this thread](https://github.com/apache/spark/pull/29024/files#r754441783).

### Does this PR introduce _any_ user-facing change?
Any existing implementations of `JdbcConnectionProvider` will need to add a 
definition of `modifiesSecurityContext`. I'm also open to adding a default 
implementation, but it seemed to me that requiring an explicit implementation 
of the method was preferable.

A drop-in implementation that would continue the existing behavior is:
```scala
override def modifiesSecurityContext(
  driver: Driver,
  options: Map[String, String]
): Boolean = true
```
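
For readers, a rough sketch (not Spark's actual internals) of how a caller 
might consult `modifiesSecurityContext` to decide whether to serialize access 
to the global security configuration; the helper object and the lock below are 
placeholders:

```scala
// Placeholder names throughout; only getConnection/modifiesSecurityContext
// come from the JdbcConnectionProvider API discussed above.
import java.sql.{Connection, Driver}

import org.apache.spark.sql.jdbc.JdbcConnectionProvider

object HypotheticalConnectionHelper {
  // Stand-in for the global lock guarding the JVM-wide security configuration.
  private val securityConfigurationLock = new Object

  def connect(
      provider: JdbcConnectionProvider,
      driver: Driver,
      options: Map[String, String]): Connection = {
    if (provider.modifiesSecurityContext(driver, options)) {
      // Only providers that touch the global security context pay the
      // synchronization cost.
      securityConfigurationLock.synchronized {
        provider.getConnection(driver, options)
      }
    } else {
      provider.getConnection(driver, options)
    }
  }
}
```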

### How was this patch tested?
Unit tests, but I also plan to run a real workflow once I get the initial 
thumbs up on this implementation.

Closes #34745 from tdg5/SPARK-37391-opt-in-security-configuration-sync.

Authored-by: Danny Guinther 
Signed-off-by: Kousuke Saruta 
---
 .../sql/jdbc/ExampleJdbcConnectionProvider.scala   |  5 ++
 project/MimaExcludes.scala |  5 +-
 .../jdbc/connection/BasicConnectionProvider.scala  |  8 
 .../jdbc/connection/ConnectionProvider.scala   | 22 +
 .../spark/sql/jdbc/JdbcConnectionProvider.scala| 19 +++-
 .../main/scala/org/apache/spark/sql/jdbc/README.md |  5 +-
 .../jdbc/connection/ConnectionProviderSuite.scala  | 55 ++
 .../IntentionallyFaultyConnectionProvider.scala|  4 ++
 8 files changed, 109 insertions(+), 14 deletions(-)

diff --git 
a/examples/src/main/scala/org/apache/spark/examples/sql/jdbc/ExampleJdbcConnectionProvider.scala
 
b/examples/src/main/scala/org/apache/spark/examples/sql/jdbc/ExampleJdbcConnectionProvider.scala
index 6d275d4..c63467d 100644
--- 
a/examples/src/main/scala/org/apache/spark/examples/sql/jdbc/ExampleJdbcConnectionProvider.scala
+++ 
b/examples/src/main/scala/org/apache/spark/examples/sql/jdbc/ExampleJdbcConnectionProvider.scala
@@ -30,4 +30,9 @@ class ExampleJdbcConnectionProvider extends 
JdbcConnectionProvider with Logging
   override def canHandle(driver: Driver, options: Map[String, String]): 
Boolean = false
 
   override def getConnection(driver: Driver, options: Map[String, String]): 
Connection = null
+
+  override def modifiesSecurityContext(
+driver: Driver,
+options: Map[String, String]
+  ): Boolean = false
 }
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 75fa001..6cf639f 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -40,7 +40,10 @@ object MimaExcludes {
 // The followings are necessary for Scala 2.13.
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.*"),
 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.*"),
-
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend$Arguments$")
+
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend$Arguments$"),
+
+// [SPARK-37391][SQL] JdbcConnectionProvider tells if it modifies security 
context
+
ProblemFilt

[spark] branch master updated (ae8940c -> d270d40)

2021-12-14 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ae8940c  [SPARK-37310][SQL] Migrate ALTER NAMESPACE ... SET PROPERTIES 
to use V2 command by default
 add d270d40  [SPARK-37635][SQL] SHOW TBLPROPERTIES should print the fully 
qualified table name

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala  | 2 +-
 .../spark/sql/execution/datasources/v2/ShowTablePropertiesExec.scala   | 3 ++-
 .../src/test/resources/sql-tests/results/show-tblproperties.sql.out| 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6a59fba -> ae8940c)

2021-12-14 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6a59fba  [SPARK-37575][SQL] null values should be saved as nothing 
rather than quoted empty Strings "" by default settings
 add ae8940c  [SPARK-37310][SQL] Migrate ALTER NAMESPACE ... SET PROPERTIES 
to use V2 command by default

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-12-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4158d35  [SPARK-36038][CORE] Speculation metrics summary at stage level
4158d35 is described below

commit 4158d3544030058a23c7741d4283213a56ae7cc7
Author: Thejdeep Gudivada 
AuthorDate: Mon Dec 13 09:58:27 2021 +0900

[SPARK-36038][CORE] Speculation metrics summary at stage level

### What changes were proposed in this pull request?
Currently, there are no speculation metrics available in Spark at the 
application, job, or stage level. This PR adds some basic speculation metrics 
for a stage when speculative execution is enabled.

This is similar to the existing stage-level metrics, tracking numTotal 
(total number of speculated tasks), numCompleted (total number of successful 
speculated tasks), numFailed (total number of failed speculated tasks), 
numKilled (total number of killed speculated tasks), etc.

This new set of metrics helps in further understanding the speculative 
execution feature in the context of the application and also helps in tuning 
the speculative execution configuration knobs.
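
For context (not part of the original commit message), a minimal sketch of the 
shape of the per-stage speculation summary, using the field names that appear 
in this change; the case class itself is illustrative, not the one added by 
the PR:

```scala
// Illustrative stand-in for the per-stage speculation summary described above.
case class SpeculationStageSummarySketch(
    numTasks: Int,           // total number of speculated tasks
    numActiveTasks: Int,     // speculated tasks still running
    numCompletedTasks: Int,  // speculated tasks that finished successfully
    numFailedTasks: Int,     // speculated tasks that failed
    numKilledTasks: Int)     // speculated tasks that were killed

// Example use: how effective were the speculative attempts in this stage?
def speculationSuccessRate(s: SpeculationStageSummarySketch): Double =
  if (s.numTasks == 0) 0.0 else s.numCompletedTasks.toDouble / s.numTasks
```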

 ### Why are the changes needed?
Additional metrics for speculative execution.

 ### Does this PR introduce _any_ user-facing change?
Yes, Stages Page in SHS UI will have an additional table for speculation 
metrics, if present.

 ### How was this patch tested?
Unit tests added and also tested on our internal platform.

Absence of speculation metrics :
![Screen Shot 2021-11-15 at 10 12 23 
AM](https://user-images.githubusercontent.com/1708757/141836847-b0768265-35eb-4963-b83a-8b785318e82d.png)

Presence of speculation metrics :
![Screen Shot 2021-11-15 at 10 11 52 
AM](https://user-images.githubusercontent.com/1708757/141836945-1078bfcb-1c07-44db-9f33-5dbd09ece6f2.png)

Closes #34607 from thejdeep/SPARK-36038.

Lead-authored-by: Thejdeep Gudivada 
Co-authored-by: Ron Hu 
Co-authored-by: Venkata krishnan Sowrirajan 
Signed-off-by: Kousuke Saruta 
---
 .../org/apache/spark/ui/static/stagepage.js|  32 ++
 .../spark/ui/static/stagespage-template.html   |  15 +
 .../resources/org/apache/spark/ui/static/webui.css |  10 +
 .../apache/spark/status/AppStatusListener.scala|  14 +
 .../org/apache/spark/status/AppStatusStore.scala   |  11 +
 .../scala/org/apache/spark/status/LiveEntity.scala |  26 ++
 .../scala/org/apache/spark/status/api/v1/api.scala |   8 +
 .../scala/org/apache/spark/status/storeTypes.scala |  12 +
 .../scala/org/apache/spark/ui/jobs/JobPage.scala   |   1 +
 .../application_list_json_expectation.json |  15 +
 .../completed_app_list_json_expectation.json   |  15 +
 .../limit_app_list_json_expectation.json   |  30 +-
 .../minDate_app_list_json_expectation.json |  15 +
 .../minEndDate_app_list_json_expectation.json  |  15 +
 ...stage_with_speculation_summary_expectation.json | 507 +
 .../spark-events/application_1628109047826_1317105 |  52 +++
 .../spark/deploy/history/HistoryServerSuite.scala  |   5 +-
 .../spark/status/AppStatusListenerSuite.scala  |  10 +
 .../apache/spark/status/AppStatusStoreSuite.scala  |  51 ++-
 .../scala/org/apache/spark/ui/StagePageSuite.scala |   1 +
 dev/.rat-excludes  |   3 +-
 21 files changed, 830 insertions(+), 18 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
index 584e1a7..a5955f3 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
@@ -652,6 +652,38 @@ $(document).ready(function () {
 
executorSummaryTableSelector.column(14).visible(dataToShow.showBytesSpilledData);
   });
 
+// Prepare data for speculation metrics
+$("#speculationSummaryTitle").hide();
+$("#speculationSummary").hide();
+var speculationSummaryInfo = responseBody.speculationSummary;
+var speculationData;
+if(speculationSummaryInfo) {
+  speculationData = [[
+speculationSummaryInfo.numTasks,
+speculationSummaryInfo.numActiveTasks,
+speculationSummaryInfo.numCompletedTasks,
+speculationSummaryInfo.numFailedTasks,
+speculationSummaryInfo.numKilledTasks
+  ]];
+  if (speculationSummaryInfo.numTasks > 0) {
+// Show speculationSummary if there is atleast one speculated task 
that ran
+$("#speculationSummaryTitle").show();
+$("#speculationSummary").show();
+  }
+}
+var specu

[spark] branch master updated (119da4e -> 7692773)

2021-12-11 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 119da4e  [SPARK-37615][BUILD] Upgrade SBT to 1.5.6
 add 7692773  [SPARK-37615][BUILD][FOLLOWUP] Upgrade SBT to 1.5.6 in 
AppVeyor

No new revisions were added by this update.

Summary of changes:
 dev/appveyor-install-dependencies.ps1 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37586][SQL] Add the `mode` and `padding` args to `aes_encrypt()`/`aes_decrypt()`

2021-12-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8f6e439  [SPARK-37586][SQL] Add the `mode` and `padding` args to 
`aes_encrypt()`/`aes_decrypt()`
8f6e439 is described below

commit 8f6e439068281633acefb895f8c4bd9203868c24
Author: Max Gekk 
AuthorDate: Thu Dec 9 14:36:47 2021 +0900

[SPARK-37586][SQL] Add the `mode` and `padding` args to 
`aes_encrypt()`/`aes_decrypt()`

### What changes were proposed in this pull request?
In the PR, I propose to add new optional arguments to the `aes_encrypt()` 
and `aes_decrypt()` functions with default values:
1. `mode` - specifies which block cipher mode should be used to 
encrypt/decrypt messages. The only valid value is `ECB` at the moment.
2. `padding` - specifies how to pad messages whose length is not a multiple 
of the block size. Currently, only `PKCS` is supported.

In this way, when a user doesn't pass `mode`/`padding` to the functions, 
the functions apply AES encryption/decryption in the `ECB` mode with the 
`PKCS5Padding` padding.
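
As a usage sketch (not taken from the patch itself, and with made-up input and 
key values), the new arguments can be passed through SQL expressions, and 
omitting them falls back to the `ECB`/`PKCS` defaults:

```scala
// Illustration only: the 16-character key and the input value are invented.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("aes-demo").getOrCreate()
import spark.implicits._

val df = Seq(("Spark", "abcdefghijklmnop")).toDF("input", "key")
df.selectExpr(
  "aes_encrypt(input, key, 'ECB', 'PKCS') AS encrypted",    // explicit mode/padding
  "aes_decrypt(aes_encrypt(input, key), key) AS roundtrip"  // defaults apply
).show(truncate = false)
```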

### Why are the changes needed?
1. For now, `aes_encrypt()` and `aes_decrypt()` rely on the JVM's 
configuration regarding which cipher mode to support, which is problematic as 
it is not fixed across versions and systems. By using default constants for 
the new arguments, we can guarantee the same behaviour across all supported 
platforms.
2. We can consider the new arguments as a new point of extension in the 
current implementation of the AES algorithm in Spark SQL. In the future, in 
OSS or in a private Spark fork, devs can implement other modes (and paddings) 
like GCM. Other systems already support different AES modes; see:
   1. Snowflake: 
https://docs.snowflake.com/en/sql-reference/functions/encrypt.html
   2. BigQuery: 
https://cloud.google.com/bigquery/docs/reference/standard-sql/aead-encryption-concepts#block_cipher_modes
   3. MySQL: 
https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-encrypt
   4. Hive: 
https://cwiki.apache.org/confluence/display/hive/languagemanual+udf
   5. PostgreSQL: 
https://www.postgresql.org/docs/12/pgcrypto.html#id-1.11.7.34.8

### Does this PR introduce _any_ user-facing change?
No. This PR just extends existing APIs.

### How was this patch tested?
By running new checks:
```
$ build/sbt "test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite"
$ build/sbt "sql/test:testOnly 
org.apache.spark.sql.expressions.ExpressionInfoSuite"
$ build/sbt "sql/testOnly *ExpressionsSchemaSuite"
```

Closes #34837 from MaxGekk/aes-gsm-mode.

Authored-by: Max Gekk 
Signed-off-by: Kousuke Saruta 
---
 .../catalyst/expressions/ExpressionImplUtils.java  | 24 +--
 .../spark/sql/catalyst/expressions/misc.scala  | 78 +-
 .../spark/sql/errors/QueryExecutionErrors.scala| 10 ++-
 .../sql-functions/sql-expression-schema.md |  2 +-
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 16 +
 5 files changed, 104 insertions(+), 26 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
index 9afa5a6..83205c1 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.catalyst.expressions;
 
 import org.apache.spark.sql.errors.QueryExecutionErrors;
+import org.apache.spark.unsafe.types.UTF8String;
 
 import javax.crypto.Cipher;
 import javax.crypto.spec.SecretKeySpec;
@@ -27,19 +28,28 @@ import java.security.GeneralSecurityException;
  * An utility class for constructing expressions.
  */
 public class ExpressionImplUtils {
-  public static byte[] aesEncrypt(byte[] input, byte[] key) {
-return aesInternal(input, key, Cipher.ENCRYPT_MODE);
+  public static byte[] aesEncrypt(byte[] input, byte[] key, UTF8String mode, 
UTF8String padding) {
+return aesInternal(input, key, mode.toString(), padding.toString(), 
Cipher.ENCRYPT_MODE);
   }
 
-  public static byte[] aesDecrypt(byte[] input, byte[] key) {
-return aesInternal(input, key, Cipher.DECRYPT_MODE);
+  public static byte[] aesDecrypt(byte[] input, byte[] key, UTF8String mode, 
UTF8String padding) {
+return aesInternal(input, key, mode.toString(), padding.toString(), 
Cipher.DECRYPT_MODE);
   }
 
-  private static byte[] aesInternal(byte[] input, byte[] key, int mode) {
+  private static byte[] aesInternal(
+  byte[] input,
+  byte[] key,
+  String mode,
+  Str

[spark] branch master updated (d50d464 -> cd4476f)

2021-12-06 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d50d464  [SPARK-37555][SQL] spark-sql should pass last unclosed 
comment to backend
 add cd4476f  [SPARK-37469][WEBUI] unified shuffle read block time to 
shuffle read fetch wait time in StagePage

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ui/static/stagepage.js|  16 
 .../spark/ui/static/stagespage-template.html   |   2 +-
 .../resources/org/apache/spark/ui/static/webui.css |   4 ++--
 .../org/apache/spark/status/AppStatusStore.scala   |   2 +-
 .../scala/org/apache/spark/status/storeTypes.scala |   5 +++--
 .../main/scala/org/apache/spark/ui/ToolTips.scala  |   2 +-
 .../scala/org/apache/spark/ui/jobs/StagePage.scala |   9 +
 .../spark/ui/jobs/TaskDetailsClassNames.scala  |   2 +-
 docs/img/AllStagesPageDetail6.png  | Bin 106909 -> 163423 bytes
 docs/web-ui.md |   2 +-
 10 files changed, 23 insertions(+), 21 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (406455d -> a85c51f)

2021-11-21 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 406455d  Revert "[SPARK-36231][PYTHON] Support arithmetic operations 
of decimal(nan) series"
 add a85c51f  [SPARK-37354][K8S][TESTS] Make the Java version installed on 
the container image used by the K8s integration tests with SBT configurable

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala | 1 +
 1 file changed, 1 insertion(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35672][FOLLOWUP][TESTS] Add more exclusion rules to MimaExcludes.scala for Scala 2.13

2021-11-19 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f20898  [SPARK-35672][FOLLOWUP][TESTS] Add more exclusion rules to 
MimaExcludes.scala for Scala 2.13
4f20898 is described below

commit 4f2089899dd7f21ba41c9ccfc0453a93afa1e7eb
Author: Kousuke Saruta 
AuthorDate: Fri Nov 19 20:33:23 2021 +0900

[SPARK-35672][FOLLOWUP][TESTS] Add more exclusion rules to 
MimaExcludes.scala for Scala 2.13

### What changes were proposed in this pull request?

This PR adds more MiMa exclusion rules for Scala 2.13.
#34649 partially resolved the compatibility issue, but 3 additional 
compatibility problems were raised.

```
$ build/sbt clean
$ dev/change-scala-version.sh 2.13
$ build/sbt -Pscala-2.13 clean
$ dev/mima

...
[error] spark-core: Failed binary compatibility check against 
org.apache.spark:spark-core_2.13:3.2.0! Found 3 potential problems (filtered 
910)
[error]  * synthetic method 
copy$default$8()scala.collection.mutable.ListBuffer in class 
org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments has a 
different result type in current version, where it is scala.Option rather than 
scala.collection.mutable.ListBuffer
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.copy$default$8")
[error]  * synthetic method copy$default$9()scala.Option in class 
org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments has a 
different result type in current version, where it is Int rather than 
scala.Option
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.copy$default$9")
[error]  * the type hierarchy of object 
org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments is different 
in current version. Missing types {scala.runtime.AbstractFunction10}
[error]filter with: 
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend$Arguments$")
...
```

### Why are the changes needed?

To keep the build stable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Confirmed MiMa passed.
```
$ build/sbt clean
$ dev/change-scala-version.sh 2.13
$ build/sbt -Pscala-2.13 clean
$ dev/mima
```

Closes #34664 from sarutak/followup-SPARK-35672-mima-take2.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 project/MimaExcludes.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 15df3d4..75fa001 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -37,8 +37,10 @@ object MimaExcludes {
   // Exclude rules for 3.3.x from 3.2.0
   lazy val v33excludes = v32excludes ++ Seq(
 // [SPARK-35672][CORE][YARN] Pass user classpath entries to executors 
using config instead of command line
-// This is necessary for Scala 2.13.
+// The followings are necessary for Scala 2.13.
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.*"),
+
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.*"),
+
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend$Arguments$")
   )
 
   // Exclude rules for 3.2.x from 3.1.1

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (edbc7cf -> bb9e1d9)

2021-11-14 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from edbc7cf  [SPARK-36533][SS][FOLLOWUP] Support Trigger.AvailableNow in 
PySpark
 add bb9e1d9  [SPARK-37319][K8S] Support K8s image building with Java 17

No new revisions were added by this update.

Summary of changes:
 bin/docker-image-tool.sh  | 11 ---
 .../main/dockerfiles/spark/{Dockerfile => Dockerfile.java17}  |  9 -
 2 files changed, 12 insertions(+), 8 deletions(-)
 copy 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/{Dockerfile => 
Dockerfile.java17} (92%)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37282][TESTS][FOLLOWUP] Mark `YarnShuffleServiceSuite` as ExtendedLevelDBTest

2021-11-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8b45a08  [SPARK-37282][TESTS][FOLLOWUP] Mark `YarnShuffleServiceSuite` 
as ExtendedLevelDBTest
8b45a08 is described below

commit 8b45a08b763e9ee6c75b039893af3de5e5167643
Author: Dongjoon Hyun 
AuthorDate: Sat Nov 13 15:21:59 2021 +0900

[SPARK-37282][TESTS][FOLLOWUP] Mark `YarnShuffleServiceSuite` as 
ExtendedLevelDBTest

### What changes were proposed in this pull request?

This PR is a follow-up of #34548. This was missed due to the `-Pyarn` profile.

### Why are the changes needed?

This is required for the `yarn` module to pass on Apple Silicon.

**BEFORE**
```
$ build/sbt "yarn/test"
...
[info] YarnShuffleServiceSuite:
[info] org.apache.spark.network.yarn.YarnShuffleServiceSuite *** ABORTED 
*** (20 milliseconds)
[info]   java.lang.UnsatisfiedLinkError: Could not load library. Reasons: 
[no leveldbjni64-1.8
...
```

**AFTER**
```
$ build/sbt "yarn/test" -Pyarn 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
...
[info] Run completed in 4 minutes, 57 seconds.
[info] Total number of tests run: 135
[info] Suites: completed 18, aborted 0
[info] Tests: succeeded 135, failed 0, canceled 1, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 319 s (05:19), completed Nov 12, 2021, 4:53:14 PM
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A manual test on Apple Silicon.
```
$ build/sbt "yarn/test" -Pyarn 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
```

Closes #34576 from dongjoon-hyun/SPARK-37282-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kousuke Saruta 
---
 .../scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala   | 2 ++
 1 file changed, 2 insertions(+)

diff --git 
a/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
 
b/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
index b2025aa..38d2247 100644
--- 
a/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
+++ 
b/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
@@ -46,8 +46,10 @@ import org.apache.spark.internal.config._
 import org.apache.spark.network.shuffle.{NoOpMergedShuffleFileManager, 
RemoteBlockPushResolver, ShuffleTestAccessor}
 import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo
 import org.apache.spark.network.util.TransportConf
+import org.apache.spark.tags.ExtendedLevelDBTest
 import org.apache.spark.util.Utils
 
+@ExtendedLevelDBTest
 class YarnShuffleServiceSuite extends SparkFunSuite with Matchers with 
BeforeAndAfterEach {
   private[yarn] var yarnConfig: YarnConfiguration = null
   private[yarn] val SORT_MANAGER = 
"org.apache.spark.shuffle.sort.SortShuffleManager"

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37312][TESTS] Add `.java-version` to `.gitignore` and `.rat-excludes`

2021-11-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d0eb621  [SPARK-37312][TESTS] Add `.java-version` to `.gitignore` and 
`.rat-excludes`
d0eb621 is described below

commit d0eb62179822c82596c4feaa412f2fdf5b83c02a
Author: Dongjoon Hyun 
AuthorDate: Sat Nov 13 14:43:52 2021 +0900

[SPARK-37312][TESTS] Add `.java-version` to `.gitignore` and `.rat-excludes`

### What changes were proposed in this pull request?

To support Java 8/11/17 testing more easily, this PR aims to add 
`.java-version` to `.gitignore` and `.rat-excludes`.

### Why are the changes needed?

When we use `jenv`, `dev/check-license` and `dev/run-tests` fail.

```

Running Apache RAT checks

Could not find Apache license headers in the following files:
 !? /Users/dongjoon/APACHE/spark-merge/.java-version
[error] running /Users/dongjoon/APACHE/spark-merge/dev/check-license ; 
received return code 1
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ jenv local 17
$ dev/check-license
```

Closes #34577 from dongjoon-hyun/SPARK-37312.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kousuke Saruta 
---
 .gitignore| 1 +
 dev/.rat-excludes | 1 +
 2 files changed, 2 insertions(+)

diff --git a/.gitignore b/.gitignore
index 1a7881a..560265e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,6 +7,7 @@
 *.pyo
 *.swp
 *~
+.java-version
 .DS_Store
 .bsp/
 .cache
diff --git a/dev/.rat-excludes b/dev/.rat-excludes
index a35d4ce..7932c5d 100644
--- a/dev/.rat-excludes
+++ b/dev/.rat-excludes
@@ -10,6 +10,7 @@ cache
 .generated-mima-member-excludes
 .rat-excludes
 .*md
+.java-version
 derby.log
 licenses/*
 licenses-binary/*

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (bc80c84 -> b89f415)

2021-11-09 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bc80c84  [SPARK-36575][CORE] Should ignore task finished event if its 
task set is gone in TaskSchedulerImpl.handleSuccessfulTask
 add b89f415  [SPARK-37264][BUILD] Exclude `hadoop-client-api` transitive 
dependency from `orc-core`

No new revisions were added by this update.

Summary of changes:
 pom.xml | 4 
 1 file changed, 4 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][DOCS] Fix typos in python user guide and "the the" in the whole codebase

2021-11-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b6ac331  [MINOR][DOCS] Fix typos in python user guide and "the the" in 
the whole codebase
b6ac331 is described below

commit b6ac3311b70d7fdb373e88b8617c74dda63e1c8f
Author: sudoliyang 
AuthorDate: Tue Nov 9 13:54:27 2021 +0900

[MINOR][DOCS] Fix typos in python user guide and "the the" in the whole 
codebase

### What changes were proposed in this pull request?

Fix typos in python user guide and "the the" in the whole codebase.

### Why are the changes needed?

Improve readability.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested by checking dictionary.

Closes #34524 from sudoliyang/master.

Authored-by: sudoliyang 
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala  | 2 +-
 python/docs/source/user_guide/pandas_on_spark/typehints.rst | 2 +-
 python/docs/source/user_guide/python_packaging.rst  | 2 +-
 python/pyspark/rdd.py   | 2 +-
 .../sql/execution/datasources/BasicWriteTaskStatsTrackerSuite.scala | 2 +-
 .../test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala| 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala 
b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index ae50a45..55db73a 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -610,7 +610,7 @@ private[spark] class TaskSchedulerImpl(
   taskSet.getCompletelyExcludedTaskIfAny(hostToExecutors).foreach { 
taskIndex =>
   // If the taskSet is unschedulable we try to find an existing 
idle excluded
   // executor and kill the idle executor and kick off an 
abortTimer which if it doesn't
-  // schedule a task within the the timeout will abort the taskSet 
if we were unable to
+  // schedule a task within the timeout will abort the taskSet if 
we were unable to
   // schedule any task from the taskSet.
   // Note 1: We keep track of schedulability on a per taskSet 
basis rather than on a per
   // task basis.
diff --git a/python/docs/source/user_guide/pandas_on_spark/typehints.rst 
b/python/docs/source/user_guide/pandas_on_spark/typehints.rst
index 72519fc..fda400d 100644
--- a/python/docs/source/user_guide/pandas_on_spark/typehints.rst
+++ b/python/docs/source/user_guide/pandas_on_spark/typehints.rst
@@ -91,7 +91,7 @@ plans to move gradually towards using pandas instances only 
as the stability bec
 Type Hinting with Names
 ---
 
-This apporach is to overcome the limitations in the existing type
+This approach is to overcome the limitations in the existing type
 hinting especially for DataFrame. When you use a DataFrame as the return type 
hint, for example,
 ``DataFrame[int, int]``, there is no way to specify the names of each Series. 
In the old way, pandas API on Spark just generates
 the column names as ``c#`` and this easily leads users to lose or forgot the 
Series mappings. See the example below:
diff --git a/python/docs/source/user_guide/python_packaging.rst 
b/python/docs/source/user_guide/python_packaging.rst
index 6409c5f..8a60177 100644
--- a/python/docs/source/user_guide/python_packaging.rst
+++ b/python/docs/source/user_guide/python_packaging.rst
@@ -249,5 +249,5 @@ For the interactive pyspark shell, the commands are almost 
the same:
 
 An end-to-end Docker example for deploying a standalone PySpark with 
``SparkSession.builder`` and PEX
 can be found `here 
<https://github.com/criteo/cluster-pack/blob/master/examples/spark-with-S3/README.md>`_
-- it uses cluster-pack, a library on top of PEX that automatizes the the 
intermediate step of having
+- it uses cluster-pack, a library on top of PEX that automatizes the 
intermediate step of having
 to create & upload the PEX manually.
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 2f0db7f..6942634 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -2797,7 +2797,7 @@ class RDD(object):
 Returns
 ---
 :py:class:`pyspark.resource.ResourceProfile`
-The the user specified profile or None if none were specified
+The user specified profile or None if none were specified
 
 Notes
 -
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteTaskStatsTra

[spark] branch master updated: [SPARK-36895][SQL][FOLLOWUP] Use property to specify index type

2021-11-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d8a4a8c  [SPARK-36895][SQL][FOLLOWUP] Use property to specify index 
type
d8a4a8c is described below

commit d8a4a8c629db6ae8081477e58fdbf20983b48a39
Author: Huaxin Gao 
AuthorDate: Tue Nov 9 04:21:29 2021 +0900

[SPARK-36895][SQL][FOLLOWUP] Use property to specify index type

### What changes were proposed in this pull request?
use property to specify index type

### Why are the changes needed?
fix scala doc in https://github.com/apache/spark/pull/34486 and resubmit

### Does this PR introduce _any_ user-facing change?
Yes
```
  void createIndex(String indexName,
      String indexType,
      NamedReference[] columns,
      Map<NamedReference, Map<String, String>> columnsProperties,
      Map<String, String> properties)
```
changed to
```
  createIndex(String indexName,
      NamedReference[] columns,
      Map<NamedReference, Map<String, String>> columnsProperties,
      Map<String, String> properties)
```
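
A hedged sketch of calling the revised signature from Scala; the table handle, 
the column name, and the property key used to carry the index type are 
placeholders, not names confirmed by this commit:

```scala
// Placeholder usage of the revised createIndex signature shown above.
import java.util.{HashMap => JHashMap, Map => JMap}

import org.apache.spark.sql.connector.catalog.index.SupportsIndex
import org.apache.spark.sql.connector.expressions.{FieldReference, NamedReference}

def createExampleIndex(table: SupportsIndex): Unit = {
  val properties = new JHashMap[String, String]()
  properties.put("indexType", "BTREE")  // assumption: index type passed as a property

  table.createIndex(
    "i1",
    Array[NamedReference](FieldReference("col1")),
    new JHashMap[NamedReference, JMap[String, String]](),  // no per-column properties
    properties)
}
```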

### How was this patch tested?
new test

Closes #34523 from huaxingao/newDelete.

Authored-by: Huaxin Gao 
Signed-off-by: Kousuke Saruta 
---
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala  | 67 --
 .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  | 82 +-
 .../sql/connector/catalog/index/SupportsIndex.java |  8 ++-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  3 +-
 .../execution/datasources/v2/CreateIndexExec.scala |  9 ++-
 .../execution/datasources/v2/jdbc/JDBCTable.scala  |  3 +-
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |  2 -
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 27 ---
 8 files changed, 45 insertions(+), 156 deletions(-)

diff --git 
a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
 
b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
index d77dcb4..592f7d6 100644
--- 
a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
+++ 
b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
@@ -18,16 +18,11 @@
 package org.apache.spark.sql.jdbc.v2
 
 import java.sql.{Connection, SQLFeatureNotSupportedException}
-import java.util
 
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.analysis.{IndexAlreadyExistsException, 
NoSuchIndexException}
-import org.apache.spark.sql.connector.catalog.{Catalogs, Identifier, 
TableCatalog}
-import org.apache.spark.sql.connector.catalog.index.SupportsIndex
-import org.apache.spark.sql.connector.expressions.{FieldReference, 
NamedReference}
 import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog
 import org.apache.spark.sql.jdbc.{DatabaseOnDocker, DockerJDBCIntegrationSuite}
 import org.apache.spark.sql.types._
@@ -122,66 +117,4 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite with V2JDBCTest {
   }
 
   override def supportsIndex: Boolean = true
-
-  override def testIndexProperties(jdbcTable: SupportsIndex): Unit = {
-val properties = new util.HashMap[String, String]();
-properties.put("KEY_BLOCK_SIZE", "10")
-properties.put("COMMENT", "'this is a comment'")
-// MySQL doesn't allow property set on individual column, so use empty 
Array for
-// column properties
-jdbcTable.createIndex("i1", "BTREE", Array(FieldReference("col1")),
-  new util.HashMap[NamedReference, util.Map[String, String]](), properties)
-
-var index = jdbcTable.listIndexes()
-// The index property size is actually 1. Even though the index is created
-// with properties "KEY_BLOCK_SIZE", "10" and "COMMENT", "'this is a 
comment'", when
-// retrieving index using `SHOW INDEXES`, MySQL only returns `COMMENT`.
-assert(index(0).properties.size == 1)
-assert(index(0).properties.get("COMMENT").equals("this is a comment"))
-  }
-
-  override def testIndexUsingSQL(tbl: String): Unit = {
-val loaded = Catalogs.load("mysql", conf)
-val jdbcTable = loaded.asInstanceOf[TableCatalog]
-  .loadTable(Identifier.of(Array.empty[String], "new_table"))
-  .asInstanceOf[SupportsIndex]
-assert(jdbcTable.indexExists("i1") == false)
-assert(jdbcTable.indexExists("i2") == false)
-
-val indexType = "DUMMY"
-var m = intercept[UnsupportedOperationException] {
-  sql(s"CREATE index i1 ON $catalogName.new_table USING DUMMY (col1)")
-}.ge

[spark] branch master updated (a47d380 -> 61e9789)

2021-11-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a47d380  [SPARK-37241][BUILD] Upgrade Jackson to 2.13.0
 add 61e9789  [SPARK-37240][SQL] Handle ANSI intervals by 
`ColumnVectorUtils.populate()`

No new revisions were added by this update.

Summary of changes:
 .../execution/vectorized/ColumnVectorUtils.java|  5 ++-
 .../spark/sql/sources/PartitionedWriteSuite.scala  | 43 --
 2 files changed, 26 insertions(+), 22 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (06bdea0 -> 597cee6)

2021-11-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 06bdea0  [MINOR][INFRA] Explicitly specify Java version for lint test 
on GA
 add 597cee6  Revert "[SPARK-36895][SQL][FOLLOWUP] Use property to specify 
index type"

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala  | 67 ++
 .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  | 82 +-
 .../sql/connector/catalog/index/SupportsIndex.java |  8 +--
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  3 +-
 .../execution/datasources/v2/CreateIndexExec.scala |  9 +--
 .../execution/datasources/v2/jdbc/JDBCTable.scala  |  3 +-
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |  1 +
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 27 +++
 8 files changed, 155 insertions(+), 45 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: Revert "[SPARK-36038][CORE] Speculation metrics summary at stage level"

2021-11-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f16d38  Revert "[SPARK-36038][CORE] Speculation metrics summary at 
stage level"
4f16d38 is described below

commit 4f16d3844532d9434d93c817e4fae15d10378af7
Author: Kousuke Saruta 
AuthorDate: Mon Nov 8 17:26:00 2021 +0900

Revert "[SPARK-36038][CORE] Speculation metrics summary at stage level"

This reverts commit 73747ecb970595d49c478b0eb65f5132c8b0bf02.
See the 
[comment](https://github.com/apache/spark/pull/33253#issuecomment-962913353).

    Closes #34518 from sarutak/revert-SPARK-36038.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 .../org/apache/spark/ui/static/stagepage.js|  29 --
 .../spark/ui/static/stagespage-template.html   |  15 -
 .../resources/org/apache/spark/ui/static/webui.css |  10 -
 .../apache/spark/status/AppStatusListener.scala|  19 -
 .../org/apache/spark/status/AppStatusStore.scala   |  11 -
 .../scala/org/apache/spark/status/LiveEntity.scala |  26 --
 .../scala/org/apache/spark/status/api/v1/api.scala |   8 -
 .../scala/org/apache/spark/status/storeTypes.scala |  14 -
 .../scala/org/apache/spark/ui/jobs/JobPage.scala   |   1 -
 .../application_list_json_expectation.json |  15 -
 .../completed_app_list_json_expectation.json   |  15 -
 .../limit_app_list_json_expectation.json   |  30 +-
 .../minDate_app_list_json_expectation.json |  15 -
 .../minEndDate_app_list_json_expectation.json  |  15 -
 ...stage_with_speculation_summary_expectation.json | 507 -
 .../spark-events/application_1628109047826_1317105 |  52 ---
 .../spark/deploy/history/HistoryServerSuite.scala  |   5 +-
 .../spark/status/AppStatusListenerSuite.scala  |  10 -
 .../apache/spark/status/AppStatusStoreSuite.scala  |  57 +--
 .../scala/org/apache/spark/ui/StagePageSuite.scala |   1 -
 dev/.rat-excludes  |   3 +-
 21 files changed, 18 insertions(+), 840 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
index 595635a..db1a148 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
@@ -652,35 +652,6 @@ $(document).ready(function () {
 
executorSummaryTableSelector.column(14).visible(dataToShow.showBytesSpilledData);
   });
 
-// Prepare data for speculation metrics
-$("#speculationSummaryTitle").hide()
-$("#speculationSummary").hide()
-var speculationSummaryInfo = responseBody.speculationSummary;
-var speculationData = [[
-  speculationSummaryInfo.numTasks,
-  speculationSummaryInfo.numActiveTasks,
-  speculationSummaryInfo.numCompletedTasks,
-  speculationSummaryInfo.numFailedTasks,
-  speculationSummaryInfo.numKilledTasks
-]];
-if (speculationSummaryInfo.numTasks > 0) {
-  // Show speculationSummary if there is atleast one speculated task 
ran
-  $("#speculationSummaryTitle").show()
-  $("#speculationSummary").show()
-}
-var speculationMetricsTableConf = {
-  "data": speculationData,
-  "paging": false,
-  "searching": false,
-  "order": [[0, "asc"]],
-  "bSort": false,
-  "bAutoWidth": false,
-  "oLanguage": {
-"sEmptyTable": "No speculation metrics yet"
-  }
-}
-$("#speculation-metrics-table").DataTable(speculationMetricsTableConf);
-
 // prepare data for accumulatorUpdates
 var accumulatorTable = 
responseBody.accumulatorUpdates.filter(accumUpdate =>
   !(accumUpdate.name).toString().includes("internal."));
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html 
b/core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html
index 8c47e5a..98e714f 100644
--- 
a/core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html
+++ 
b/core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html
@@ -31,21 +31,6 @@ limitations under the License.
 
 
 
-Speculation 
Summary
-
-
-
-Total
-Active
-Complete
-Failed
-Killed
-
-
-
-
-
-
 
 
 Aggregated Metrics by Executor
diff --git a/core/src/main/resources/org/apache/spark/

[spark] branch branch-3.2 updated (e55bab5 -> 90b7ee0)

2021-11-07 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e55bab5  [SPARK-37214][SQL] Fail query analysis earlier with invalid 
identifiers
 add 90b7ee0  [SPARK-37238][BUILD][3.2] Upgrade ORC to 1.6.12

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 6 +++---
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 6 +++---
 pom.xml | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7ef6a2e -> e29c4e1)

2021-11-07 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7ef6a2e  [SPARK-37231][SQL] Dynamic writes/reads of ANSI interval 
partitions
 add e29c4e1  [SPARK-37211][INFRA] Added descriptions and an image to the 
guide for enabling GitHub Actions in notify_test_workflow.yml

No new revisions were added by this update.

Summary of changes:
 .github/workflows/images/workflow-enable-button.png | Bin 0 -> 79807 bytes
 .github/workflows/notify_test_workflow.yml  |  10 --
 2 files changed, 8 insertions(+), 2 deletions(-)
 create mode 100644 .github/workflows/images/workflow-enable-button.png

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8ab9d63 -> 7ef6a2e)

2021-11-07 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8ab9d63  [SPARK-37214][SQL] Fail query analysis earlier with invalid 
identifiers
 add 7ef6a2e  [SPARK-37231][SQL] Dynamic writes/reads of ANSI interval 
partitions

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/PartitioningUtils.scala  |  2 ++
 .../spark/sql/sources/PartitionedWriteSuite.scala  | 40 ++
 2 files changed, 36 insertions(+), 6 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37108][R] Expose make_date expression in R

2021-11-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5f997c7  [SPARK-37108][R] Expose make_date expression in R
5f997c7 is described below

commit 5f997c78c83551942b6c5a8ec6344547b86ae68a
Author: Leona Yoda 
AuthorDate: Thu Nov 4 12:25:12 2021 +0900

[SPARK-37108][R] Expose make_date expression in R

### What changes were proposed in this pull request?

Expose the `make_date` API in SparkR

### Why are the changes needed?

The `make_date` APIs for Scala and PySpark were added by 
[SPARK-34356](https://github.com/apache/spark/pull/34356); this PR aims to 
cover the API in SparkR.

### Does this PR introduce _any_ user-facing change?

Yes, users can call the API from SparkR.

### How was this patch tested?

unit tests.

Closes #34480 from yoda-mon/make-date-r.

Authored-by: Leona Yoda 
Signed-off-by: Kousuke Saruta 
---
 R/pkg/NAMESPACE   |  1 +
 R/pkg/R/functions.R   | 26 ++
 R/pkg/R/generics.R|  4 
 R/pkg/tests/fulltests/test_sparkSQL.R | 14 ++
 4 files changed, 45 insertions(+)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 10bb02a..6e0557c 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -343,6 +343,7 @@ exportMethods("%<=>%",
   "lower",
   "lpad",
   "ltrim",
+  "make_date",
   "map_concat",
   "map_entries",
   "map_filter",
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index fdbf48b..48d4fe8 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -41,6 +41,8 @@ NULL
 #' @param x Column to compute on. In \code{window}, it must be a time Column of
 #'  \code{TimestampType}. This is not used with \code{current_date} and
 #'  \code{current_timestamp}
+#' @param y Column to compute on.
+#' @param z Column to compute on.
 #' @param format The format for the given dates or timestamps in Column 
\code{x}. See the
 #'   format used in the following methods:
 #'   \itemize{
@@ -1467,6 +1469,30 @@ setMethod("ltrim",
   })
 
 #' @details
+#' \code{make_date}: Create date from year, month and day fields.
+#'
+#' @rdname column_datetime_functions
+#' @aliases make_date make_date,Column-method
+#' @note make_date since 3.3.0
+#' @examples
+#'
+#' \dontrun{
+#' df <- createDataFrame(
+#'   list(list(2021, 10, 22), list(2021, 13, 1),
+#'list(2021, 2, 29), list(2020, 2, 29)),
+#'   list("year", "month", "day")
+#' )
+#' tmp <- head(select(df, make_date(df$year, df$month, df$day)))
+#' head(tmp)}
+setMethod("make_date",
+  signature(x = "Column", y = "Column", z = "Column"),
+  function(x, y, z) {
+jc <- callJStatic("org.apache.spark.sql.functions", "make_date",
+  x@jc, y@jc, z@jc)
+column(jc)
+  })
+
+#' @details
 #' \code{max}: Returns the maximum value of the expression in a group.
 #'
 #' @rdname column_aggregate_functions
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index af19e72..5fe2ec6 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -1158,6 +1158,10 @@ setGeneric("lpad", function(x, len, pad) { 
standardGeneric("lpad") })
 #' @name NULL
 setGeneric("ltrim", function(x, trimString) { standardGeneric("ltrim") })
 
+#' @rdname column_datetime_functions
+#' @name NULL
+setGeneric("make_date", function(x, y, z) { standardGeneric("make_date") })
+
 #' @rdname column_collection_functions
 #' @name NULL
 setGeneric("map_concat", function(x, ...) { standardGeneric("map_concat") })
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R 
b/R/pkg/tests/fulltests/test_sparkSQL.R
index b6e02bb..0e46324e 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -2050,6 +2050,20 @@ test_that("date functions on a DataFrame", {
   Sys.setenv(TZ = .originalTimeZone)
 })
 
+test_that("SPARK-37108: expose make_date expression in R", {
+  df <- createDataFrame(
+list(list(2021, 10, 22), list(2021, 13, 1),
+ list(2021, 2, 29), list(2020, 2, 29)),
+list("year", "month", "day")
+  )
+  expect <- createDataFrame(
+list(list(as.Date("2021-10-22")), NA, NA, list(as.Date("2020-02-29"))),
+list("make_date(year, month, day)")
+  )
+  actual <- select(df, make_date(df$year, df$month, df$day))
+

[spark] branch master updated: [SPARK-36894][SPARK-37077][PYTHON] Synchronize RDD.toDF annotations with SparkSession and SQLContext .createDataFrame variants

2021-11-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b874bf5  [SPARK-36894][SPARK-37077][PYTHON] Synchronize RDD.toDF 
annotations with SparkSession and SQLContext .createDataFrame variants
b874bf5 is described below

commit b874bf5dca4f1b7272f458350eb153e7b272f8c8
Author: zero323 
AuthorDate: Thu Nov 4 02:06:48 2021 +0900

[SPARK-36894][SPARK-37077][PYTHON] Synchronize RDD.toDF annotations with 
SparkSession and SQLContext .createDataFrame variants

### What changes were proposed in this pull request?

This pull request synchronizes `RDD.toDF` annotations with 
`SparkSession.createDataFrame` and `SQLContext.createDataFrame` variants.

Additionally, it fixes a recent regression in `SQLContext.createDataFrame` 
(SPARK-37077), where `RDD` is no longer considered a valid input.

### Why are the changes needed?

- Adds support for providing a `str` schema.
- Adds support for converting `RDDs` of "atomic" values, if a schema is 
provided.

Additionally, it introduces a `TypeVar` representing supported "atomic" 
values. This was done to avoid an issue with manual data tests, where the following

```python
sc.parallelize([1]).toDF(schema=IntegerType())
```

results in

```
error: No overload variant of "toDF" of "RDD" matches argument type 
"IntegerType"  [call-overload]
note: Possible overload variants:
note: def toDF(self, schema: Union[List[str], Tuple[str, ...], None] = 
..., sampleRatio: Optional[float] = ...) -> DataFrame
note: def toDF(self, schema: Union[StructType, str, None] = ...) -> 
DataFrame
```

when a `Union` type is used (this problem doesn't surface when a non-self bound 
is used).

### Does this PR introduce _any_ user-facing change?

Type checker only.

Please note that these annotations serve primarily to support 
documentation, as checks on `self` types are still very limited.

### How was this patch tested?

Existing tests and manual data tests.

__Note__:

Updated data tests to reflect the new expected traceback after the revert in 
#34477

Closes #34478 from zero323/SPARK-36894.

Authored-by: zero323 
Signed-off-by: Kousuke Saruta 
---
 python/pyspark/rdd.pyi   | 15 ++---
 python/pyspark/sql/_typing.pyi   | 11 +++
 python/pyspark/sql/context.py| 38 ++
 python/pyspark/sql/session.py| 40 +---
 python/pyspark/sql/tests/typing/test_session.yml |  8 ++---
 5 files changed, 71 insertions(+), 41 deletions(-)

diff --git a/python/pyspark/rdd.pyi b/python/pyspark/rdd.pyi
index a810a2c..84481d3 100644
--- a/python/pyspark/rdd.pyi
+++ b/python/pyspark/rdd.pyi
@@ -55,8 +55,8 @@ from pyspark.resource.requests import (  # noqa: F401
 from pyspark.resource.profile import ResourceProfile
 from pyspark.statcounter import StatCounter
 from pyspark.sql.dataframe import DataFrame
-from pyspark.sql.types import StructType
-from pyspark.sql._typing import RowLike
+from pyspark.sql.types import AtomicType, StructType
+from pyspark.sql._typing import AtomicValue, RowLike
 from py4j.java_gateway import JavaObject  # type: ignore[import]
 
 T = TypeVar("T")
@@ -445,11 +445,18 @@ class RDD(Generic[T]):
 @overload
 def toDF(
 self: RDD[RowLike],
-schema: Optional[List[str]] = ...,
+schema: Optional[Union[List[str], Tuple[str, ...]]] = ...,
 sampleRatio: Optional[float] = ...,
 ) -> DataFrame: ...
 @overload
-def toDF(self: RDD[RowLike], schema: Optional[StructType] = ...) -> 
DataFrame: ...
+def toDF(
+self: RDD[RowLike], schema: Optional[Union[StructType, str]] = ...
+) -> DataFrame: ...
+@overload
+def toDF(
+self: RDD[AtomicValue],
+schema: Union[AtomicType, str],
+) -> DataFrame: ...
 
 class RDDBarrier(Generic[T]):
 rdd: RDD[T]
diff --git a/python/pyspark/sql/_typing.pyi b/python/pyspark/sql/_typing.pyi
index 1a3bd8f..b6b4606 100644
--- a/python/pyspark/sql/_typing.pyi
+++ b/python/pyspark/sql/_typing.pyi
@@ -42,6 +42,17 @@ AtomicDataTypeOrString = Union[pyspark.sql.types.AtomicType, 
str]
 DataTypeOrString = Union[pyspark.sql.types.DataType, str]
 OptionalPrimitiveType = Optional[PrimitiveType]
 
+AtomicValue = TypeVar(
+"AtomicValue",
+datetime.datetime,
+datetime.date,
+decimal.Decimal,
+bool,
+str,
+int,
+float,
+)
+
 RowLike = TypeVar("RowLike", List[Any], Tuple[Any, ...], pyspark.sql.types.Row)
 
 class SupportsOpen(Protocol):
diff --git a/python/pyspark/sql/context.py b/python/pys

[spark] branch master updated: Revert "[SPARK-36894][SPARK-37077][PYTHON] Synchronize RDD.toDF annotations with SparkSession and SQLContext .createDataFrame variants."

2021-11-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8687138  Revert "[SPARK-36894][SPARK-37077][PYTHON] Synchronize 
RDD.toDF annotations with SparkSession and SQLContext .createDataFrame 
variants."
8687138 is described below

commit 86871386b063d8f7a8b5b42eb327a3900525af58
Author: Kousuke Saruta 
AuthorDate: Wed Nov 3 23:01:38 2021 +0900

Revert "[SPARK-36894][SPARK-37077][PYTHON] Synchronize RDD.toDF annotations 
with SparkSession and SQLContext .createDataFrame variants."

This reverts commit 855da09f02f3007a2c36e7a738d4dc81fd95569a.
See [this 
comment](https://github.com/apache/spark/pull/34146#issuecomment-959136935).

    Closes #34477 from sarutak/revert-SPARK-37077.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 python/pyspark/rdd.pyi | 15 ---
 python/pyspark/sql/_typing.pyi | 11 ---
 python/pyspark/sql/context.py  | 38 ++
 python/pyspark/sql/session.py  | 40 +++-
 4 files changed, 37 insertions(+), 67 deletions(-)

diff --git a/python/pyspark/rdd.pyi b/python/pyspark/rdd.pyi
index 84481d3..a810a2c 100644
--- a/python/pyspark/rdd.pyi
+++ b/python/pyspark/rdd.pyi
@@ -55,8 +55,8 @@ from pyspark.resource.requests import (  # noqa: F401
 from pyspark.resource.profile import ResourceProfile
 from pyspark.statcounter import StatCounter
 from pyspark.sql.dataframe import DataFrame
-from pyspark.sql.types import AtomicType, StructType
-from pyspark.sql._typing import AtomicValue, RowLike
+from pyspark.sql.types import StructType
+from pyspark.sql._typing import RowLike
 from py4j.java_gateway import JavaObject  # type: ignore[import]
 
 T = TypeVar("T")
@@ -445,18 +445,11 @@ class RDD(Generic[T]):
 @overload
 def toDF(
 self: RDD[RowLike],
-schema: Optional[Union[List[str], Tuple[str, ...]]] = ...,
+schema: Optional[List[str]] = ...,
 sampleRatio: Optional[float] = ...,
 ) -> DataFrame: ...
 @overload
-def toDF(
-self: RDD[RowLike], schema: Optional[Union[StructType, str]] = ...
-) -> DataFrame: ...
-@overload
-def toDF(
-self: RDD[AtomicValue],
-schema: Union[AtomicType, str],
-) -> DataFrame: ...
+def toDF(self: RDD[RowLike], schema: Optional[StructType] = ...) -> 
DataFrame: ...
 
 class RDDBarrier(Generic[T]):
 rdd: RDD[T]
diff --git a/python/pyspark/sql/_typing.pyi b/python/pyspark/sql/_typing.pyi
index b6b4606..1a3bd8f 100644
--- a/python/pyspark/sql/_typing.pyi
+++ b/python/pyspark/sql/_typing.pyi
@@ -42,17 +42,6 @@ AtomicDataTypeOrString = Union[pyspark.sql.types.AtomicType, 
str]
 DataTypeOrString = Union[pyspark.sql.types.DataType, str]
 OptionalPrimitiveType = Optional[PrimitiveType]
 
-AtomicValue = TypeVar(
-"AtomicValue",
-datetime.datetime,
-datetime.date,
-decimal.Decimal,
-bool,
-str,
-int,
-float,
-)
-
 RowLike = TypeVar("RowLike", List[Any], Tuple[Any, ...], pyspark.sql.types.Row)
 
 class SupportsOpen(Protocol):
diff --git a/python/pyspark/sql/context.py b/python/pyspark/sql/context.py
index eba2087..7d27c55 100644
--- a/python/pyspark/sql/context.py
+++ b/python/pyspark/sql/context.py
@@ -48,11 +48,13 @@ from pyspark.conf import SparkConf
 
 if TYPE_CHECKING:
 from pyspark.sql._typing import (
-AtomicValue,
-RowLike,
 UserDefinedFunctionLike,
+RowLike,
+DateTimeLiteral,
+LiteralType,
+DecimalLiteral
 )
-from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+from pyspark.sql.pandas._typing import DataFrameLike
 
 __all__ = ["SQLContext", "HiveContext"]
 
@@ -321,8 +323,7 @@ class SQLContext(object):
 @overload
 def createDataFrame(
 self,
-data: Union["RDD[RowLike]", Iterable["RowLike"]],
-schema: Union[List[str], Tuple[str, ...]] = ...,
+data: Iterable["RowLike"],
 samplingRatio: Optional[float] = ...,
 ) -> DataFrame:
 ...
@@ -330,9 +331,8 @@ class SQLContext(object):
 @overload
 def createDataFrame(
 self,
-data: Union["RDD[RowLike]", Iterable["RowLike"]],
-schema: Union[StructType, str],
-*,
+data: Iterable["RowLike"],
+schema: Union[List[str], Tuple[str, ...]] = ...,
 verifySchema: bool = ...,
 ) -> DataFrame:
 ...
@@ -340,10 +340,7 @@ class SQLContext(object):
 @overload
 def createDataFrame(
 self,
-data: Union[
-"RDD[AtomicValue]",
-Iterable["AtomicValue"

[spark] branch branch-3.2 updated: [MINOR][PYTHON][DOCS] Fix broken link in legacy Apache Arrow in PySpark page

2021-11-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new c2f147e  [MINOR][PYTHON][DOCS] Fix broken link in legacy Apache Arrow 
in PySpark page
c2f147e is described below

commit c2f147eff8e3e353cfb43f5d45f19f174fb26773
Author: Hyukjin Kwon 
AuthorDate: Wed Nov 3 20:54:50 2021 +0900

[MINOR][PYTHON][DOCS] Fix broken link in legacy Apache Arrow in PySpark page

### What changes were proposed in this pull request?

This PR proposes to fix the broken link in the legacy page. Currently it 
links to:

![Screen Shot 2021-11-03 at 6 34 32 
PM](https://user-images.githubusercontent.com/6477701/140037221-b7963e47-12f5-49f3-8290-8560c99c62c2.png)
![Screen Shot 2021-11-03 at 6 34 30 
PM](https://user-images.githubusercontent.com/6477701/140037225-c21070fc-907f-41bb-a421-747810ae5b0d.png)

It should link to:

![Screen Shot 2021-11-03 at 6 34 35 
PM](https://user-images.githubusercontent.com/6477701/140037246-dd14760f-5487-4b8b-b3f6-e9495f1d4ec9.png)
![Screen Shot 2021-11-03 at 6 34 38 
PM](https://user-images.githubusercontent.com/6477701/140037251-3f97e992-6660-4ce9-9c66-77855d3c0a64.png)

### Why are the changes needed?

For users to easily navigate from the legacy page to the newer page.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a bug in the documentation.

### How was this patch tested?

Manually built the site and checked the link

Closes #34475 from HyukjinKwon/minor-doc-fix-py.

Authored-by: Hyukjin Kwon 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit ab7e5030b23ccb8ef6aa43645e909457f9d68ffa)
Signed-off-by: Kousuke Saruta 
---
 python/docs/source/user_guide/arrow_pandas.rst | 2 +-
 python/docs/source/user_guide/sql/arrow_pandas.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/docs/source/user_guide/arrow_pandas.rst 
b/python/docs/source/user_guide/arrow_pandas.rst
index c1b68a6..60c11b7 100644
--- a/python/docs/source/user_guide/arrow_pandas.rst
+++ b/python/docs/source/user_guide/arrow_pandas.rst
@@ -21,4 +21,4 @@
 Apache Arrow in PySpark
 ===
 
-This page has been moved to `Apache Arrow in PySpark 
<../sql/arrow_pandas.rst>`_.
+This page has been moved to `Apache Arrow in PySpark `_.
diff --git a/python/docs/source/user_guide/sql/arrow_pandas.rst 
b/python/docs/source/user_guide/sql/arrow_pandas.rst
index 1767624..78d3e7a 100644
--- a/python/docs/source/user_guide/sql/arrow_pandas.rst
+++ b/python/docs/source/user_guide/sql/arrow_pandas.rst
@@ -343,7 +343,7 @@ Supported SQL Types
 
 Currently, all Spark SQL data types are supported by Arrow-based conversion 
except
 :class:`ArrayType` of :class:`TimestampType`, and nested :class:`StructType`.
-:class: `MapType` is only supported when using PyArrow 2.0.0 and above.
+:class:`MapType` is only supported when using PyArrow 2.0.0 and above.
 
 Setting Arrow Batch Size
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9babf9a -> ab7e503)

2021-11-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9babf9a  [SPARK-37200][SQL] Support drop index for Data Source V2
 add ab7e503  [MINOR][PYTHON][DOCS] Fix broken link in legacy Apache Arrow 
in PySpark page

No new revisions were added by this update.

Summary of changes:
 python/docs/source/user_guide/arrow_pandas.rst | 2 +-
 python/docs/source/user_guide/sql/arrow_pandas.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (675071a -> 320fa07)

2021-11-01 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 675071a  [MINOR][DOCS] Corrected spacing in structured streaming 
programming
 add 320fa07  [SPARK-37159][SQL][TESTS] Change 
HiveExternalCatalogVersionsSuite to be able to test with Java 17

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala  | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (11de0fd -> cf7fbc1)

2021-11-01 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 11de0fd  [MINOR][DOCS] Add import for MultivariateGaussian to Docs
 add cf7fbc1  [SPARK-36554][SQL][PYTHON] Expose make_date expression in 
functions.scala

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst   |  1 +
 python/pyspark/sql/functions.py| 29 ++
 python/pyspark/sql/tests/test_functions.py | 10 +++-
 .../scala/org/apache/spark/sql/functions.scala |  9 +++
 4 files changed, 48 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (40f1494 -> 81aa514)

2021-10-19 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 40f1494  [SPARK-37041][SQL] Backport HIVE-15025: Secure-Socket-Layer 
(SSL) support for HMS
 add 81aa514  [SPARK-37059][PYTHON][TESTS] Ensure the sort order of the 
output in the PySpark doctests

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/fpm.py| 20 ++--
 python/pyspark/sql/functions.py |  4 ++--
 python/run-tests.py |  2 +-
 3 files changed, 13 insertions(+), 13 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (1c61d90 -> 838a9d9)

2021-10-14 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1c61d90  [SPARK-35973][SQL] Add command `SHOW CATALOGS`
 add 838a9d9  [SPARK-36922][SQL] The SIGN/SIGNUM functions should support 
ANSI intervals

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/mathExpressions.scala |  7 +++
 .../expressions/MathExpressionsSuite.scala | 21 +
 .../test/resources/sql-tests/inputs/interval.sql   |  6 +++
 .../sql-tests/results/ansi/interval.sql.out| 50 +-
 .../resources/sql-tests/results/interval.sql.out   | 50 +-
 5 files changed, 132 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36981][BUILD] Upgrade joda-time to 2.10.12

2021-10-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3a91b9a  [SPARK-36981][BUILD] Upgrade joda-time to 2.10.12
3a91b9a is described below

commit 3a91b9ac598abcb69703d2cd0247b5e378be58c0
Author: Kousuke Saruta 
AuthorDate: Wed Oct 13 09:18:22 2021 +0900

[SPARK-36981][BUILD] Upgrade joda-time to 2.10.12

### What changes were proposed in this pull request?

This PR upgrades `joda-time` from `2.10.10` to `2.10.12`.

### Why are the changes needed?

`2.10.12` supports an updated TZDB.

[diff](https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R1037)
https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CIs.

Closes #34253 from sarutak/upgrade-joda-2.10.12.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 
b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 94a4758..d37b38b 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -148,7 +148,7 @@ jetty-util/6.1.26//jetty-util-6.1.26.jar
 jetty-util/9.4.43.v20210629//jetty-util-9.4.43.v20210629.jar
 jetty/6.1.26//jetty-6.1.26.jar
 jline/2.14.6//jline-2.14.6.jar
-joda-time/2.10.10//joda-time-2.10.10.jar
+joda-time/2.10.12//joda-time-2.10.12.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 
b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index 091f399..3040ffe 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -136,7 +136,7 @@ jettison/1.1//jettison-1.1.jar
 jetty-util-ajax/9.4.43.v20210629//jetty-util-ajax-9.4.43.v20210629.jar
 jetty-util/9.4.43.v20210629//jetty-util-9.4.43.v20210629.jar
 jline/2.14.6//jline-2.14.6.jar
-joda-time/2.10.10//joda-time-2.10.10.jar
+joda-time/2.10.12//joda-time-2.10.12.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/pom.xml b/pom.xml
index 6225fc0..2c46c52 100644
--- a/pom.xml
+++ b/pom.xml
@@ -184,7 +184,7 @@
 14.0.1
 3.0.16
 2.34
-2.10.10
+2.10.12
 3.5.2
 3.0.0
 0.12.0

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (36b3bbc0 -> b9a8165)

2021-10-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 36b3bbc0 [SPARK-36979][SQL] Add RewriteLateralSubquery rule into 
nonExcludableRules
 add b9a8165  [SPARK-36972][PYTHON] Add max_by/min_by API to PySpark

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst |  2 +
 python/pyspark/sql/functions.py  | 72 
 2 files changed, 74 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (aed977c -> 2953d4f)

2021-10-06 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from aed977c  [SPARK-36919][SQL] Make BadRecordException fields transient
 add 2953d4f  [SPARK-36751][PYTHON][DOCS][FOLLOW-UP] Fix unexpected section 
title for Examples in docstring

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/functions.py | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-10-01 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 73747ec  [SPARK-36038][CORE] Speculation metrics summary at stage level
73747ec is described below

commit 73747ecb970595d49c478b0eb65f5132c8b0bf02
Author: Venkata krishnan Sowrirajan 
AuthorDate: Fri Oct 1 16:59:29 2021 +0900

[SPARK-36038][CORE] Speculation metrics summary at stage level

### What changes were proposed in this pull request?

Currently there are no speculation metrics available in Spark at the 
application, job, or stage level. This PR adds some basic speculation metrics 
for a stage when speculative execution is enabled.

This is similar to the existing stage-level metrics, tracking numTotal 
(total number of speculated tasks), numCompleted (total number of successful 
speculated tasks), numFailed (total number of failed speculated tasks), 
numKilled (total number of killed speculated tasks), etc.

This new set of metrics helps in understanding the speculative execution 
feature in the context of the application and in further tuning the 
speculative execution config knobs.
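
(Not part of this commit.) As a minimal sketch, assuming a plain PySpark session, the summary only has data to show once speculative execution is turned on via the standard configuration knobs below; the application name and threshold values are purely illustrative.

```python
from pyspark.sql import SparkSession

# Minimal sketch: enable speculative execution so the new stage-level
# speculation summary gets populated. All values below are illustrative.
spark = (
    SparkSession.builder
    .appName("speculation-metrics-demo")            # illustrative name
    .config("spark.speculation", "true")            # turn on speculative execution
    .config("spark.speculation.interval", "100ms")  # how often to check for slow tasks
    .config("spark.speculation.multiplier", "1.5")  # how much slower than the median counts as slow
    .config("spark.speculation.quantile", "0.75")   # fraction of tasks that must finish first
    .getOrCreate()
)
```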

Screenshot of Spark UI with speculation summary:
![Screen Shot 2021-09-22 at 12 12 20 
PM](https://user-images.githubusercontent.com/8871522/135321311-db7699ad-f1ae-4729-afea-d1e2c4e86103.png)

Screenshot of Spark UI with API output:
![Screen Shot 2021-09-22 at 12 10 37 
PM](https://user-images.githubusercontent.com/8871522/135321486-4dbb7a67-5580-47f8-bccf-81c758c2e988.png)

### Why are the changes needed?

Additional metrics for speculative execution.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests were added, and the change has also been deployed on our internal 
platform for quite some time now.

Lead-authored by: Venkata krishnan Sowrirajan 
Co-authored by: Ron Hu 
Co-authored by: Thejdeep Gudivada 

Closes #33253 from venkata91/speculation-metrics.

Authored-by: Venkata krishnan Sowrirajan 
Signed-off-by: Kousuke Saruta 
---
 .../org/apache/spark/ui/static/stagepage.js|  29 ++
 .../spark/ui/static/stagespage-template.html   |  15 +
 .../resources/org/apache/spark/ui/static/webui.css |  10 +
 .../apache/spark/status/AppStatusListener.scala|  19 +
 .../org/apache/spark/status/AppStatusStore.scala   |  11 +
 .../scala/org/apache/spark/status/LiveEntity.scala |  26 ++
 .../scala/org/apache/spark/status/api/v1/api.scala |   8 +
 .../scala/org/apache/spark/status/storeTypes.scala |  14 +
 .../scala/org/apache/spark/ui/jobs/JobPage.scala   |   1 +
 .../application_list_json_expectation.json |  15 +
 .../completed_app_list_json_expectation.json   |  15 +
 .../limit_app_list_json_expectation.json   |  30 +-
 .../minDate_app_list_json_expectation.json |  15 +
 .../minEndDate_app_list_json_expectation.json  |  15 +
 ...stage_with_speculation_summary_expectation.json | 507 +
 .../spark-events/application_1628109047826_1317105 |  52 +++
 .../spark/deploy/history/HistoryServerSuite.scala  |   5 +-
 .../spark/status/AppStatusListenerSuite.scala  |  10 +
 .../apache/spark/status/AppStatusStoreSuite.scala  |  57 ++-
 .../scala/org/apache/spark/ui/StagePageSuite.scala |   1 +
 dev/.rat-excludes  |   3 +-
 21 files changed, 840 insertions(+), 18 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
index db1a148..595635a 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
@@ -652,6 +652,35 @@ $(document).ready(function () {
 
executorSummaryTableSelector.column(14).visible(dataToShow.showBytesSpilledData);
   });
 
+// Prepare data for speculation metrics
+$("#speculationSummaryTitle").hide()
+$("#speculationSummary").hide()
+var speculationSummaryInfo = responseBody.speculationSummary;
+var speculationData = [[
+  speculationSummaryInfo.numTasks,
+  speculationSummaryInfo.numActiveTasks,
+  speculationSummaryInfo.numCompletedTasks,
+  speculationSummaryInfo.numFailedTasks,
+  speculationSummaryInfo.numKilledTasks
+]];
+if (speculationSummaryInfo.numTasks > 0) {
+  // Show speculationSummary if there is atleast one speculated task 
ran
+  $("#speculationSummaryTitle").show()
+  $("#speculationSummary").show()
+}
+var speculationMetricsTableConf = {
+  "data": speculationData,
+   

[spark] branch master updated: [SPARK-36899][R] Support ILIKE API on R

2021-09-29 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 17e3ca6  [SPARK-36899][R] Support ILIKE API on R
17e3ca6 is described below

commit 17e3ca6df5eb4b7b74cd8d04868da39eb0137826
Author: Leona Yoda 
AuthorDate: Thu Sep 30 14:43:09 2021 +0900

[SPARK-36899][R] Support ILIKE API on R

### What changes were proposed in this pull request?

Support ILIKE (case insensitive LIKE) API on R.

### Why are the changes needed?

Support for the ILIKE statement on the SQL interface was added by SPARK-36674.
This PR adds the corresponding R API.

### Does this PR introduce _any_ user-facing change?

Yes. Users can call ilike from R.

### How was this patch tested?

Unit tests.

Closes #34152 from yoda-mon/r-ilike.

Authored-by: Leona Yoda 
Signed-off-by: Kousuke Saruta 
---
 R/pkg/NAMESPACE   | 1 +
 R/pkg/R/column.R  | 2 +-
 R/pkg/R/generics.R| 3 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 2 ++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 5de7aeb..11403f6 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -316,6 +316,7 @@ exportMethods("%<=>%",
   "hour",
   "hypot",
   "ifelse",
+  "ilike",
   "initcap",
   "input_file_name",
   "instr",
diff --git a/R/pkg/R/column.R b/R/pkg/R/column.R
index 9fa117c..f1fd30e 100644
--- a/R/pkg/R/column.R
+++ b/R/pkg/R/column.R
@@ -72,7 +72,7 @@ column_functions1 <- c(
   "desc", "desc_nulls_first", "desc_nulls_last",
   "isNaN", "isNull", "isNotNull"
 )
-column_functions2 <- c("like", "rlike", "getField", "getItem", "contains")
+column_functions2 <- c("like", "rlike", "ilike", "getField", "getItem", 
"contains")
 
 createOperator <- function(op) {
   setMethod(op,
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index 9da818b..ad29a70 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -725,6 +725,9 @@ setGeneric("like", function(x, ...) { 
standardGeneric("like") })
 #' @rdname columnfunctions
 setGeneric("rlike", function(x, ...) { standardGeneric("rlike") })
 
+#' @rdname columnfunctions
+setGeneric("ilike", function(x, ...) { standardGeneric("ilike") })
+
 #' @rdname startsWith
 setGeneric("startsWith", function(x, prefix) { standardGeneric("startsWith") })
 
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R 
b/R/pkg/tests/fulltests/test_sparkSQL.R
index bd5c250..1d8ac2b 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -2130,6 +2130,8 @@ test_that("higher order functions", {
   expr("transform(xs, (x, i) -> CASE WHEN ((i % 2.0) = 0.0) THEN x ELSE (- 
x) END)"),
 array_exists("vs", function(v) rlike(v, "FAILED")) ==
   expr("exists(vs, v -> (v RLIKE 'FAILED'))"),
+array_exists("vs", function(v) ilike(v, "failed")) ==
+  expr("exists(vs, v -> (v ILIKE 'failed'))"),
 array_forall("xs", function(x) x > 0) ==
   expr("forall(xs, x -> x > 0)"),
 array_filter("xs", function(x, i) x > 0 | i %% 2 == 0) ==

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36882][PYTHON] Support ILIKE API on Python

2021-09-29 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ca1c09d  [SPARK-36882][PYTHON] Support ILIKE API on Python
ca1c09d is described below

commit ca1c09d88c21d0f8664df8e852778f864f130d94
Author: Leona Yoda 
AuthorDate: Wed Sep 29 15:04:03 2021 +0900

[SPARK-36882][PYTHON] Support ILIKE API on Python

### What changes were proposed in this pull request?

Support ILIKE (case insensitive LIKE) API on Python.

### Why are the changes needed?

Support for the ILIKE statement on the SQL interface was added by SPARK-36674.
This PR adds the corresponding Python API.

### Does this PR introduce _any_ user-facing change?

Yes. Users can call `ilike` from Python.
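
(Not part of the patch.) A minimal usage sketch of the new column method; the DataFrame contents are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])

# ilike is a case-insensitive LIKE: the lower-case pattern still matches "Alice".
df.filter(df.name.ilike("al%")).show()
# Expected: one row, age=2, name=Alice
```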

### How was this patch tested?

Unit tests.

Closes #34135 from yoda-mon/python-ilike.

Authored-by: Leona Yoda 
Signed-off-by: Kousuke Saruta 
---
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/column.py | 21 +
 python/pyspark/sql/tests/test_column.py  |  2 +-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/python/docs/source/reference/pyspark.sql.rst 
b/python/docs/source/reference/pyspark.sql.rst
index f5a8357..0fd2c4d 100644
--- a/python/docs/source/reference/pyspark.sql.rst
+++ b/python/docs/source/reference/pyspark.sql.rst
@@ -259,6 +259,7 @@ Column APIs
 Column.eqNullSafe
 Column.getField
 Column.getItem
+Column.ilike
 Column.isNotNull
 Column.isNull
 Column.isin
diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index 9046e7f..c46b0eb 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -507,6 +507,26 @@ class Column(object):
 >>> df.filter(df.name.like('Al%')).collect()
 [Row(age=2, name='Alice')]
 """
+_ilike_doc = """
+SQL ILIKE expression (case insensitive LIKE). Returns a boolean 
:class:`Column`
+based on a case insensitive match.
+
+.. versionadded:: 3.3.0
+
+Parameters
+--
+other : str
+a SQL LIKE pattern
+
+See Also
+
+pyspark.sql.Column.rlike
+
+Examples
+
+>>> df.filter(df.name.ilike('%Ice')).collect()
+[Row(age=2, name='Alice')]
+"""
 _startswith_doc = """
 String starts with. Returns a boolean :class:`Column` based on a string 
match.
 
@@ -541,6 +561,7 @@ class Column(object):
 contains = _bin_op("contains", _contains_doc)
 rlike = _bin_op("rlike", _rlike_doc)
 like = _bin_op("like", _like_doc)
+ilike = _bin_op("ilike", _ilike_doc)
 startswith = _bin_op("startsWith", _startswith_doc)
 endswith = _bin_op("endsWith", _endswith_doc)
 
diff --git a/python/pyspark/sql/tests/test_column.py 
b/python/pyspark/sql/tests/test_column.py
index c2530b2..9a918c2 100644
--- a/python/pyspark/sql/tests/test_column.py
+++ b/python/pyspark/sql/tests/test_column.py
@@ -75,7 +75,7 @@ class ColumnTests(ReusedSQLTestCase):
 self.assertTrue(all(isinstance(c, Column) for c in cb))
 cbool = (ci & ci), (ci | ci), (~ci)
 self.assertTrue(all(isinstance(c, Column) for c in cbool))
-css = cs.contains('a'), cs.like('a'), cs.rlike('a'), cs.asc(), 
cs.desc(),\
+css = cs.contains('a'), cs.like('a'), cs.rlike('a'), cs.ilike('A'), 
cs.asc(), cs.desc(),\
 cs.startswith('a'), cs.endswith('a'), ci.eqNullSafe(cs)
 self.assertTrue(all(isinstance(c, Column) for c in css))
 self.assertTrue(isinstance(ci.cast(LongType()), Column))

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6e815da -> 0b65daa)

2021-09-23 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6e815da  [SPARK-36760][SQL][FOLLOWUP] Add interface 
SupportsPushDownV2Filters
 add 0b65daa  [SPARK-36760][DOCS][FOLLOWUP] Fix wrong JavaDoc style

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/connector/read/SupportsPushDownV2Filters.java  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4cc39cf -> 30d17b6)

2021-09-20 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4cc39cf  [SPARK-36101][CORE] Grouping exception in core/api
 add 30d17b6  [SPARK-36683][SQL] Add new built-in SQL functions: SEC and CSC

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst   |  2 +
 python/pyspark/sql/functions.py| 58 
 python/pyspark/sql/functions.pyi   |  2 +
 python/pyspark/sql/tests/test_functions.py | 78 --
 python/pyspark/testing/sqlutils.py |  8 +++
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  2 +
 .../sql/catalyst/expressions/mathExpressions.scala | 46 +
 .../expressions/MathExpressionsSuite.scala | 28 
 .../scala/org/apache/spark/sql/functions.scala | 18 +
 .../sql-functions/sql-expression-schema.md |  4 +-
 .../test/resources/sql-tests/inputs/operators.sql  |  8 +++
 .../resources/sql-tests/results/operators.sql.out  | 66 +-
 .../org/apache/spark/sql/MathFunctionsSuite.scala  | 15 +
 13 files changed, 299 insertions(+), 36 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36751][SQL][PYTHON][R] Add bit/octet_length APIs to Scala, Python and R

2021-09-15 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0666f5c  [SPARK-36751][SQL][PYTHON][R] Add bit/octet_length APIs to 
Scala, Python and R
0666f5c is described below

commit 0666f5c00393acccecdd82d3794e5a2b88f3210b
Author: Leona Yoda 
AuthorDate: Wed Sep 15 16:27:13 2021 +0900

[SPARK-36751][SQL][PYTHON][R] Add bit/octet_length APIs to Scala, Python 
and R

### What changes were proposed in this pull request?

octet_length: calculate the byte length of strings
bit_length: calculate the bit length of strings
Those two string-related functions are only implemented in SparkSQL, not in 
Scala, Python and R.

### Why are the changes needed?

Those functions would be useful for users of multi-byte characters, who mainly 
work with Scala, Python or R.

### Does this PR introduce _any_ user-facing change?

Yes. Users can call the octet_length/bit_length APIs from Scala (DataFrame), 
Python, and R.
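
(Not part of the patch.) A minimal PySpark sketch of the new functions; the sample strings are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import bit_length, octet_length

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("cat",), ("🐈",)], ["s"])

# octet_length counts UTF-8 bytes, bit_length counts bits:
# "cat" -> 3 bytes / 24 bits, the cat emoji -> 4 bytes / 32 bits.
df.select(octet_length("s"), bit_length("s")).show()
```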

### How was this patch tested?

unit tests

Closes #33992 from yoda-mon/add-bit-octet-length.

Authored-by: Leona Yoda 
Signed-off-by: Kousuke Saruta 
---
 R/pkg/NAMESPACE|  2 +
 R/pkg/R/functions.R| 26 +++
 R/pkg/R/generics.R |  8 
 R/pkg/tests/fulltests/test_sparkSQL.R  | 11 +
 python/docs/source/reference/pyspark.sql.rst   |  2 +
 python/pyspark/sql/functions.py| 52 ++
 python/pyspark/sql/functions.pyi   |  2 +
 python/pyspark/sql/tests/test_functions.py | 14 +-
 .../scala/org/apache/spark/sql/functions.scala | 16 +++
 .../apache/spark/sql/StringFunctionsSuite.scala| 52 ++
 10 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 7fa8085..686a49e 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -243,6 +243,7 @@ exportMethods("%<=>%",
   "base64",
   "between",
   "bin",
+  "bit_length",
   "bitwise_not",
   "bitwiseNOT",
   "bround",
@@ -364,6 +365,7 @@ exportMethods("%<=>%",
   "not",
   "nth_value",
   "ntile",
+  "octet_length",
   "otherwise",
   "over",
   "overlay",
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 62066da1..f0768c7 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -647,6 +647,19 @@ setMethod("bin",
   })
 
 #' @details
+#' \code{bit_length}: Calculates the bit length for the specified string 
column.
+#'
+#' @rdname column_string_functions
+#' @aliases bit_length bit_length,Column-method
+#' @note length since 3.3.0
+setMethod("bit_length",
+  signature(x = "Column"),
+  function(x) {
+jc <- callJStatic("org.apache.spark.sql.functions", "bit_length", 
x@jc)
+column(jc)
+  })
+
+#' @details
 #' \code{bitwise_not}: Computes bitwise NOT.
 #'
 #' @rdname column_nonaggregate_functions
@@ -1570,6 +1583,19 @@ setMethod("negate",
   })
 
 #' @details
+#' \code{octet_length}: Calculates the byte length for the specified string 
column.
+#'
+#' @rdname column_string_functions
+#' @aliases octet_length octet_length,Column-method
+#' @note length since 3.3.0
+setMethod("octet_length",
+  signature(x = "Column"),
+  function(x) {
+jc <- callJStatic("org.apache.spark.sql.functions", 
"octet_length", x@jc)
+column(jc)
+  })
+
+#' @details
 #' \code{overlay}: Overlay the specified portion of \code{x} with 
\code{replace},
 #' starting from byte position \code{pos} of \code{src} and proceeding for
 #' \code{len} bytes.
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index 9ebea3f..1abde65 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -884,6 +884,10 @@ setGeneric("base64", function(x) { 
standardGeneric("base64") })
 #' @name NULL
 setGeneric("bin", function(x) { standardGeneric("bin") })
 
+#' @rdname column_string_functions
+#' @name NULL
+setGeneric("bit_length", function(x, ...) { standardGeneric("bit_length") })
+
 #' @rdname column_nonaggregate_functions
 #' @name NULL
 setGeneric("bitwise_not", function(x) { standardGeneric("bitwise_not") })
@@ -1232,6 +1

[spark] branch branch-3.1 updated: [SPARK-36639][SQL] Fix an issue that sequence builtin function causes ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && ste

2021-09-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new c1f8d75  [SPARK-36639][SQL] Fix an issue that sequence builtin 
function causes ArrayIndexOutOfBoundsException if the arguments are under the 
condition of start == stop && step < 0
c1f8d75 is described below

commit c1f8d759a3d75885e694e8c468ee6beea70131a3
Author: Kousuke Saruta 
AuthorDate: Fri Sep 3 23:25:18 2021 +0900

[SPARK-36639][SQL] Fix an issue that sequence builtin function causes 
ArrayIndexOutOfBoundsException if the arguments are under the condition of 
start == stop && step < 0

### What changes were proposed in this pull request?

This PR fixes an issue that `sequence` builtin function causes 
`ArrayIndexOutOfBoundsException` if the arguments are under the condition of 
`start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 
month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT 
sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (#28819) was merged.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.
    
Closes #33895 from sarutak/fix-sequence-issue.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit cf3bc65e69dcb0f8ba3dee89642d082265edab31)
Signed-off-by: Kousuke Saruta 
---
 .../catalyst/expressions/collectionOperations.scala|  4 ++--
 .../expressions/CollectionExpressionsSuite.scala   | 18 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index b341895..bb2163c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -2711,7 +2711,7 @@ object Sequence {
 val maxEstimatedArrayLength =
   getSequenceLength(startMicros, stopMicros, intervalStepInMicros)
 
-val stepSign = if (stopMicros >= startMicros) +1 else -1
+val stepSign = if (intervalStepInMicros > 0) +1 else -1
 val exclusiveItem = stopMicros + stepSign
 val arr = new Array[T](maxEstimatedArrayLength)
 var t = startMicros
@@ -2786,7 +2786,7 @@ object Sequence {
  |
  |  $sequenceLengthCode
  |
- |  final int $stepSign = $stopMicros >= $startMicros ? +1 : -1;
+ |  final int $stepSign = $intervalInMicros > 0 ? +1 : -1;
  |  final long $exclusiveItem = $stopMicros + $stepSign;
  |
  |  $arr = new $elemType[$arrLength];
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
index 095894b..d79f06f 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
@@ -1888,6 +1888,24 @@ class CollectionExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper
   Seq(Date.valueOf("2018-01-01")))
   }
 
+  test("SPARK-36639: Start and end equal in month range with a negative step") 
{
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 day"))),
+  Seq(Date.valueOf("2018-01-01")))
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 month"))),
+  Seq(Date.valueOf("2018-01-01")))
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 year"))),
+  Seq(Date.valueOf("2018-01-01")))
+  }
+
   test("SPARK-33386: element_at ArrayIndexOutOfBoundsException") {
 Seq(true, false).foreach { ansiEnabled =>
   withSQLConf(SQLConf.ANSI_ENABLED.key -> ansiEnabled.toString) {

[spark] branch branch-3.2 updated: [SPARK-36639][SQL] Fix an issue that sequence builtin function causes ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && ste

2021-09-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new a3901ed  [SPARK-36639][SQL] Fix an issue that sequence builtin 
function causes ArrayIndexOutOfBoundsException if the arguments are under the 
condition of start == stop && step < 0
a3901ed is described below

commit a3901ed3848d21fd36bb5aa265ef8e8d74d8e324
Author: Kousuke Saruta 
AuthorDate: Fri Sep 3 23:25:18 2021 +0900

[SPARK-36639][SQL] Fix an issue that sequence builtin function causes 
ArrayIndexOutOfBoundsException if the arguments are under the condition of 
start == stop && step < 0

### What changes were proposed in this pull request?

This PR fixes an issue that `sequence` builtin function causes 
`ArrayIndexOutOfBoundsException` if the arguments are under the condition of 
`start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 
month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT 
sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (#28819) was merged.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.
    
Closes #33895 from sarutak/fix-sequence-issue.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit cf3bc65e69dcb0f8ba3dee89642d082265edab31)
Signed-off-by: Kousuke Saruta 
---
 .../catalyst/expressions/collectionOperations.scala|  4 ++--
 .../expressions/CollectionExpressionsSuite.scala   | 18 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 6cbab86..ce17231 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -2903,7 +2903,7 @@ object Sequence {
 val maxEstimatedArrayLength =
   getSequenceLength(startMicros, stopMicros, input3, 
intervalStepInMicros)
 
-val stepSign = if (stopMicros >= startMicros) +1 else -1
+val stepSign = if (intervalStepInMicros > 0) +1 else -1
 val exclusiveItem = stopMicros + stepSign
 val arr = new Array[T](maxEstimatedArrayLength)
 var t = startMicros
@@ -2989,7 +2989,7 @@ object Sequence {
  |
  |  $sequenceLengthCode
  |
- |  final int $stepSign = $stopMicros >= $startMicros ? +1 : -1;
+ |  final int $stepSign = $intervalInMicros > 0 ? +1 : -1;
  |  final long $exclusiveItem = $stopMicros + $stepSign;
  |
  |  $arr = new $elemType[$arrLength];
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
index caa5e96..e8f5f07 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
@@ -2232,6 +2232,24 @@ class CollectionExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper
   Seq(Date.valueOf("2018-01-01")))
   }
 
+  test("SPARK-36639: Start and end equal in month range with a negative step") 
{
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 day"))),
+  Seq(Date.valueOf("2018-01-01")))
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 month"))),
+  Seq(Date.valueOf("2018-01-01")))
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 year"))),
+  Seq(Date.valueOf("2018-01-01")))
+  }
+
   test("SPARK-33386: element_at ArrayIndexOutOfBoundsException") {
 Seq(true, false).foreach { ansiEnabled =>
   withSQLConf(SQLConf.ANSI_ENAB

[spark] branch master updated: [SPARK-36639][SQL] Fix an issue that sequence builtin function causes ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && step <

2021-09-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cf3bc65  [SPARK-36639][SQL] Fix an issue that sequence builtin 
function causes ArrayIndexOutOfBoundsException if the arguments are under the 
condition of start == stop && step < 0
cf3bc65 is described below

commit cf3bc65e69dcb0f8ba3dee89642d082265edab31
Author: Kousuke Saruta 
AuthorDate: Fri Sep 3 23:25:18 2021 +0900

[SPARK-36639][SQL] Fix an issue that sequence builtin function causes 
ArrayIndexOutOfBoundsException if the arguments are under the condition of 
start == stop && step < 0

### What changes were proposed in this pull request?

This PR fixes an issue that `sequence` builtin function causes 
`ArrayIndexOutOfBoundsException` if the arguments are under the condition of 
`start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 
month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT 
sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (#28819) was merged.
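
As a sanity check (not part of the patch), a minimal PySpark sketch of the same query with the fix applied; it is expected to return a single-element array containing the start timestamp.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# With start == stop and a negative step, sequence() should return a
# single-element array instead of throwing ArrayIndexOutOfBoundsException.
spark.sql(
    "SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', "
    "-INTERVAL 1 month) AS seq"
).show(truncate=False)
# Expected: [2021-08-31 00:00:00]
```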

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.
    
Closes #33895 from sarutak/fix-sequence-issue.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 .../catalyst/expressions/collectionOperations.scala|  4 ++--
 .../expressions/CollectionExpressionsSuite.scala   | 18 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 6cbab86..ce17231 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -2903,7 +2903,7 @@ object Sequence {
 val maxEstimatedArrayLength =
   getSequenceLength(startMicros, stopMicros, input3, 
intervalStepInMicros)
 
-val stepSign = if (stopMicros >= startMicros) +1 else -1
+val stepSign = if (intervalStepInMicros > 0) +1 else -1
 val exclusiveItem = stopMicros + stepSign
 val arr = new Array[T](maxEstimatedArrayLength)
 var t = startMicros
@@ -2989,7 +2989,7 @@ object Sequence {
  |
  |  $sequenceLengthCode
  |
- |  final int $stepSign = $stopMicros >= $startMicros ? +1 : -1;
+ |  final int $stepSign = $intervalInMicros > 0 ? +1 : -1;
  |  final long $exclusiveItem = $stopMicros + $stepSign;
  |
  |  $arr = new $elemType[$arrLength];
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
index 8f35cf3..688ee61 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
@@ -2249,6 +2249,24 @@ class CollectionExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper
   Seq(Date.valueOf("2018-01-01")))
   }
 
+  test("SPARK-36639: Start and end equal in month range with a negative step") 
{
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 day"))),
+  Seq(Date.valueOf("2018-01-01")))
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 month"))),
+  Seq(Date.valueOf("2018-01-01")))
+checkEvaluation(new Sequence(
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(Date.valueOf("2018-01-01")),
+  Literal(stringToInterval("interval -1 year"))),
+  Seq(Date.valueOf("2018-01-01")))
+  }
+
   test("SPARK-33386: element_at ArrayIndexOutOfBoundsException") {
 Seq(true, false).foreach { ansiEnabled =>
   withSQLConf(SQLConf.ANSI_ENABLED.key -> ansiEnabled.toString) {

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9c5bcac -> 94c3062)

2021-09-02 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9c5bcac  [SPARK-36626][PYTHON] Support TimestampNTZ in 
createDataFrame/toPandas and Python UDFs
 add 94c3062  [SPARK-36400][TEST][FOLLOWUP] Add test for redacting 
sensitive information in UI by config

No new revisions were added by this update.

Summary of changes:
 .../sql/hive/thriftserver/UISeleniumSuite.scala| 45 ++
 1 file changed, 45 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (c420149 -> 068465d)

2021-08-28 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c420149  [SPARK-36352][SQL][3.0] Spark should check result plan's 
output schema name
 add 068465d  [SPARK-36509][CORE] Fix the issue that executors are never 
re-scheduled if the worker stops with standalone cluster

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 1 +
 core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster

2021-08-28 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 0af666a  [SPARK-36509][CORE] Fix the issue that executors are never 
re-scheduled if the worker stops with standalone cluster
0af666a is described below

commit 0af666a310590367a80439000d74975526064c87
Author: Kousuke Saruta 
AuthorDate: Sat Aug 28 18:01:55 2021 +0900

[SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if 
the worker stops with standalone cluster

### What changes were proposed in this pull request?

This PR fixes an issue where executors are never re-scheduled if the worker 
they run on stops.
As a result, the application gets stuck.
You can easily reproduce this issue with the following procedure.

```
# Run master
$ sbin/start-master.sh

# Run worker 1
$ SPARK_LOG_DIR=/tmp/worker1 SPARK_PID_DIR=/tmp/worker1/ 
sbin/start-worker.sh -c 1 -h localhost -d /tmp/worker1 --webui-port 8081 
spark://:7077

# Run worker 2
$ SPARK_LOG_DIR=/tmp/worker2 SPARK_PID_DIR=/tmp/worker2/ 
sbin/start-worker.sh -c 1 -h localhost -d /tmp/worker2 --webui-port 8082 
spark://:7077

# Run Spark Shell
$ bin/spark-shell --master spark://:7077 --executor-cores 1 
--total-executor-cores 1

# Check which worker the executor runs on and then kill the worker.
$ kill 
```

With the procedure above, we would expect the executor to be re-scheduled 
on the other worker, but it isn't.

The reason seems to be that `Master.schedule` cannot be called after the worker 
is marked as `WorkerState.DEAD`.
So, the solution this PR proposes is to call `Master.schedule` whenever 
`Master.removeWorker` is called.

This PR also fixes an issue where `ExecutorRunner` can send an 
`ExecutorStateChanged` message without changing its state.
This issue causes an assertion error.
```
2021-08-13 14:05:37,991 [dispatcher-event-loop-9] ERROR: Ignoring 
errorjava.lang.AssertionError: assertion failed: executor 0 state transfer from 
RUNNING to RUNNING is illegal
```
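
For illustration only, here is a minimal sketch (not Spark's actual classes) of 
the kind of transition check behind that assertion: a state-changed message must 
move the executor to a different state, so a RUNNING to RUNNING transfer is rejected.
```scala
// Hypothetical sketch, not Spark code: validate an executor state transfer
// before applying it, mirroring the assertion quoted above.
object ExecutorState extends Enumeration {
  val LAUNCHING, RUNNING, KILLED, FAILED, LOST, EXITED = Value
}

def isValidTransfer(from: ExecutorState.Value, to: ExecutorState.Value): Boolean =
  from != to // identical states would mean a state change that changes nothing

assert(isValidTransfer(ExecutorState.RUNNING, ExecutorState.FAILED))
assert(!isValidTransfer(ExecutorState.RUNNING, ExecutorState.RUNNING))
```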

### Why are the changes needed?

It's a critical bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested with the procedure shown above and confirmed the executor 
is re-scheduled.

Closes #33818 from sarutak/fix-scheduling-stuck.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit ea8c31e5ea233da4407f6821b2d6dd7f3c88f8d9)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 1 +
 core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
index 9f1b36a..1cbeacf 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
@@ -964,6 +964,7 @@ private[deploy] class Master(
   app.driver.send(WorkerRemoved(worker.id, worker.host, msg))
 }
 persistenceEngine.removeWorker(worker)
+schedule()
   }
 
   private def relaunchDriver(driver: DriverInfo): Unit = {
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
index 974c2d6..40d9407 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
@@ -83,7 +83,7 @@ private[deploy] class ExecutorRunner(
 shutdownHook = ShutdownHookManager.addShutdownHook { () =>
   // It's possible that we arrive here before calling 
`fetchAndRunExecutor`, then `state` will
   // be `ExecutorState.LAUNCHING`. In this case, we should set `state` to 
`FAILED`.
-  if (state == ExecutorState.LAUNCHING) {
+  if (state == ExecutorState.LAUNCHING || state == ExecutorState.RUNNING) {
 state = ExecutorState.FAILED
   }
   killProcess(Some("Worker shutting down")) }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster

2021-08-28 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 93f2b00  [SPARK-36509][CORE] Fix the issue that executors are never 
re-scheduled if the worker stops with standalone cluster
93f2b00 is described below

commit 93f2b00501c7fad20fb6bc130b548cb87e9f91f1
Author: Kousuke Saruta 
AuthorDate: Sat Aug 28 18:01:55 2021 +0900

[SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if 
the worker stops with standalone cluster

### What changes were proposed in this pull request?

This PR fixes an issue where executors are never re-scheduled if the worker 
they run on stops.
As a result, the application gets stuck.
You can easily reproduce this issue with the following procedure.

```
# Run master
$ sbin/start-master.sh

# Run worker 1
$ SPARK_LOG_DIR=/tmp/worker1 SPARK_PID_DIR=/tmp/worker1/ 
sbin/start-worker.sh -c 1 -h localhost -d /tmp/worker1 --webui-port 8081 
spark://:7077

# Run worker 2
$ SPARK_LOG_DIR=/tmp/worker2 SPARK_PID_DIR=/tmp/worker2/ 
sbin/start-worker.sh -c 1 -h localhost -d /tmp/worker2 --webui-port 8082 
spark://:7077

# Run Spark Shell
$ bin/spark-shell --master spark://:7077 --executor-cores 1 
--total-executor-cores 1

# Check which worker the executor runs on and then kill the worker.
$ kill 
```

With the procedure above, we would expect the executor to be re-scheduled 
on the other worker, but it isn't.

The reason seems to be that `Master.schedule` cannot be called after the worker 
is marked as `WorkerState.DEAD`.
So, the solution this PR proposes is to call `Master.schedule` whenever 
`Master.removeWorker` is called.

This PR also fixes an issue where `ExecutorRunner` can send an 
`ExecutorStateChanged` message without changing its state.
This issue causes an assertion error.
```
2021-08-13 14:05:37,991 [dispatcher-event-loop-9] ERROR: Ignoring 
errorjava.lang.AssertionError: assertion failed: executor 0 state transfer from 
RUNNING to RUNNING is illegal
```

### Why are the changes needed?

It's a critical bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested with the procedure shown above and confirmed the executor 
is re-scheduled.

Closes #33818 from sarutak/fix-scheduling-stuck.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit ea8c31e5ea233da4407f6821b2d6dd7f3c88f8d9)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 1 +
 core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
index c964e34..7dbf6b9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
@@ -965,6 +965,7 @@ private[deploy] class Master(
   app.driver.send(WorkerRemoved(worker.id, worker.host, msg))
 }
 persistenceEngine.removeWorker(worker)
+schedule()
   }
 
   private def relaunchDriver(driver: DriverInfo): Unit = {
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
index 974c2d6..40d9407 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
@@ -83,7 +83,7 @@ private[deploy] class ExecutorRunner(
 shutdownHook = ShutdownHookManager.addShutdownHook { () =>
   // It's possible that we arrive here before calling 
`fetchAndRunExecutor`, then `state` will
   // be `ExecutorState.LAUNCHING`. In this case, we should set `state` to 
`FAILED`.
-  if (state == ExecutorState.LAUNCHING) {
+  if (state == ExecutorState.LAUNCHING || state == ExecutorState.RUNNING) {
 state = ExecutorState.FAILED
   }
   killProcess(Some("Worker shutting down")) }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (fe7bf5f -> ea8c31e)

2021-08-28 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fe7bf5f  [SPARK-36327][SQL] Spark sql creates staging dir inside 
database directory rather than creating inside table directory
 add ea8c31e  [SPARK-36509][CORE] Fix the issue that executors are never 
re-scheduled if the worker stops with standalone cluster

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 1 +
 core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log

2021-08-25 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 8c0852c  [SPARK-36398][SQL] Redact sensitive information in Spark 
Thrift Server log
8c0852c is described below

commit 8c0852ca805a918cebe9f22166887128a03b3222
Author: Kousuke Saruta 
AuthorDate: Wed Aug 25 21:30:43 2021 +0900

[SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log

### What changes were proposed in this pull request?

This PR fixes an issue where there is no way to redact sensitive information 
in the Spark Thrift Server log.
For example, a JDBC password can be exposed in the log.
```
21/08/25 18:52:37 INFO SparkExecuteStatementOperation: Submitting query 
'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", 
driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", 
password="abcde")' with ca14ae38-1aaf-4bf4-a099-06b8e5337613
```

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connected to it, and executed `CREATE TABLE mytbl2(a int) 
OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", 
dbtable="test_tbl", user="test_usr", password="abcde");` with 
`spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
Then, confirmed the log.
```
21/08/25 18:54:11 INFO SparkExecuteStatementOperation: Submitting query 
'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", 
driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", 
password=*(redacted))' with ffc627e2-b1a8-4d83-ab6d-d819b3ccd909
```
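
As an editorial aside, the masking shown above can be reproduced with a plain 
Scala regex; the sketch below only illustrates how the configured pattern 
rewrites the statement and is not the code path Spark itself uses:
```scala
// Illustration only: apply the redaction regex from the test above to a statement.
val redactionPattern = """((?i)(?<=password=))(".*")|('.*')""".r
val statement =
  """CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", password="abcde")"""

// Prints the statement with the quoted password value replaced, e.g. ... password=*(redacted))
println(redactionPattern.replaceAllIn(statement, "*(redacted)"))
```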

Closes #33832 from sarutak/fix-SPARK-36398.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit b2ff01608f5ecdba19630e12478bd370f9766f7b)
Signed-off-by: Kousuke Saruta 
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index acb00e4..bb55bb0 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -219,8 +219,8 @@ private[hive] class SparkExecuteStatementOperation(
 
   override def runInternal(): Unit = {
 setState(OperationState.PENDING)
-logInfo(s"Submitting query '$statement' with $statementId")
 val redactedStatement = 
SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
+logInfo(s"Submitting query '$redactedStatement' with $statementId")
 HiveThriftServer2.eventManager.onStatementStart(
   statementId,
   parentSession.getSessionHandle.getSessionId.toString,

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log

2021-08-25 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new fb38887  [SPARK-36398][SQL] Redact sensitive information in Spark 
Thrift Server log
fb38887 is described below

commit fb38887e001d33adef519d0288bd0844dcfe2bd5
Author: Kousuke Saruta 
AuthorDate: Wed Aug 25 21:30:43 2021 +0900

[SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log

### What changes were proposed in this pull request?

This PR fixes an issue where there is no way to redact sensitive information 
in the Spark Thrift Server log.
For example, a JDBC password can be exposed in the log.
```
21/08/25 18:52:37 INFO SparkExecuteStatementOperation: Submitting query 
'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", 
driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", 
password="abcde")' with ca14ae38-1aaf-4bf4-a099-06b8e5337613
```

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connected to it, and executed `CREATE TABLE mytbl2(a int) 
OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", 
dbtable="test_tbl", user="test_usr", password="abcde");` with 
`spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
Then, confirmed the log.
```
21/08/25 18:54:11 INFO SparkExecuteStatementOperation: Submitting query 
'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", 
driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", 
password=*(redacted))' with ffc627e2-b1a8-4d83-ab6d-d819b3ccd909
```

Closes #33832 from sarutak/fix-SPARK-36398.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit b2ff01608f5ecdba19630e12478bd370f9766f7b)
Signed-off-by: Kousuke Saruta 
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 0df5885..4f40889 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -185,8 +185,8 @@ private[hive] class SparkExecuteStatementOperation(
 
   override def runInternal(): Unit = {
 setState(OperationState.PENDING)
-logInfo(s"Submitting query '$statement' with $statementId")
 val redactedStatement = 
SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
+logInfo(s"Submitting query '$redactedStatement' with $statementId")
 HiveThriftServer2.eventManager.onStatementStart(
   statementId,
   parentSession.getSessionHandle.getSessionId.toString,

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log

2021-08-25 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b2ff016  [SPARK-36398][SQL] Redact sensitive information in Spark 
Thrift Server log
b2ff016 is described below

commit b2ff01608f5ecdba19630e12478bd370f9766f7b
Author: Kousuke Saruta 
AuthorDate: Wed Aug 25 21:30:43 2021 +0900

[SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log

### What changes were proposed in this pull request?

This PR fixes an issue where there is no way to redact sensitive information 
in the Spark Thrift Server log.
For example, a JDBC password can be exposed in the log.
```
21/08/25 18:52:37 INFO SparkExecuteStatementOperation: Submitting query 
'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", 
driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", 
password="abcde")' with ca14ae38-1aaf-4bf4-a099-06b8e5337613
```

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connected to it, and executed `CREATE TABLE mytbl2(a int) 
OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", 
dbtable="test_tbl", user="test_usr", password="abcde");` with 
`spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
Then, confirmed the log.
```
21/08/25 18:54:11 INFO SparkExecuteStatementOperation: Submitting query 
'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", 
driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", 
password=*(redacted))' with ffc627e2-b1a8-4d83-ab6d-d819b3ccd909
```

Closes #33832 from sarutak/fix-SPARK-36398.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 0df5885..4f40889 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -185,8 +185,8 @@ private[hive] class SparkExecuteStatementOperation(
 
   override def runInternal(): Unit = {
 setState(OperationState.PENDING)
-logInfo(s"Submitting query '$statement' with $statementId")
 val redactedStatement = 
SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
+logInfo(s"Submitting query '$redactedStatement' with $statementId")
 HiveThriftServer2.eventManager.onStatementStart(
   statementId,
   parentSession.getSessionHandle.getSessionId.toString,

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0b6af46 -> adc485a)

2021-08-23 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0b6af46  [SPARK-36470][PYTHON] Implement `CategoricalIndex.map` and 
`DatetimeIndex.map`
 add adc485a  [MINOR][DOCS] Mention Hadoop 3 in YARN introduction on 
cluster-overview.md

No new revisions were added by this update.

Summary of changes:
 docs/cluster-overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2

2021-08-17 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 281b00a  [SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2
281b00a is described below

commit 281b00ab5b3dd3f21dd6af020ad5455f35498b79
Author: Kousuke Saruta 
AuthorDate: Wed Aug 18 13:40:52 2021 +0900

[SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2

### What changes were proposed in this pull request?

This PR upgrades Caffeine to `2.9.2`.
Caffeine was introduced in SPARK-34309 (#31517). At the time that PR was 
opened, the latest version of Caffeine was `2.9.1`, but now `2.9.2` is available.

### Why are the changes needed?

`2.9.2` has the following improvements 
(https://github.com/ben-manes/caffeine/releases/tag/v2.9.2).

* Fixed reading an intermittent null weak/soft value during a concurrent 
write
* Fixed extraneous eviction when concurrently removing a collected entry 
after a writer resurrects it with a new mapping
* Fixed excessive retries of discarding an expired entry when the fixed 
duration period is extended, thereby resurrecting it

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CIs.

Closes #33772 from sarutak/upgrade-caffeine-2.9.2.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 
b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 1dc01b5..31dd02f 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -30,7 +30,7 @@ blas/2.2.0//blas-2.2.0.jar
 bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
 breeze-macros_2.12/1.2//breeze-macros_2.12-1.2.jar
 breeze_2.12/1.2//breeze_2.12-1.2.jar
-caffeine/2.9.1//caffeine-2.9.1.jar
+caffeine/2.9.2//caffeine-2.9.2.jar
 cats-kernel_2.12/2.1.1//cats-kernel_2.12-2.1.1.jar
 checker-qual/3.10.0//checker-qual-3.10.0.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 
b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index 698a03c..5b27680 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -25,7 +25,7 @@ blas/2.2.0//blas-2.2.0.jar
 bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
 breeze-macros_2.12/1.2//breeze-macros_2.12-1.2.jar
 breeze_2.12/1.2//breeze_2.12-1.2.jar
-caffeine/2.9.1//caffeine-2.9.1.jar
+caffeine/2.9.2//caffeine-2.9.2.jar
 cats-kernel_2.12/2.1.1//cats-kernel_2.12-2.1.1.jar
 checker-qual/3.10.0//checker-qual-3.10.0.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
diff --git a/pom.xml b/pom.xml
index bd1722f..1452b0b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -182,7 +182,7 @@
 2.6.2
 4.1.17
 14.0.1
-2.9.1
+2.9.2
 3.0.16
 2.34
 2.10.10

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex

2021-08-17 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 31d771d  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer 
recognize spark.sql.redaction.string.regex
31d771d is described below

commit 31d771dcf242cfa477b04f28950526bf87b7e90a
Author: Kousuke Saruta 
AuthorDate: Wed Aug 18 13:31:22 2021 +0900

[SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize 
spark.sql.redaction.string.regex

### What changes were proposed in this pull request?

This PR fixes an issue where ThriftServer doesn't recognize 
`spark.sql.redaction.string.regex`.
The problem is that sensitive information included in queries can be 
exposed.

![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)

![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connected to it, and executed `CREATE TABLE mytbl2(a int) 
OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", 
dbtable="test_tbl", user="test_usr", password="abcde");` with 
`spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
Then, confirmed the UI.


![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)

![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

Closes #33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit b914ff7d54bd7c07e7313bb06a1fa22c36b628d2)
Signed-off-by: Kousuke Saruta 
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index f7a4be9..acb00e4 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -220,10 +220,11 @@ private[hive] class SparkExecuteStatementOperation(
   override def runInternal(): Unit = {
 setState(OperationState.PENDING)
 logInfo(s"Submitting query '$statement' with $statementId")
+val redactedStatement = 
SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
 HiveThriftServer2.eventManager.onStatementStart(
   statementId,
   parentSession.getSessionHandle.getSessionId.toString,
-  statement,
+  redactedStatement,
   statementId,
   parentSession.getUsername)
 setHasResultSet(true) // avoid no resultset for async run

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex

2021-08-17 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new b749b49  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer 
recognize spark.sql.redaction.string.regex
b749b49 is described below

commit b749b49a283800d3e12455a00a23da24bf6cd333
Author: Kousuke Saruta 
AuthorDate: Wed Aug 18 13:31:22 2021 +0900

[SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize 
spark.sql.redaction.string.regex

### What changes were proposed in this pull request?

This PR fixes an issue where ThriftServer doesn't recognize 
`spark.sql.redaction.string.regex`.
The problem is that sensitive information included in queries can be 
exposed.

![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)

![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connected to it, and executed `CREATE TABLE mytbl2(a int) 
OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", 
dbtable="test_tbl", user="test_usr", password="abcde");` with 
`spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
Then, confirmed the UI.


![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)

![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

Closes #33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit b914ff7d54bd7c07e7313bb06a1fa22c36b628d2)
Signed-off-by: Kousuke Saruta 
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index f43f8e7..0df5885 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -186,10 +186,11 @@ private[hive] class SparkExecuteStatementOperation(
   override def runInternal(): Unit = {
 setState(OperationState.PENDING)
 logInfo(s"Submitting query '$statement' with $statementId")
+val redactedStatement = 
SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
 HiveThriftServer2.eventManager.onStatementStart(
   statementId,
   parentSession.getSessionHandle.getSessionId.toString,
-  statement,
+  redactedStatement,
   statementId,
   parentSession.getUsername)
 setHasResultSet(true) // avoid no resultset for async run

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex

2021-08-17 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b914ff7  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer 
recognize spark.sql.redaction.string.regex
b914ff7 is described below

commit b914ff7d54bd7c07e7313bb06a1fa22c36b628d2
Author: Kousuke Saruta 
AuthorDate: Wed Aug 18 13:31:22 2021 +0900

[SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize 
spark.sql.redaction.string.regex

### What changes were proposed in this pull request?

This PR fixes an issue where ThriftServer doesn't recognize 
`spark.sql.redaction.string.regex`.
The problem is that sensitive information included in queries can be 
exposed.

![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)

![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connected to it, and executed `CREATE TABLE mytbl2(a int) 
OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", 
dbtable="test_tbl", user="test_usr", password="abcde");` with 
`spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
Then, confirmed the UI.


![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)

![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

Closes #33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index f43f8e7..0df5885 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -186,10 +186,11 @@ private[hive] class SparkExecuteStatementOperation(
   override def runInternal(): Unit = {
 setState(OperationState.PENDING)
 logInfo(s"Submitting query '$statement' with $statementId")
+val redactedStatement = 
SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
 HiveThriftServer2.eventManager.onStatementStart(
   statementId,
   parentSession.getSessionHandle.getSessionId.toString,
-  statement,
+  redactedStatement,
   statementId,
   parentSession.getUsername)
 setHasResultSet(true) // avoid no resultset for async run

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"

2021-08-06 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 586eb5d  Revert "[SPARK-36429][SQL] JacksonParser should throw 
exception when data type unsupported"
586eb5d is described below

commit 586eb5d4c6b01b008cb0ace076f94f49580201de
Author: Kousuke Saruta 
AuthorDate: Fri Aug 6 20:56:24 2021 +0900

Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data 
type unsupported"

### What changes were proposed in this pull request?

This PR reverts the change in SPARK-36429 (#33654).
See 
[conversation](https://github.com/apache/spark/pull/33654#issuecomment-894160037).

### Why are the changes needed?

To recover CIs.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

    Closes #33670 from sarutak/revert-SPARK-36429.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit e17612d0bfa1b1dc719f6f2c202e2a4ea7870ff1)
Signed-off-by: Kousuke Saruta 
---
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala  | 8 ++--
 .../sql-tests/results/timestampNTZ/timestamp-ansi.sql.out | 5 ++---
 .../resources/sql-tests/results/timestampNTZ/timestamp.sql.out| 5 ++---
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
index 2761c52..04a0f1a 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
@@ -330,8 +330,12 @@ class JacksonParser(
 case udt: UserDefinedType[_] =>
   makeConverter(udt.sqlType)
 
-// We don't actually hit this exception though, we keep it for 
understandability
-case _ => throw QueryExecutionErrors.unsupportedTypeError(dataType)
+case _ =>
+  (parser: JsonParser) =>
+// Here, we pass empty `PartialFunction` so that this case can be
+// handled as a failed conversion. It will throw an exception as
+// long as the value is not null.
+parseJsonToken[AnyRef](parser, 
dataType)(PartialFunction.empty[JsonToken, AnyRef])
   }
 
   /**
diff --git 
a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
 
b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
index fae7721..fe83675 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
@@ -661,10 +661,9 @@ You may get a different result due to the upgrading of 
Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', 
map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct<>
+struct>
 -- !query output
-java.lang.Exception
-Unsupported type: timestamp_ntz
+{"t":null}
 
 
 -- !query
diff --git 
a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out 
b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
index c6de535..b8a6800 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
@@ -642,10 +642,9 @@ You may get a different result due to the upgrading of 
Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', 
map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct<>
+struct>
 -- !query output
-java.lang.Exception
-Unsupported type: timestamp_ntz
+{"t":null}
 
 
 -- !query

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (888f8f0 -> e17612d)

2021-08-06 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 888f8f0  [SPARK-36339][SQL] References to grouping that not part of 
aggregation should be replaced
 add e17612d  Revert "[SPARK-36429][SQL] JacksonParser should throw 
exception when data type unsupported"

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala  | 8 ++--
 .../sql-tests/results/timestampNTZ/timestamp-ansi.sql.out | 5 ++---
 .../resources/sql-tests/results/timestampNTZ/timestamp.sql.out| 5 ++---
 3 files changed, 10 insertions(+), 8 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36318][SQL][DOCS] Update docs about mapping of ANSI interval types to Java/Scala/SQL types

2021-07-27 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 56f1ee4  [SPARK-36318][SQL][DOCS] Update docs about mapping of ANSI 
interval types to Java/Scala/SQL types
56f1ee4 is described below

commit 56f1ee4b066ca5123c207acd91f27ee80a3bb07b
Author: Max Gekk 
AuthorDate: Wed Jul 28 13:42:35 2021 +0900

[SPARK-36318][SQL][DOCS] Update docs about mapping of ANSI interval types 
to Java/Scala/SQL types

### What changes were proposed in this pull request?
1. Update the tables at 
https://spark.apache.org/docs/latest/sql-ref-datatypes.html about mapping ANSI 
interval types to Java/Scala/SQL types.
2. Remove `CalendarIntervalType` from the table of mapping Catalyst types 
to SQL types.

https://user-images.githubusercontent.com/1580697/127204790-7ccb9c64-daf2-427d-963e-b7367aaa3439.png
https://user-images.githubusercontent.com/1580697/127204806-a0a51950-3c2d-4198-8a22-0f6614bb1487.png

### Why are the changes needed?
To inform users which types from language APIs should be used as ANSI 
interval types.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually checking by building the docs:
```
$ SKIP_RDOC=1 SKIP_API=1 SKIP_PYTHONDOC=1 bundle exec jekyll build
```

Closes #33543 from MaxGekk/doc-interval-type-lang-api.

Authored-by: Max Gekk 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 1614d004174c1aeda0c1511d3cba92cf55fc14b0)
Signed-off-by: Kousuke Saruta 
---
 docs/sql-ref-datatypes.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/sql-ref-datatypes.md b/docs/sql-ref-datatypes.md
index 89ffa34..d699bfe 100644
--- a/docs/sql-ref-datatypes.md
+++ b/docs/sql-ref-datatypes.md
@@ -125,6 +125,8 @@ You can access them by doing
 |**BooleanType**|Boolean|BooleanType|
 |**TimestampType**|java.sql.Timestamp|TimestampType|
 |**DateType**|java.sql.Date|DateType|
+|**YearMonthIntervalType**|java.time.Period|YearMonthIntervalType|
+|**DayTimeIntervalType**|java.time.Duration|DayTimeIntervalType|
 |**ArrayType**|scala.collection.Seq|ArrayType(*elementType*, 
[*containsNull]*)**Note:** The default value of *containsNull* is true.|
 |**MapType**|scala.collection.Map|MapType(*keyType*, *valueType*, 
[*valueContainsNull]*)**Note:** The default value of *valueContainsNull* 
is true.|
 |**StructType**|org.apache.spark.sql.Row|StructType(*fields*)**Note:** 
*fields* is a Seq of StructFields. Also, two fields with the same name are not 
allowed.|
@@ -153,6 +155,8 @@ please use factory methods provided in
 |**BooleanType**|boolean or Boolean|DataTypes.BooleanType|
 |**TimestampType**|java.sql.Timestamp|DataTypes.TimestampType|
 |**DateType**|java.sql.Date|DataTypes.DateType|
+|**YearMonthIntervalType**|java.time.Period|YearMonthIntervalType|
+|**DayTimeIntervalType**|java.time.Duration|DayTimeIntervalType|
 
|**ArrayType**|java.util.List|DataTypes.createArrayType(*elementType*)**Note:**
 The value of *containsNull* will be 
true.DataTypes.createArrayType(*elementType*, *containsNull*).|
 |**MapType**|java.util.Map|DataTypes.createMapType(*keyType*, 
*valueType*)**Note:** The value of *valueContainsNull* will be 
true.DataTypes.createMapType(*keyType*, *valueType*, *valueContainsNull*)|
 
|**StructType**|org.apache.spark.sql.Row|DataTypes.createStructType(*fields*)**Note:**
 *fields* is a List or an array of StructFields.Also, two fields with the same 
name are not allowed.|
@@ -230,7 +234,8 @@ The following table shows the type names as well as aliases 
used in Spark SQL pa
 |**StringType**|STRING|
 |**BinaryType**|BINARY|
 |**DecimalType**|DECIMAL, DEC, NUMERIC|
-|**CalendarIntervalType**|INTERVAL|
+|**YearMonthIntervalType**|INTERVAL YEAR, INTERVAL YEAR TO MONTH, INTERVAL 
MONTH|
+|**DayTimeIntervalType**|INTERVAL DAY, INTERVAL DAY TO HOUR, INTERVAL DAY TO 
MINUTE, INTERVAL DAY TO SECOND, INTERVAL HOUR, INTERVAL HOUR TO MINUTE, 
INTERVAL HOUR TO SECOND, INTERVAL MINUTE, INTERVAL MINUTE TO SECOND, INTERVAL 
SECOND|
 |**ArrayType**|ARRAY\|
 |**StructType**|STRUCT **Note:** ':' is optional.|
 |**MapType**|MAP|

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (1fafa8e -> 1614d00)

2021-07-27 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1fafa8e  [SPARK-36314][SS] Update Sessionization examples to use 
native support of session window
 add 1614d00  [SPARK-36318][SQL][DOCS] Update docs about mapping of ANSI 
interval types to Java/Scala/SQL types

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-datatypes.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-34619][SQL][DOCS] Describe ANSI interval types at the `Data types` page of the SQL reference

2021-07-27 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f483796  [SPARK-34619][SQL][DOCS] Describe ANSI interval types at the 
`Data types` page of the SQL reference
f483796 is described below

commit f4837961a9c4c35eaf71406c22874984b454e8fd
Author: Max Gekk 
AuthorDate: Tue Jul 27 19:05:39 2021 +0900

[SPARK-34619][SQL][DOCS] Describe ANSI interval types at the `Data types` 
page of the SQL reference

### What changes were proposed in this pull request?
In the PR, I propose to update the page 
https://spark.apache.org/docs/latest/sql-ref-datatypes.html and add information 
about the year-month and day-time interval types introduced by SPARK-27790.

https://user-images.githubusercontent.com/1580697/127115289-e633ca3a-2c18-49a0-a7c0-22421ae5c363.png

### Why are the changes needed?
To inform users about new ANSI interval types, and improve UX with Spark 
SQL.
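
For instance, the interval literals documented on that page can be tried 
directly; the snippet below is only an illustration (not part of the patch) and 
assumes a running `SparkSession` named `spark`:
```scala
// Illustration only: the printed schema shows the ANSI interval types of the literals.
spark.sql(
  """SELECT INTERVAL '2021-07' YEAR TO MONTH AS ym,
    |       INTERVAL '100 10:30' DAY TO MINUTE AS dt""".stripMargin
).printSchema()
// Expected: `ym` has type interval year to month, `dt` has type interval day to minute.
```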

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Should be tested by a GitHub action.

Closes #33518 from MaxGekk/doc-interval-types.

Authored-by: Max Gekk 
Signed-off-by: Kousuke Saruta 
---
 docs/sql-ref-datatypes.md | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/docs/sql-ref-datatypes.md b/docs/sql-ref-datatypes.md
index ce22d92..89ffa34 100644
--- a/docs/sql-ref-datatypes.md
+++ b/docs/sql-ref-datatypes.md
@@ -49,6 +49,44 @@ Spark SQL and DataFrames support the following data types:
   absolute point in time.
   - `DateType`: Represents values comprising values of fields year, month and 
day, without a
   time-zone.
+* Interval types
+  - `YearMonthIntervalType(startField, endField)`: Represents a year-month 
interval which is made up of a contiguous subset of the following fields:
+- MONTH, months within years `[0..11]`,
+- YEAR, years in the range `[0..178956970]`.
+
+Individual interval fields are non-negative, but an interval itself can 
have a sign, and be negative.
+
+`startField` is the leftmost field, and `endField` is the rightmost field 
of the type. Valid values of `startField` and `endField` are 0(MONTH) and 
1(YEAR). Supported year-month interval types are:
+
+|Year-Month Interval Type|SQL type|An instance of the type|
+|-||---|
+|`YearMonthIntervalType(YEAR, YEAR)` or 
`YearMonthIntervalType(YEAR)`|INTERVAL YEAR|`INTERVAL '2021' YEAR`|
+|`YearMonthIntervalType(YEAR, MONTH)`|INTERVAL YEAR TO MONTH|`INTERVAL 
'2021-07' YEAR TO MONTH`|
+|`YearMonthIntervalType(MONTH, MONTH)` or 
`YearMonthIntervalType(MONTH)`|INTERVAL MONTH|`INTERVAL '10' MONTH`|
+
+  - `DayTimeIntervalType(startField, endField)`: Represents a day-time 
interval which is made up of a contiguous subset of the following fields:
+- SECOND, seconds within minutes and possibly fractions of a second 
`[0..59.99]`,
+- MINUTE, minutes within hours `[0..59]`,
+- HOUR, hours within days `[0..23]`,
+- DAY, days in the range `[0..106751991]`.
+
+Individual interval fields are non-negative, but an interval itself can 
have a sign, and be negative.
+
+`startField` is the leftmost field, and `endField` is the rightmost field 
of the type. Valid values of `startField` and `endField` are 0 (DAY), 1 (HOUR), 
2 (MINUTE), 3 (SECOND). Supported day-time interval types are:
+
+|Day-Time Interval Type|SQL type|An instance of the type|
+|-||---|
+|`DayTimeIntervalType(DAY, DAY)` or `DayTimeIntervalType(DAY)`|INTERVAL 
DAY|`INTERVAL '100' DAY`|
+|`DayTimeIntervalType(DAY, HOUR)`|INTERVAL DAY TO HOUR|`INTERVAL '100 10' 
DAY TO HOUR`|
+|`DayTimeIntervalType(DAY, MINUTE)`|INTERVAL DAY TO MINUTE|`INTERVAL '100 
10:30' DAY TO MINUTE`|
+|`DayTimeIntervalType(DAY, SECOND)`|INTERVAL DAY TO SECOND|`INTERVAL '100 
10:30:40.99' DAY TO SECOND`|
+|`DayTimeIntervalType(HOUR, HOUR)` or `DayTimeIntervalType(HOUR)`|INTERVAL 
HOUR|`INTERVAL '123' HOUR`|
+|`DayTimeIntervalType(HOUR, MINUTE)`|INTERVAL HOUR TO MINUTE|`INTERVAL 
'123:10' HOUR TO MINUTE`|
+|`DayTimeIntervalType(HOUR, SECOND)`|INTERVAL HOUR TO SECOND|`INTERVAL 
'123:10:59' HOUR TO SECOND`|
+|`DayTimeIntervalType(MINUTE, MINUTE)` or 
`DayTimeIntervalType(MINUTE)`|INTERVAL MINUTE|`INTERVAL '1000' MINUTE`|
+|`DayTimeIntervalType(MINUTE, SECOND)`|INTERVAL MINUTE TO SECOND|`INTERVAL 
'1000:01.001' MINUTE TO SECOND`|
+|`DayTimeIntervalType(SECOND, SECOND)` or 
`DayTimeIntervalType(SECOND)`|INTERVAL SECOND|`INTERVAL '1000.01' SECOND`|
+
 * Complex types
   - `ArrayType(elementType, containsNull)`: Represents values comprising a 
sequence of
   elements with the type of `elementType`. `containsNull` is used to indic

[spark] branch master updated (6474226 -> 554d5fe)

2021-07-05 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6474226  [SPARK-35982][SQL] Allow from_json/to_json for map types 
where value types are year-month intervals
 add 554d5fe  [SPARK-36010][BUILD] Upgrade sbt-antlr4 from 0.8.2 to 0.8.3

No new revisions were added by this update.

Summary of changes:
 project/plugins.sbt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (59ec7a2 -> c562c16)

2021-06-24 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 59ec7a2  [SPARK-35885][K8S][R] Use keyserver.ubuntu.com as a keyserver 
for CRAN
 add c562c16  [SPARK-34320][SQL][FOLLOWUP] Modify V2JDBCTest to follow the 
change of the error message

No new revisions were added by this update.

Summary of changes:
 .../src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35870][BUILD] Upgrade Jetty to 9.4.42

2021-06-24 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7b78d56  [SPARK-35870][BUILD] Upgrade Jetty to 9.4.42
7b78d56 is described below

commit 7b78d56f34a30148374c68141a3adb3a0c432a1b
Author: Kousuke Saruta 
AuthorDate: Fri Jun 25 03:32:32 2021 +0900

[SPARK-35870][BUILD] Upgrade Jetty to 9.4.42

### What changes were proposed in this pull request?

This PR upgrades Jetty to `9.4.42`.
In the current master, `9.4.40` is used.
`9.4.41` and `9.4.42` include the following updates.
https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.41.v20210516
https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.42.v20210604

### Why are the changes needed?

Mainly for CVE-2021-28169.
https://nvd.nist.gov/vuln/detail/CVE-2021-28169
This CVE is unlikely to affect Spark much, but we upgrade just in case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #33053 from sarutak/upgrade-jetty-9.4.42.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index a53aae3..16fe9e6 100644
--- a/pom.xml
+++ b/pom.xml
@@ -138,7 +138,7 @@
 10.14.2.0
 1.12.0
 1.6.8
-9.4.40.v20210413
+9.4.42.v20210604
 4.0.3
 0.9.5
 2.4.0

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in KubernetesLocalDiskShuffleDataIOSuite

2021-06-19 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b9d6473  [SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in 
KubernetesLocalDiskShuffleDataIOSuite
b9d6473 is described below

commit b9d6473e898cea255bbbc27f657e2958fd4c011b
Author: Dongjoon Hyun 
AuthorDate: Sat Jun 19 15:22:29 2021 +0900

[SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in 
KubernetesLocalDiskShuffleDataIOSuite

### What changes were proposed in this pull request?

This increases the timeout from 10 seconds to 60 seconds in 
KubernetesLocalDiskShuffleDataIOSuite to reduce the flakiness.

### Why are the changes needed?

- 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140003/testReport/

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs

Closes #32967 from dongjoon-hyun/SPARK-35593-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kousuke Saruta 
---
 .../apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
index e94e8dd..eca38a8 100644
--- 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
+++ 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
@@ -210,7 +210,7 @@ class KubernetesLocalDiskShuffleDataIOSuite extends 
SparkFunSuite with LocalSpar
 assert(master.shuffleStatuses(1).mapStatuses.forall(_ == null))
   }
   sc.parallelize(Seq((1, 1)), 2).groupByKey().collect()
-  eventually(timeout(10.second), interval(1.seconds)) {
+  eventually(timeout(60.second), interval(1.seconds)) {
 assert(master.shuffleStatuses(0).mapStatuses.map(_.mapId).toSet == 
Set(0, 1, 2))
 assert(master.shuffleStatuses(1).mapStatuses.map(_.mapId).toSet == 
Set(6, 7, 8))
   }
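For reference, a minimal sketch of the ScalaTest `eventually` pattern this change tunes; the background thread and `ready` flag here are hypothetical and not part of the actual suite:

```
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

object EventuallyExample {
  // Hypothetical flag flipped by a slow background task.
  @volatile private var ready = false

  def main(args: Array[String]): Unit = {
    new Thread(() => { Thread.sleep(3000); ready = true }).start()

    // Re-evaluates the block every second until it passes, failing only
    // if 60 seconds elapse first. A longer timeout reduces flakiness on
    // slow CI machines at the cost of slower detection of real failures.
    eventually(timeout(60.seconds), interval(1.seconds)) {
      assert(ready, "background task has not completed yet")
    }
    println("condition met")
  }
}
```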

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9709ee5 -> ac228d4)

2021-06-15 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9709ee5  [SPARK-35760][SQL] Fix the max rows check for broadcast 
exchange
 add ac228d4  [SPARK-35691][CORE] addFile/addJar/addDirectory should put 
CanonicalFile

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/rpc/RpcEnv.scala   |  3 +-
 .../spark/rpc/netty/NettyStreamManager.scala   | 12 
 .../main/scala/org/apache/spark/util/Utils.scala   |  2 +-
 .../scala/org/apache/spark/SparkContextSuite.scala | 32 ++
 .../scala/org/apache/spark/rpc/RpcEnvSuite.scala   |  9 ++
 5 files changed, 51 insertions(+), 7 deletions(-)
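As an aside, a minimal illustration (not taken from the patch) of the difference between an absolute and a canonical `java.io.File`, which is the distinction this change relies on; the path is made up:

```
import java.io.File

object CanonicalFileExample {
  def main(args: Array[String]): Unit = {
    val f = new File("/tmp/../tmp/./data.txt")   // hypothetical path

    // getAbsoluteFile keeps "." and ".." segments as-is.
    println(f.getAbsoluteFile)   // /tmp/../tmp/./data.txt
    // getCanonicalFile normalizes the path and resolves symlinks,
    // so two references to the same file compare equal.
    println(f.getCanonicalFile)  // /tmp/data.txt (on most systems)
  }
}
```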

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-35746][UI] Fix taskid in the stage page task event timeline

2021-06-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6597c3b  [SPARK-35746][UI] Fix taskid in the stage page task event 
timeline
6597c3b is described below

commit 6597c3bd5e91040dc53576c912c85d84f630bb17
Author: shahid 
AuthorDate: Sat Jun 12 15:38:41 2021 +0900

[SPARK-35746][UI] Fix taskid in the stage page task event timeline

### What changes were proposed in this pull request?
The task id shown in the timeline plot on the Stage page is incorrect

### Why are the changes needed?
Map event timeline plots to the correct task
**Before:**

![image](https://user-images.githubusercontent.com/23054875/121761077-81775800-cb4b-11eb-8ec6-ee71926a6549.png)

**After**

![image](https://user-images.githubusercontent.com/23054875/121761195-02ceea80-cb4c-11eb-8ce6-07bb1cca190e.png)
### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually tested

Closes #32888 from shahidki31/shahid/fixtaskid.

Authored-by: shahid 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 450b415028c3b00f3a002126cd11318d3932e28f)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index ccaa70b..e9eb62e 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -352,7 +352,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
|'content': '
+ |data-title="${s"Task " + taskInfo.taskId + " (attempt " + 
attempt + ")"}
  |Status: ${taskInfo.status}
  |Launch Time: ${UIUtils.formatDate(new Date(launchTime))}
  |${

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-35746][UI] Fix taskid in the stage page task event timeline

2021-06-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 78d3d0f  [SPARK-35746][UI] Fix taskid in the stage page task event 
timeline
78d3d0f is described below

commit 78d3d0f0a562743bb9a36854c2302b242f4d9309
Author: shahid 
AuthorDate: Sat Jun 12 15:38:41 2021 +0900

[SPARK-35746][UI] Fix taskid in the stage page task event timeline

### What changes were proposed in this pull request?
The task id shown in the timeline plot on the Stage page is incorrect

### Why are the changes needed?
Map event timeline plots to the correct task
**Before:**

![image](https://user-images.githubusercontent.com/23054875/121761077-81775800-cb4b-11eb-8ec6-ee71926a6549.png)

**After**

![image](https://user-images.githubusercontent.com/23054875/121761195-02ceea80-cb4c-11eb-8ce6-07bb1cca190e.png)
### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually tested

Closes #32888 from shahidki31/shahid/fixtaskid.

Authored-by: shahid 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 450b415028c3b00f3a002126cd11318d3932e28f)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index 47ba951..459e09a 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -355,7 +355,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
|'content': '
+ |data-title="${s"Task " + taskInfo.taskId + " (attempt " + 
attempt + ")"}
  |Status: ${taskInfo.status}
  |Launch Time: ${UIUtils.formatDate(new Date(launchTime))}
  |${

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35746][UI] Fix taskid in the stage page task event timeline

2021-06-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 450b415  [SPARK-35746][UI] Fix taskid in the stage page task event 
timeline
450b415 is described below

commit 450b415028c3b00f3a002126cd11318d3932e28f
Author: shahid 
AuthorDate: Sat Jun 12 15:38:41 2021 +0900

[SPARK-35746][UI] Fix taskid in the stage page task event timeline

### What changes were proposed in this pull request?
The task id shown in the timeline plot on the Stage page is incorrect

### Why are the changes needed?
Map event timeline plots to the correct task
**Before:**

![image](https://user-images.githubusercontent.com/23054875/121761077-81775800-cb4b-11eb-8ec6-ee71926a6549.png)

**After**

![image](https://user-images.githubusercontent.com/23054875/121761195-02ceea80-cb4c-11eb-8ce6-07bb1cca190e.png)
### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually tested

Closes #32888 from shahidki31/shahid/fixtaskid.

Authored-by: shahid 
Signed-off-by: Kousuke Saruta 
---
 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index 777a6b0..81dfe83 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -355,7 +355,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
|'content': '
+ |data-title="${s"Task " + taskInfo.taskId + " (attempt " + 
attempt + ")"}
  |Status: ${taskInfo.status}
  |Launch Time: ${UIUtils.formatDate(new Date(launchTime))}
  |${

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35609][BUILD] Add style rules to prohibit to use a Guava's API which is incompatible with newer versions

2021-06-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c532f82  [SPARK-35609][BUILD] Add style rules to prohibit to use a 
Guava's API which is incompatible with newer versions
c532f82 is described below

commit c532f8260ee2f2f4170dc50f7e890fafab438b76
Author: Kousuke Saruta 
AuthorDate: Thu Jun 3 21:52:41 2021 +0900

[SPARK-35609][BUILD] Add style rules to prohibit to use a Guava's API which 
is incompatible with newer versions

### What changes were proposed in this pull request?

This PR adds rules to `checkstyle.xml` and `scalastyle-config.xml` to avoid reintroducing `Objects.toStringHelper`, a Guava API which is no longer present in newer Guava versions.

### Why are the changes needed?

SPARK-30272 (#26911) replaced `Objects.toStringHelper`, an API provided by Guava 14, with the `commons.lang3` API, because `Objects.toStringHelper` is no longer present in newer Guava versions.
However, `toStringHelper` was later reintroduced into Spark and had to be replaced again in SPARK-35420 (#32567).
I think it's better to have a style rule to avoid such repetition.

SPARK-30272 replaced some other APIs besides `Objects.toStringHelper`, but `Objects.toStringHelper` is the one that currently affects Spark, so I add rules only for it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that `lint-java` and `lint-scala` detect the usage of `toStringHelper` and cause the lint checks to fail.
```
$ dev/lint-java
exec: curl --silent --show-error -L 
https://downloads.lightbend.com/scala/2.12.14/scala-2.12.14.tgz
Using `mvn` from path: /opt/maven/3.6.3//bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] 
src/main/java/org/apache/spark/network/protocol/OneWayMessage.java:[78] 
(regexp) RegexpSinglelineJava: Avoid using Object.toStringHelper. Use 
ToStringBuilder instead.

$ dev/lint-scala
Scalastyle checks failed at following occurrences:
[error] 
/home/kou/work/oss/spark/core/src/main/scala/org/apache/spark/rdd/RDD.scala:93:25:
 Avoid using Object.toStringHelper. Use ToStringBuilder instead.
[error] Total time: 25 s, completed 2021/06/02 16:18:25
```
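For context, a minimal sketch (not taken from the Spark code, and requiring commons-lang3 on the classpath) of the `commons.lang3` replacement that these rules steer contributors toward; the `BlockInfo` class is hypothetical:

```
import org.apache.commons.lang3.builder.{ToStringBuilder, ToStringStyle}

// Hypothetical class; Guava's Objects.toStringHelper(this) would have built
// a similar string, but that helper is gone from newer Guava versions.
class BlockInfo(val blockId: String, val size: Long) {
  override def toString: String =
    new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE)
      .append("blockId", blockId)
      .append("size", size)
      .toString
}

object ToStringBuilderExample {
  def main(args: Array[String]): Unit =
    println(new BlockInfo("rdd_0_1", 1024L))  // BlockInfo[blockId=rdd_0_1,size=1024]
}
```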

Closes #32740 from sarutak/style-rule-for-guava.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 dev/checkstyle.xml| 5 -
 scalastyle-config.xml | 4 
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/dev/checkstyle.xml b/dev/checkstyle.xml
index 483fc7c..06c79a9 100644
--- a/dev/checkstyle.xml
+++ b/dev/checkstyle.xml
@@ -185,6 +185,9 @@
 
 
 
-
+
+
+
+
 
 
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index c1dc57b..c06b4ab 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -397,4 +397,8 @@ This file is divided into 3 sections:
 -1,0,1,2,3
   
 
+  
+Objects.toStringHelper
+Avoid using Object.toStringHelper. Use ToStringBuilder 
instead.
+  
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d773373 -> b7dd4b3)

2021-06-01 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d773373  [SPARK-35584][CORE][TESTS] Increase the timeout in 
FallbackStorageSuite
 add b7dd4b3  [SPARK-35516][WEBUI] Storage UI tab Storage Level tool tip 
correction

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ui/storage/ToolTips.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2.13 on GA

2021-05-28 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b763db3  [SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 
2.13 on GA
b763db3 is described below

commit b763db3efdd6a58e34c136b03426371400afefd1
Author: Kousuke Saruta 
AuthorDate: Sat May 29 00:11:16 2021 +0900

[SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2.13 on GA

### What changes were proposed in this pull request?

This PR fixes a build error with Scala 2.13 on GA.
#32301 seems to have introduced this error.

### Why are the changes needed?

To recover CI.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA

Closes #32696 from sarutak/followup-SPARK-35194.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 .../org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
index cd7032d..e0e8f92 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
@@ -146,7 +146,8 @@ object NestedColumnAliasing {
 val nestedFieldToAlias = 
attributeToExtractValuesAndAliases.values.flatten.toMap
 
 // A reference attribute can have multiple aliases for nested fields.
-val attrToAliases = 
AttributeMap(attributeToExtractValuesAndAliases.mapValues(_.map(_._2)))
+val attrToAliases =
+  
AttributeMap(attributeToExtractValuesAndAliases.mapValues(_.map(_._2)).toSeq)
 
 plan match {
   case Project(projectList, child) =>
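For background, a minimal sketch (not from the patch) of why the explicit `.toSeq` is needed: under Scala 2.13, `mapValues` yields a lazy `MapView` rather than a strict collection, so call sites expecting a `Seq` or `Map` no longer compile without a conversion.

```
object MapValuesExample {
  def main(args: Array[String]): Unit = {
    val m = Map("a" -> 1, "b" -> 2)

    // Scala 2.13: this is a scala.collection.MapView, not a Map.
    val doubled = m.view.mapValues(_ * 2)

    // Converting explicitly gives back a strict collection, which is
    // what constructors such as AttributeMap(...) expect.
    val asSeq: Seq[(String, Int)] = doubled.toSeq
    val asMap: Map[String, Int]   = doubled.toMap

    println(asSeq)
    println(asMap)
  }
}
```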

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0549caf -> 003294c)

2021-05-22 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0549caf  [MINOR][SQL] Change the script name for creating oracle 
docker image
 add 003294c  [SPARK-35488][BUILD] Upgrade ASM to 7.3.1

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml | 6 +++---
 project/plugins.sbt | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (46f7d78 -> 9283beb)

2021-05-19 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 46f7d78  [SPARK-35368][SQL] Update histogram statistics for RANGE 
operator for stats estimation
 add 9283beb  [SPARK-35418][SQL] Add sentences function to 
functions.{scala,py}

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst   |  1 +
 python/pyspark/sql/functions.py| 39 ++
 python/pyspark/sql/functions.pyi   |  5 +++
 .../scala/org/apache/spark/sql/functions.scala | 19 +++
 .../apache/spark/sql/StringFunctionsSuite.scala|  7 
 5 files changed, 71 insertions(+)
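A minimal usage sketch of the newly exposed function, assuming the Scala signature added here mirrors the existing SQL `sentences` expression; the sample data is made up:

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sentences}

object SentencesExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sentences-demo")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Hi there! Spark SQL is nice.").toDF("text")

    // sentences() splits the text into sentences, each sentence being an
    // array of words, e.g. [[Hi, there], [Spark, SQL, is, nice]].
    df.select(sentences(col("text"))).show(false)

    spark.stop()
  }
}
```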

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3b859a1 -> 7b942d5)

2021-05-18 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3b859a1  [SPARK-35431][SQL][TESTS] Sort elements generated by 
collect_set in SQLQueryTestSuite
 add 7b942d5  [SPARK-35425][BUILD] Pin jinja2 in `spark-rm/Dockerfile` and 
add as a required dependency in the release README.md

No new revisions were added by this update.

Summary of changes:
 dev/create-release/spark-rm/Dockerfile | 4 +++-
 docs/README.md | 5 -
 2 files changed, 7 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7c13636 -> b4348b7)

2021-05-17 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7c13636  [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting 
session window on elements
 add b4348b7  [SPARK-35420][BUILD] Replace the usage of toStringHelper with 
ToStringBuilder

No new revisions were added by this update.

Summary of changes:
 .../spark/network/shuffle/RemoteBlockPushResolver.java |  8 +---
 .../network/shuffle/protocol/FinalizeShuffleMerge.java |  8 +---
 .../spark/network/shuffle/protocol/MergeStatuses.java  |  8 +---
 .../spark/network/shuffle/protocol/PushBlockStream.java| 14 --
 4 files changed, 23 insertions(+), 15 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


