Re: Deluge of GitBox emails

2022-04-04 Thread Sean Owen
https://issues.apache.org/jira/browse/INFRA-23082 for those following.

On Mon, Apr 4, 2022 at 9:32 AM Nicholas Chammas 
wrote:

> I’m not familiar with GitBox, but it must be an independent thing. When
> you participate in a PR, GitHub emails you notifications directly.
>
> The GitBox emails, on the other hand, are going to the dev list. They seem
> like something set up as a repo-wide setting, or perhaps as an Apache bot
> that monitors repo activity and converts it into emails. (I’ve seen other
> projects -- I think Hadoop -- where GitHub activity is converted into
> comments on Jira.)
>
> Turning off these GitBox emails should not have an impact on the usual
> GitHub emails we are all already familiar with.
>
>
> On Apr 4, 2022, at 9:47 AM, Sean Owen  wrote:
>
> I think this must be related to the GitBox migration that just happened.
> It does seem like I'm getting more emails - some are on PRs I'm attached
> to, but some I don't recognize. The thing is, I'm not yet clear if they
> duplicate the normal GitHub emails - that is, if we turn them off, do we
> still get anything?
>
> On Mon, Apr 4, 2022 at 8:44 AM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> I assume I’m not the only one getting these new emails from GitBox. Is
>> there a story behind that that I missed?
>>
>> I’d rather not get these emails on the dev list. I assume most of the
>> list would agree with me.
>>
>> GitHub has a good set of options for following activity on the repo.
>> People who want to follow conversations can easily do that without
>> involving the whole dev list.
>>
>> Do we know who is responsible for these GitBox emails? Perhaps we need to
>> file an Apache INFRA ticket?
>>
>> Nick
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Re: Apache Spark 3.3 Release

2022-04-04 Thread Maxim Gekk
Hello All,

Below is the current status of features from the allow list:

IN PROGRESS:

   1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   3. SPARK-37093: Inline type hints python/pyspark/streaming
   4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated
   usage of Distribution
   5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   7. SPARK-28516: Data Type Formatting Functions: `to_char`
   8. SPARK-36664: Log time spent waiting for cluster resources
   9. SPARK-34659: Web UI does not correctly get appId
   10. SPARK-37650: Tell spark-env.sh the python interpreter
   11. SPARK-38589: New SQL function: try_avg
   12. SPARK-38590: New SQL function: try_to_binary
   13. SPARK-34079: Improvement CTE table scan

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

We need to decide whether we are going to wait a little bit more or close
the doors.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk 
wrote:

> Hi All,
>
> Here is the allow list which I built based on your requests in this thread:
>
>    1. SPARK-37396: Inline type hint files for files in
>    python/pyspark/mllib
>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>    deprecated usage of Distribution
>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>    sources
>    6. SPARK-32268: Bloom Filter Join
>    7. SPARK-38548: New SQL function: try_sum
>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>    9. SPARK-38063: Support SQL split_part function
>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>    filter by self way
>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>    readers
>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>    shuffle service
>    15. SPARK-37831: Add task partition id in metrics
>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>    17. SPARK-36664: Log time spent waiting for cluster resources
>    18. SPARK-34659: Web UI does not correctly get appId
>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>    20. SPARK-38589: New SQL function: try_avg
>    21. SPARK-38590: New SQL function: try_to_binary
>    22. SPARK-34079: Improvement CTE table scan
>
> Best regards,
> Max Gekk
>
>
> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves  wrote:
>
>> Is the feature freeze target date March 22nd then?  I saw a few dates
>> thrown around and want to confirm what we landed on.
>>
>> I am trying to get the following improvements through review and merged;
>> if there are concerns with either, let me know:
>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>> 
>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>> for released executors 
>>
>> Tom
>>
>>
>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>> ltn...@gmail.com> wrote:
>>
>>
>> I'd like to add the following new SQL functions in the 3.3 release. These
>> functions are useful when overflow or encoding errors occur:
>>
>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>
>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>
>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>
>>
>> Gengliang
>>
>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo 
>> wrote:
>>
>> Hello,
>>
>> I've been trying for a bit to get the following two PRs merged and
>> into a release, and I'm having some difficulty moving them forward:
>>
>> https://github.com/apache/spark/pull/34903 - This passes the current
>> python interpreter to spark-env.sh to allow some currently-unavailable
>> customization to happen
>> 
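
For readers unfamiliar with the try_* functions discussed in this thread: the sketch below illustrates the general idea in plain Scala, under the assumption that a "try" aggregate returns a null-like result on overflow instead of raising an error. The object and method names (`TrySumSketch`, `strictSum`, `trySum`) are hypothetical and are not Spark APIs; this is not how Spark implements `try_sum`.

```scala
import scala.util.Try

// Hypothetical illustration only; this is not Spark's implementation.
object TrySumSketch {
  // Strict sum: throws ArithmeticException when the Long range is exceeded.
  def strictSum(values: Seq[Long]): Long =
    values.foldLeft(0L)((acc, v) => Math.addExact(acc, v))

  // "try" variant: the same computation, but overflow yields None
  // (standing in for SQL NULL) instead of an exception.
  def trySum(values: Seq[Long]): Option[Long] =
    Try(strictSum(values)).toOption
}
```

With these definitions, `TrySumSketch.trySum(Seq(1L, 2L, 3L))` gives `Some(6)`, while `TrySumSketch.trySum(Seq(Long.MaxValue, 1L))` gives `None` rather than failing the whole computation.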

Re: Deluge of GitBox emails

2022-04-04 Thread Nicholas Chammas
I’m not familiar with GitBox, but it must be an independent thing. When you 
participate in a PR, GitHub emails you notifications directly.

The GitBox emails, on the other hand, are going to the dev list. They seem like 
something set up as a repo-wide setting, or perhaps as an Apache bot that 
monitors repo activity and converts it into emails. (I’ve seen other projects 
-- I think Hadoop -- where GitHub activity is converted into comments on Jira.)

Turning off these GitBox emails should not have an impact on the usual GitHub 
emails we are all already familiar with.


> On Apr 4, 2022, at 9:47 AM, Sean Owen  wrote:
> 
> I think this must be related to the GitBox migration that just happened. It 
> does seem like I'm getting more emails - some are on PRs I'm attached to, but 
> some I don't recognize. The thing is, I'm not yet clear if they duplicate the 
> normal GitHub emails - that is, if we turn them off, do we still get anything?
> 
> On Mon, Apr 4, 2022 at 8:44 AM Nicholas Chammas wrote:
> I assume I’m not the only one getting these new emails from GitBox. Is there 
> a story behind that that I missed?
> 
> I’d rather not get these emails on the dev list. I assume most of the list 
> would agree with me.
> 
> GitHub has a good set of options for following activity on the repo. People 
> who want to follow conversations can easily do that without involving the 
> whole dev list.
> 
> Do we know who is responsible for these GitBox emails? Perhaps we need to 
> file an Apache INFRA ticket?
> 
> Nick
> 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> 
> 



Re: Deluge of GitBox emails

2022-04-04 Thread Паша
+1

Pasha Finkelshteyn

Developer Advocate for Data Engineering

JetBrains











asm0...@jetbrains.com
https://linktr.ee/asm0dey




On Mon, Apr 4, 2022 at 16:51, Mich Talebzadeh wrote:
>
> +1, I'm receiving these as well :)
>
>
>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>
>
>
>
>
> On Mon, 4 Apr 2022 at 14:48, Sean Owen  wrote:
>>
>> I think this must be related to the GitBox migration that just happened. It 
>> does seem like I'm getting more emails - some are on PRs I'm attached to, 
>> but some I don't recognize. The thing is, I'm not yet clear if they 
>> duplicate the normal GitHub emails - that is, if we turn them off, do we 
>> still get anything?
>>
>> On Mon, Apr 4, 2022 at 8:44 AM Nicholas Chammas  
>> wrote:
>>>
>>> I assume I’m not the only one getting these new emails from GitBox. Is 
>>> there a story behind that that I missed?
>>>
>>> I’d rather not get these emails on the dev list. I assume most of the list 
>>> would agree with me.
>>>
>>> GitHub has a good set of options for following activity on the repo. People 
>>> who want to follow conversations can easily do that without involving the 
>>> whole dev list.
>>>
>>> Do we know who is responsible for these GitBox emails? Perhaps we need to 
>>> file an Apache INFRA ticket?
>>>
>>> Nick
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>




Re: Deluge of GitBox emails

2022-04-04 Thread Mich Talebzadeh
+1, I'm receiving these as well :)






 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 4 Apr 2022 at 14:48, Sean Owen  wrote:

> I think this must be related to the GitBox migration that just happened.
> It does seem like I'm getting more emails - some are on PRs I'm attached
> to, but some I don't recognize. The thing is, I'm not yet clear if they
> duplicate the normal GitHub emails - that is, if we turn them off, do we
> still get anything?
>
> On Mon, Apr 4, 2022 at 8:44 AM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> I assume I’m not the only one getting these new emails from GitBox. Is
>> there a story behind that that I missed?
>>
>> I’d rather not get these emails on the dev list. I assume most of the
>> list would agree with me.
>>
>> GitHub has a good set of options for following activity on the repo.
>> People who want to follow conversations can easily do that without
>> involving the whole dev list.
>>
>> Do we know who is responsible for these GitBox emails? Perhaps we need to
>> file an Apache INFRA ticket?
>>
>> Nick
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: Deluge of GitBox emails

2022-04-04 Thread Sean Owen
I think this must be related to the GitBox migration that just happened. It
does seem like I'm getting more emails - some are on PRs I'm attached to,
but some I don't recognize. The thing is, I'm not yet clear if they
duplicate the normal GitHub emails - that is, if we turn them off, do we
still get anything?

On Mon, Apr 4, 2022 at 8:44 AM Nicholas Chammas 
wrote:

> I assume I’m not the only one getting these new emails from GitBox. Is
> there a story behind that that I missed?
>
> I’d rather not get these emails on the dev list. I assume most of the list
> would agree with me.
>
> GitHub has a good set of options for following activity on the repo.
> People who want to follow conversations can easily do that without
> involving the whole dev list.
>
> Do we know who is responsible for these GitBox emails? Perhaps we need to
> file an Apache INFRA ticket?
>
> Nick
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Deluge of GitBox emails

2022-04-04 Thread Nicholas Chammas
I assume I’m not the only one getting these new emails from GitBox. Is there a 
story behind that that I missed?

I’d rather not get these emails on the dev list. I assume most of the list 
would agree with me.

GitHub has a good set of options for following activity on the repo. People who 
want to follow conversations can easily do that without involving the whole dev 
list.

Do we know who is responsible for these GitBox emails? Perhaps we need to file 
an Apache INFRA ticket?

Nick





[GitHub] [spark] yaooqinn commented on a diff in pull request #36052: [SPARK-38777][YARN] Add `bin/spark-submit --kill / --status` support for yarn

2022-04-04 Thread GitBox


yaooqinn commented on code in PR #36052:
URL: https://github.com/apache/spark/pull/36052#discussion_r841461428


##
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkSubmitOperation.scala:
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.Map
+
+import org.apache.hadoop.yarn.api.records.{ApplicationId, ApplicationReport, YarnApplicationState}
+import org.apache.hadoop.yarn.client.api.YarnClient
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.{SparkHadoopUtil, SparkSubmitOperation}
+import org.apache.spark.deploy.yarn.YarnSparkSubmitOperation._
+import org.apache.spark.util.CommandLineLoggingUtils
+
+class YarnSparkSubmitOperation
+  extends SparkSubmitOperation with CommandLineLoggingUtils {
+
+  private def withYarnClient(conf: SparkConf)(f: YarnClient => Unit): Unit = {
+    val yarnClient = YarnClient.createYarnClient
+    try {
+      val hadoopConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(conf))
+      yarnClient.init(hadoopConf)
+      yarnClient.start()
+      f(yarnClient)
+    } catch {
+      case e: Exception =>
+        printErrorAndExit(s"Failed to initialize yarn client due to ${e.getMessage}")
+    } finally {
+      yarnClient.stop()
+    }
+  }
+
+  override def kill(applicationId: String, conf: SparkConf): Unit = {
+    withYarnClient(conf) { yarnClient =>
+      try {
+        val appId = ApplicationId.fromString(applicationId)
+        val report = yarnClient.getApplicationReport(appId)
+        if (isTerminalState(report.getYarnApplicationState)) {
+          printMessage(s"WARN: Application $appId is already terminated")
+          printMessage(formatReportDetails(report))
+        } else {
+          yarnClient.killApplication(appId)
+          val report = yarnClient.getApplicationReport(appId)
+          printMessage(formatReportDetails(report))
+

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [spark] monkeyboy123 commented on a diff in pull request #35984: [MINOR][SQL] Show debug log for `AnalysisException` in Analyzer

2022-04-04 Thread GitBox


monkeyboy123 commented on code in PR #35984:
URL: https://github.com/apache/spark/pull/35984#discussion_r841430645


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -1761,7 +1761,9 @@ class Analyzer(override val catalogManager: CatalogManager)
     try {
       innerResolve(expr, isTopLevel = true)
     } catch {
-      case _: AnalysisException if !throws => expr
+      case ae: AnalysisException if !throws =>
+        logWarning(ae.message)

Review Comment:
   It seems that the unit test errors in CI are not related to this PR.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36038: [SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark

2022-04-04 Thread GitBox


HyukjinKwon commented on code in PR #36038:
URL: https://github.com/apache/spark/pull/36038#discussion_r841351936


##
python/docs/source/reference/pyspark.ss.rst:
##
@@ -30,10 +30,10 @@ Core Classes
 
     DataStreamReader
     DataStreamWriter
-    ForeachBatchFunction

Review Comment:
   This was removed because this isn't an API.






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35856: [SPARK-38550][SQL][CORE] Use a disk-based store to save more debug information for live UI

2022-04-04 Thread GitBox


dongjoon-hyun commented on code in PR #35856:
URL: https://github.com/apache/spark/pull/35856#discussion_r841422549


##
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala:
##
@@ -118,6 +119,12 @@ private[sql] class SharedState(
     statusStore
   }
 
+  sparkContext.statusStore.diskStore.foreach { kvStore =>
+    sparkContext.listenerBus.addToQueue(
+      new DiagnosticListener(conf, kvStore.asInstanceOf[ElementTrackingStore]),

Review Comment:
   Why do we need to share the same kvStore?






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35856: [SPARK-38550][SQL][CORE] Use a disk-based store to save more debug information for live UI

2022-04-04 Thread GitBox


dongjoon-hyun commented on code in PR #35856:
URL: https://github.com/apache/spark/pull/35856#discussion_r841421049


##
core/src/main/scala/org/apache/spark/internal/config/Status.scala:
##
@@ -70,4 +70,11 @@ private[spark] object Status {
   .version("3.0.0")
   .booleanConf
   .createWithDefault(false)
+
+  val DISK_STORE_DIR_FOR_STATUS =
+    ConfigBuilder("spark.appStatusStore.diskStore.dir")

Review Comment:
   If there is no other config under this prefix, the Apache Spark community's 
configuration naming guideline is to avoid introducing a new namespace, which 
here means removing the `.`. In this case,
   ```
   - spark.appStatusStore.diskStore.dir
   + spark.appStatusStore.diskStoreDir
   ```


