Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-15 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Phive -Pyarn -Pkubernetes

Regards,
Mridul


On Sun, Apr 14, 2024 at 11:31 PM Dongjoon Hyun  wrote:

> I'll start with my +1.
>
> - Checked checksum and signature
> - Checked Scala/Java/R/Python/SQL Document's Spark version
> - Checked published Maven artifacts
> - All CIs passed.
>
> Thanks,
> Dongjoon.
>
> On 2024/04/15 04:22:26 Dongjoon Hyun wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 3.4.3.
> >
> > The vote is open until April 18th 1AM (PDT) and passes if a majority +1
> PMC
> > votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.4.3
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/
> >
> > The tag to be voted on is v3.4.3-rc2 (commit
> > 1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f)
> > https://github.com/apache/spark/tree/v3.4.3-rc2
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.4.3-rc2-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1453/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.4.3-rc2-docs/
> >
> > The list of bug fixes going into 3.4.3 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12353987
> >
> > This release is using the release script of the tag v3.4.3-rc2.
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks; in Java/Scala,
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward). A
> > minimal PySpark smoke-test sketch follows at the end of this message.
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.4.3?
> > ===
> >
> > The current list of open tickets targeted at 3.4.3 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > Version/s" = 3.4.3
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
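
A minimal PySpark smoke test for exercising a release candidate, assuming
the RC's pyspark tarball from the dist directory above has been installed
into a fresh virtual env (the paths and app name below are illustrative
assumptions, not part of the release instructions):

    # Sketch only: install the RC first, e.g.
    #   python -m venv rc-test && source rc-test/bin/activate
    #   pip install /path/to/pyspark-3.4.3.tar.gz   # hypothetical local path
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rc-smoke-test").getOrCreate()
    print("Spark version:", spark.version)  # should report the RC version

    # Run a trivial workload end to end (projection, shuffle, aggregation).
    df = spark.range(1000).selectExpr("id", "id % 7 AS bucket")
    counts = df.groupBy("bucket").count().collect()
    assert len(counts) == 7, counts

    spark.stop()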


Re: Versioning of Spark Operator

2024-04-09 Thread Mridul Muralidharan
  I am trying to understand if we can simply align with Spark's version for
this?
It would make release and JIRA management much simpler for developers and
more intuitive for users.

Regards,
Mridul


On Tue, Apr 9, 2024 at 10:09 AM Dongjoon Hyun  wrote:

> Hi, Liang-Chi.
>
> Thank you for leading Apache Spark K8s operator as a shepherd.
>
> I took a look at the `Apache Spark Connect Go` repo mentioned in the thread.
> Sadly, there are no releases at all and no activity in the last 6 months. It
> seems to be the first time the Apache Spark community has had to consider
> these sister repositories (Go and K8s Operator).
>
> https://github.com/apache/spark-connect-go/commits/master/
>
> Dongjoon.
>
> On 2024/04/08 17:48:18 "L. C. Hsieh" wrote:
> > Hi all,
> >
> > We've opened the dedicated repository of Spark Kubernetes Operator,
> > and the first PR is created.
> > Thank you for the review from the community so far.
> >
> > About the versioning of Spark Operator, there are questions.
> >
> > As we are using Spark JIRA, when we are going to merge PRs, we need to
> > choose a Spark version. However, the Spark Operator is versioned
> > differently than Spark. I'm wondering how we should deal with this?
> >
> > I'm not sure if Connect also versions itself differently from Spark? If
> > so, maybe we can follow how Connect does it.
> >
> > Can someone who is familiar with Connect versioning give some
> suggestions?
> >
> > Thank you.
> >
> > Liang-Chi
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Apache Spark 3.4.3 (?)

2024-04-06 Thread Mridul Muralidharan
Hi Dongjoon,

  Thanks for volunteering!
I would suggest waiting for SPARK-47318 to be merged into 3.4 as well.

Regards,
Mridul

On Sat, Apr 6, 2024 at 6:49 PM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Apache Spark 3.4.2 tag was created on Nov 24th and `branch-3.4` has 85
> commits including important security and correctness patches like
> SPARK-45580, SPARK-46092, SPARK-46466, SPARK-46794, and SPARK-46862.
>
> https://github.com/apache/spark/releases/tag/v3.4.2
>
> $ git log --oneline v3.4.2..HEAD | wc -l
>   85
>
> SPARK-45580 Subquery changes the output schema of the outer query
> SPARK-46092 Overflow in Parquet row group filter creation causes incorrect
> results
> SPARK-46466 Vectorized parquet reader should never do rebase for timestamp
> ntz
> SPARK-46794 Incorrect results due to inferred predicate from checkpoint
> with subquery
> SPARK-46862 Incorrect count() of a dataframe loaded from CSV datasource
> SPARK-45445 Upgrade snappy to 1.1.10.5
> SPARK-47428 Upgrade Jetty to 9.4.54.v20240208
> SPARK-46239 Hide `Jetty` info
>
>
> Currently, I'm checking for more applicable patches for branch-3.4. I'd like
> to propose releasing Apache Spark 3.4.3, and I volunteer as the release
> manager for Apache Spark 3.4.3. If there are no additional blockers, the
> first tentative RC1 vote date is April 15th (Monday).
>
> WDYT?
>
>
> Dongjoon.
>


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Mridul Muralidharan
+1

Regards,
Mridul


On Mon, Apr 1, 2024 at 11:26 PM Holden Karau  wrote:

> +1
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Mon, Apr 1, 2024 at 5:44 PM Xinrong Meng  wrote:
>
>> +1
>>
>> Thank you @Hyukjin Kwon 
>>
>> On Mon, Apr 1, 2024 at 10:19 AM Felix Cheung 
>> wrote:
>>
>>> +1
>>> --
>>> *From:* Denny Lee 
>>> *Sent:* Monday, April 1, 2024 10:06:14 AM
>>> *To:* Hussein Awala 
>>> *Cc:* Chao Sun ; Hyukjin Kwon ;
>>> Mridul Muralidharan ; dev 
>>> *Subject:* Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)
>>>
>>> +1 (non-binding)
>>>
>>>
>>> On Mon, Apr 1, 2024 at 9:24 AM Hussein Awala  wrote:
>>>
>>> +1 (non-binding). To add to the difference it will make: it will also
>>> simplify package maintenance and make it easy to release a bug fix or
>>> new feature without needing to wait for a PySpark release.
>>>
>>> On Mon, Apr 1, 2024 at 4:56 PM Chao Sun  wrote:
>>>
>>> +1
>>>
>>> On Sun, Mar 31, 2024 at 10:31 PM Hyukjin Kwon 
>>> wrote:
>>>
>>> Oh I didn't send the discussion thread out as it's pretty simple,
>>> non-invasive and the discussion was sort of done as part of the Spark
>>> Connect initial discussion ..
>>>
>>> On Mon, Apr 1, 2024 at 1:59 PM Mridul Muralidharan 
>>> wrote:
>>>
>>>
>>> Can you point me to the SPIP’s discussion thread, please?
>>> I was not able to find it, but I was on vacation, and so might have
>>> missed this …
>>>
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Sun, Mar 31, 2024 at 9:08 PM Haejoon Lee
>>>  wrote:
>>>
>>> +1
>>>
>>> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>>> Connect)
>>>
>>> JIRA <https://issues.apache.org/jira/browse/SPARK-47540>
>>> Prototype <https://github.com/apache/spark/pull/45053>
>>> SPIP doc
>>> <https://docs.google.com/document/d/1Pund40wGRuB72LX6L7cliMDVoXTPR-xx4IkPmMLaZXk/edit?usp=sharing>
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks.
>>>
>>>
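
For context on what the proposed pure-Python package enables: a Spark
Connect client talks to a remote Spark server over gRPC, with no JVM or
Spark jars needed on the client side. A minimal sketch, assuming a Connect
server is already listening on the default port 15002 (the address and the
running server are assumptions for illustration):

    # Sketch only: requires a running Spark Connect server, e.g. one
    # started with ./sbin/start-connect-server.sh.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
    spark.range(5).selectExpr("id * 2 AS doubled").show()
    spark.stop()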


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Mridul Muralidharan
Can you point me to the SPIP’s discussion thread, please?
I was not able to find it, but I was on vacation, and so might have missed
this …


Regards,
Mridul

On Sun, Mar 31, 2024 at 9:08 PM Haejoon Lee
 wrote:

> +1
>
> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>> Connect)
>>
>> JIRA 
>> Prototype 
>> SPIP doc
>> 
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks.
>>
>


Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-03-18 Thread Mridul Muralidharan
Hi Ashish,

  This is something we are still actively working on internally, but it is
unfortunately not yet in a state to share widely.

Regards,
Mridul

On Mon, Mar 11, 2024 at 6:23 PM Ashish Singh  wrote:

> Hi Kalyan,
>
> Is this something you are still interested in pursuing? There are some
> open discussion threads on the doc you shared.
>
> @Mridul Muralidharan  In what state are your efforts
> along this? Is it something that your team is actively pursuing/ building
> or are mostly planning right now? Asking so that we can align efforts on
> this.
>
> On Sun, Feb 18, 2024 at 10:32 PM xiaoping.huang <1754789...@qq.com> wrote:
>
>> Hi all,
>> Any updates on this project? This will be a very useful feature.
>>
>> xiaoping.huang
>> 1754789...@qq.com
>>
>> -------- Replied Message --------
>> From kalyan 
>> Date 02/6/2024 10:08
>> To Jay Han 
>> Cc Ashish Singh ,
>>  Mridul Muralidharan ,
>>  dev ,
>>  
>> 
>> Subject Re: [Spark-Core] Improving Reliability of spark when Executors
>> OOM
>> Hey,
>> Running out of disk space is also a reliability concern, but it might
>> need a different strategy to handle.
>> As suggested by Mridul, I am working on making things more configurable
>> in another (new) module… with that, we can plug in new rules for each type
>> of error.
>>
>> Regards
>> Kalyan.
>>
>> On Mon, 5 Feb 2024 at 1:10 PM, Jay Han  wrote:
>>
>>> Hi,
>>> what about also supporting the disk-space problem of "device
>>> space isn't enough"? I think it's similar to the OOM exception.
>>>
>>> kalyan wrote on Sat, Jan 27, 2024 at 13:00:
>>>
>>>> Hi all,
>>>>
>>>
>>>> Sorry for the delay in getting the first draft of (my first) SPIP out.
>>>>
>>>> https://docs.google.com/document/d/1hxEPUirf3eYwNfMOmUHpuI5dIt_HJErCdo7_yr9htQc/edit?pli=1
>>>>
>>>> Let me know what you think.
>>>>
>>>> Regards
>>>> kalyan.
>>>>
>>>> On Sat, Jan 20, 2024 at 8:19 AM Ashish Singh  wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> Thanks for this discussion, the timing of this couldn't be better!
>>>>>
>>>>> At Pinterest, we recently started to look into reducing OOM failures
>>>>> while also reducing memory consumption of spark applications. We 
>>>>> considered
>>>>> the following options.
>>>>> 1. Changing core count on executor to change memory available per task
>>>>> in the executor.
>>>>> 2. Changing resource profile based on task failures and gc metrics to
>>>>> grow or shrink executor memory size. We do this at application level based
>>>>> on the app's past runs today.
>>>>> 3. K8s vertical pod autoscaler
>>>>> <https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler>
>>>>>
>>>>> Internally, we are mostly getting aligned on option 2. We would love
>>>>> to make this happen and are looking forward to the SPIP.
>>>>>
>>>>>
>>>>> On Wed, Jan 17, 2024 at 9:34 AM Mridul Muralidharan 
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>   We are internally exploring adding support for dynamically changing
>>>>>> the resource profile of a stage based on runtime characteristics.
>>>>>> This includes failures due to OOM and the like, slowness due to
>>>>>> excessive GC, resource wastage due to excessive overprovisioning, etc.
>>>>>> Essentially handles scale up and scale down of resources.
>>>>>> Instead of baking these into the scheduler directly (which is already
>>>>>> complex), we are modeling it as a plugin - so that the 'business logic' 
>>>>>> of
>>>>>> how to handle task events and mutate state is pluggable.
>>>>>>
>>>>>> The main limitation I find with mutating only the cores is the limits
>>>>>> it places on what kind of problems can be solved with it - and mutating
>>>>>> resource profiles is a much more natural way to handle this
>>>>>> (spark.task.cpus predates RP).
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>> On Wed, Jan 17, 2024 at 9:18 AM Tom 

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Mridul Muralidharan
  I am supportive of the proposal - this is a step in the right direction!
Additional metadata (explicit and inferred) for log records, and exposing
it for indexing, is extremely useful.

The specifics of the API still need some work IMO and do not need to be
this disruptive, but I consider that orthogonal to this vote itself -
something we need to iterate upon during PR reviews.

+1

Regards,
Mridul


On Mon, Mar 11, 2024 at 11:09 AM Mich Talebzadeh 
wrote:

> +1
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner Von Braun)".
>
>
> On Mon, 11 Mar 2024 at 09:27, Hyukjin Kwon  wrote:
>
>> +1
>>
>> On Mon, 11 Mar 2024 at 18:11, yangjie01 
>> wrote:
>>
>>> +1
>>>
>>>
>>>
>>> Jie Yang
>>>
>>>
>>>
>>> *From:* Haejoon Lee 
>>> *Date:* Monday, March 11, 2024 17:09
>>> *To:* Gengliang Wang 
>>> *Cc:* dev 
>>> *Subject:* Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Structured Logging Framework for
>>> Apache Spark
>>>
>>>
>>> References:
>>>
>>>    - JIRA ticket
>>>    - SPIP doc
>>>    - Discussion thread
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks!
>>>
>>> Gengliang Wang
>>>
>>>


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mridul Muralidharan
Hi Gengliang,

  Thanks for sharing this !
I added a few queries to the proposal doc, and we can continue discussing
there, but overall I am in favor of this.

Regards,
Mridul


On Fri, Mar 1, 2024 at 1:35 AM Gengliang Wang  wrote:

> Hi All,
>
> I propose to enhance our logging system by transitioning to structured
> logs. This initiative is designed to tackle the challenges of analyzing
> distributed logs from drivers, workers, and executors by allowing them to
> be queried using a fixed schema. The goal is to improve the informativeness
> and accessibility of logs, making it significantly easier to diagnose
> issues.
>
> Key benefits include:
>
>- Clarity and queryability of distributed log files.
>- Continued support for log4j, allowing users to switch back to
>traditional text logging if preferred.
>
> The improvement will simplify debugging and enhance productivity without
> disrupting existing logging practices. The implementation is estimated to
> take around 3 months.
>
> *SPIP*:
> https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
> *JIRA*: SPARK-47240 
>
> Your comments and feedback would be greatly appreciated.
>
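
To illustrate the "queried using a fixed schema" idea in the proposal:
once driver and executor logs are emitted as JSON lines, they can be
loaded and analyzed like any other dataset. A minimal sketch, where the
log path and the field names (level, msg, executor_id) are illustrative
assumptions, not the SPIP's final schema:

    # Sketch only: path and field names are assumptions for illustration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    logs = spark.read.json("/tmp/app-logs/*.json")

    # e.g. find which executors produced the most ERROR records
    (logs.filter(logs.level == "ERROR")
         .groupBy("executor_id")
         .count()
         .orderBy("count", ascending=False)
         .show())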


Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-17 Thread Mridul Muralidharan
Hi,

  We are internally exploring adding support for dynamically changing the
resource profile of a stage based on runtime characteristics.
This includes failures due to OOM and the like, slowness due to excessive
GC, resource wastage due to excessive overprovisioning, etc.
Essentially handles scale up and scale down of resources.
Instead of baking these into the scheduler directly (which is already
complex), we are modeling it as a plugin - so that the 'business logic' of
how to handle task events and mutate state is pluggable.

The main limitation I find with mutating only the cores is the limits it
places on what kind of problems can be solved with it - and mutating
resource profiles is a much more natural way to handle this
(spark.task.cpus predates RP).

Regards,
Mridul
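
Stage-level scheduling is the existing public mechanism such a plugin could
mutate: a job can already request different executor/task resources for a
specific stage. A minimal sketch of that API in PySpark (the resource
amounts are illustrative assumptions; outside local mode this requires
dynamic allocation on YARN/K8s):

    # Sketch only: amounts are illustrative, not a recommendation.
    from pyspark.resource import (ExecutorResourceRequests,
                                  ResourceProfileBuilder,
                                  TaskResourceRequests)
    from pyspark.sql import SparkSession

    sc = SparkSession.builder.getOrCreate().sparkContext

    # Beefier executors, and one task per 4-core executor so a
    # memory-hungry stage gets the whole heap to itself.
    ereqs = (ExecutorResourceRequests()
             .cores(4).memory("8g").memoryOverhead("2g"))
    treqs = TaskResourceRequests().cpus(4)
    profile = ResourceProfileBuilder().require(ereqs).require(treqs).build

    rdd = sc.parallelize(range(1_000_000), 100).withResources(profile)
    print(rdd.map(lambda x: x * x).sum())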

On Wed, Jan 17, 2024 at 9:18 AM Tom Graves 
wrote:

> It is interesting. I think there are definitely some discussion points
> around this. Reliability vs. performance is always a trade-off, and it's
> great that it doesn't fail, but if it doesn't meet someone's SLA now, that
> could be just as bad if it's hard to figure out why. I think if something
> like this kicks in, it needs to be very obvious to the user so they can see
> that it occurred. Do you have something in place on the UI or elsewhere that
> indicates this? The nice thing is also that you aren't wasting memory by
> increasing it for all tasks when maybe you only need it for one or two. The
> downside is that you only find out after a failure.
>
> I do also worry a little bit that in your blog post, the error you pointed
> out isn't a Java OOM but an off-heap memory issue (overhead + heap usage).
> You don't really address heap memory vs. off-heap in that article. The only
> thing I see mentioned is spark.executor.memory, which is heap memory.
> Obviously adjusting to only run one task is going to give that task more
> overall memory, but the reason it's running out in the first place could be
> different. If it was on-heap memory, for instance, with more tasks I would
> expect to see more GC and not executor OOM. If you are getting executor
> OOM you are likely using more off-heap memory/stack space, etc. than you
> allocated. Ultimately it would be nice to know why that is happening and
> see if we can address it so it doesn't fail in the first place. That could
> be extremely difficult though, especially if software outside Spark is
> using that memory.
>
> As Holden said,  we need to make sure this would play nice with the
> resource profiles, or potentially if we can use the resource profile
> functionality.  Theoretically you could extend this to try to get new
> executor if using dynamic allocation for instance.
>
> I agree doing a SPIP would be a good place to start to have more
> discussions.
>
> Tom
>
> On Wednesday, January 17, 2024 at 12:47:51 AM CST, kalyan <
> justfors...@gmail.com> wrote:
>
>
> Hello All,
>
> At Uber, we recently did some work on improving the reliability of
> Spark applications in scenarios where fatter executors go out of memory and
> cause application failure. Fatter executors are those that have more
> than 1 task running on them concurrently at a given time. This has
> significantly improved the reliability of many Spark applications for us at
> Uber. We wrote a blog post about this recently. Link:
> https://www.uber.com/en-US/blog/dynamic-executor-core-resizing-in-spark/
>
> At a high level, we have done the below changes:
>
>1. When a Task fails with the OOM of an executor, we update the core
>requirements of the task to max executor cores.
>2. When the task is picked for rescheduling, the new attempt of the
>task happens to be on an executor where no other task can run concurrently.
>All cores get allocated to this task itself.
>    3. This way we ensure that the configured memory is completely at the
>    disposal of a single task, thus eliminating memory contention.
>
> The best part of this solution is that it's reactive. It kicks in only
> when the executors fail with the OOM exception.
>
> We understand that the problem statement is very common and we expect our
> solution to be effective in many cases.
>
> There could be more cases that can be covered. An executor failing with OOM
> is a hard signal. The framework (making the driver aware of
> what's happening with the executor) can be extended to handle scenarios of
> other forms of memory pressure, like excessive spilling to disk.
>
> While we had developed this on Spark 2.4.3 in-house, we would like to
> collaborate and contribute this work to the latest versions of Spark.
>
> What is the best way forward here? Will an SPIP proposal to detail the
> changes help?
>
> Regards,
> Kalyan.
> Uber India.
>


Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Mridul Muralidharan
I am seeing a bunch of Python-related failures (43) in the sql module (for
example [1]) ... I am currently on Python 3.11.6, Java 8.
Not sure if Ubuntu modified anything out from under me; thoughts?

I am currently testing this against an older branch to make sure it is not
an issue with my desktop.

Regards,
Mridul


[1]


org.apache.spark.sql.IntegratedUDFTestUtils.shouldTestGroupedAggPandasUDFs
was false (QueryCompilationErrorsSuite.scala:112)
Traceback (most recent call last):
  File "/home/mridul/work/apache/vote/spark/python/pyspark/serializers.py",
line 458, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
   ^^^
  File
"/home/mridul/work/apache/vote/spark/python/pyspark/cloudpickle/cloudpickle_fast.py",
line 73, in dumps
cp.dump(obj)
  File
"/home/mridul/work/apache/vote/spark/python/pyspark/cloudpickle/cloudpickle_fast.py",
line 602, in dump
return Pickler.dump(self, obj)
   ^^^
  File
"/home/mridul/work/apache/vote/spark/python/pyspark/cloudpickle/cloudpickle_fast.py",
line 692, in reducer_override
return self._function_reduce(obj)
   ^^
  File
"/home/mridul/work/apache/vote/spark/python/pyspark/cloudpickle/cloudpickle_fast.py",
line 565, in _function_reduce
return self._dynamic_function_reduce(obj)
   ^^
  File
"/home/mridul/work/apache/vote/spark/python/pyspark/cloudpickle/cloudpickle_fast.py",
line 546, in _dynamic_function_reduce
state = _function_getstate(func)

  File
"/home/mridul/work/apache/vote/spark/python/pyspark/cloudpickle/cloudpickle_fast.py",
line 157, in _function_getstate
f_globals_ref = _extract_code_globals(func.__code__)

  File
"/home/mridul/work/apache/vote/spark/python/pyspark/cloudpickle/cloudpickle.py",
line 334, in _extract_code_globals
out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
^
  File
"/home/mridul/work/apache/vote/spark/python/pyspark/cloudpickle/cloudpickle.py",
line 334, in <dictcomp>
out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
 ~^^^
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in 
  File "/home/mridul/work/apache/vote/spark/python/pyspark/serializers.py",
line 468, in dumps
raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: IndexError: tuple index
out of range
- UNSUPPORTED_FEATURE: Using Python UDF with unsupported join condition ***
FAILED ***



On Sun, Dec 10, 2023 at 9:05 PM L. C. Hsieh  wrote:

> +1
>
> On Sun, Dec 10, 2023 at 6:15 PM Kent Yao  wrote:
> >
> > +1(non-binding
> >
> > Kent Yao
> >
> > Yuming Wang wrote on Mon, Dec 11, 2023 at 09:33:
> > >
> > > +1
> > >
> > > On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun 
> wrote:
> > >>
> 

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Mon, Dec 4, 2023 at 11:40 AM L. C. Hsieh  wrote:

> +1
>
> Thanks Dongjoon!
>
> On Mon, Dec 4, 2023 at 9:26 AM Yang Jie  wrote:
> >
> > +1 for a 3.3.4 EOL Release. Thanks Dongjoon.
> >
> > Jie Yang
> >
> > On 2023/12/04 15:08:25 Tom Graves wrote:
> > >  +1 for a 3.3.4 EOL Release. Thanks Dongjoon.
> > > Tom
> > > On Friday, December 1, 2023 at 02:48:22 PM CST, Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> > >
> > >  Hi, All.
> > >
> > > Since the Apache Spark 3.3.0 RC6 vote passed on Jun 14, 2022,
> branch-3.3 has been maintained and served well until now.
> > >
> > > - https://github.com/apache/spark/releases/tag/v3.3.0 (tagged on Jun
> 9th, 2022)
> > > - https://lists.apache.org/thread/zg6k1spw6k1c7brgo6t7qldvsqbmfytm
> (vote result on June 14th, 2022)
> > >
> > > As of today, branch-3.3 has 56 additional patches after v3.3.3 (tagged
> on Aug 3rd, about 4 months ago) and reaches end of life this month
> according to the Apache Spark release cadence,
> https://spark.apache.org/versioning-policy.html .
> > >
> > > $ git log --oneline v3.3.3..HEAD | wc -l
> > > 56
> > >
> > > Along with the recent Apache Spark 3.4.2 release, I hope users can
> get a chance to have these last bits of Apache Spark 3.3.x, so I'd like to
> propose holding the Apache Spark 3.3.4 EOL release vote on December 11th,
> and I volunteer as the release manager.
> > >
> > > WDYT?
> > >
> > > Please let us know if you need more patches on branch-3.3.
> > >
> > > Thanks,
> > > Dongjoon.
> > >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Spark 3.4.2 (RC1)

2023-11-29 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul

On Wed, Nov 29, 2023 at 5:08 AM Yang Jie  wrote:

> +1(non-binding)
>
> Jie Yang
>
> On 2023/11/29 02:08:04 Kent Yao wrote:
> > +1(non-binding)
> >
> > Kent Yao
> >
> > On 2023/11/27 01:12:53 Dongjoon Hyun wrote:
> > > Hi, Marc.
> > >
> > > Given that it exists in 3.4.0 and 3.4.1, I don't think it's a release
> > > blocker for Apache Spark 3.4.2.
> > >
> > > When the patch is ready, we can consider it for 3.4.3.
> > >
> > > In addition, note that we categorized release-blocker-level issues by
> > > marking 'Blocker' priority with `Target Version` before the vote.
> > >
> > > Best,
> > > Dongjoon.
> > >
> > >
> > > On Sat, Nov 25, 2023 at 12:01 PM Marc Le Bihan 
> wrote:
> > >
> > > > -1, if you can wait until the last remaining problem with generics (?)
> is
> > > > entirely solved; it causes this exception to be thrown:
> > > >
> > > > java.lang.ClassCastException: class [Ljava.lang.Object; cannot be
> cast to class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and
> [Ljava.lang.reflect.TypeVariable; are in module java.base of loader
> 'bootstrap')
> > > > at
> org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)
> > > > at
> org.apache.spark.sql.catalyst.JavaTypeInference$.$anonfun$encoderFor$1(JavaTypeInference.scala:140)
> > > > at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:929)
> > > > at
> org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:138)
> > > > at
> org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:60)
> > > > at
> org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:53)
> > > > at
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:62)
> > > > at org.apache.spark.sql.Encoders$.bean(Encoders.scala:179)
> > > > at org.apache.spark.sql.Encoders.bean(Encoders.scala)
> > > >
> > > >
> > > > https://issues.apache.org/jira/browse/SPARK-45311
> > > >
> > > > Thanks !
> > > >
> > > > Marc Le Bihan
> > > >
> > > >
> > > > On 25/11/2023 11:48, Dongjoon Hyun wrote:
> > > >
> > > > Please vote on releasing the following candidate as Apache Spark
> version
> > > > 3.4.2.
> > > >
> > > > The vote is open until November 30th 1AM (PST) and passes if a
> majority +1
> > > > PMC votes are cast, with a minimum of 3 +1 votes.
> > > >
> > > > [ ] +1 Release this package as Apache Spark 3.4.2
> > > > [ ] -1 Do not release this package because ...
> > > >
> > > > To learn more about Apache Spark, please see
> https://spark.apache.org/
> > > >
> > > > The tag to be voted on is v3.4.2-rc1 (commit
> > > > 0c0e7d4087c64efca259b4fb656b8be643be5686)
> > > > https://github.com/apache/spark/tree/v3.4.2-rc1
> > > >
> > > > The release files, including signatures, digests, etc. can be found
> at:
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.2-rc1-bin/
> > > >
> > > > Signatures used for Spark RCs can be found in this file:
> > > > https://dist.apache.org/repos/dist/dev/spark/KEYS
> > > >
> > > > The staging repository for this release can be found at:
> > > >
> https://repository.apache.org/content/repositories/orgapachespark-1450/
> > > >
> > > > The documentation corresponding to this release can be found at:
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.2-rc1-docs/
> > > >
> > > > The list of bug fixes going into 3.4.2 can be found at the following
> URL:
> > > > https://issues.apache.org/jira/projects/SPARK/versions/12353368
> > > >
> > > > This release is using the release script of the tag v3.4.2-rc1.
> > > >
> > > > FAQ
> > > >
> > > > =
> > > > How can I help test this release?
> > > > =
> > > >
> > > > If you are a Spark user, you can help us test this release by taking
> > > > an existing Spark workload and running on this release candidate,
> then
> > > > reporting any regressions.
> > > >
> > > > If you're working in PySpark you can set up a virtual env and install
> > > > the current RC and see if anything important breaks, in the
> Java/Scala
> > > > you can add the staging repository to your project's resolvers and
> test
> > > > with the RC (make sure to clean up the artifact cache before/after so
> > > > you don't end up building with an out-of-date RC going forward).
> > > >
> > > > ===
> > > > What should happen to JIRA tickets still targeting 3.4.2?
> > > > ===
> > > >
> > > > The current list of open tickets targeted at 3.4.2 can be found at:
> > > > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > > > Version/s" = 3.4.2
> > > >
> > > > Committers should look at those and triage. Extremely important bug
> > > > fixes, documentation, and API tweaks that impact 

Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files

2023-11-24 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Fri, Nov 24, 2023 at 8:21 AM Kent Yao  wrote:

> Hi Spark Dev,
>
> Following the discussion [1], I'd like to start the vote for the SPIP [2].
>
> The SPIP aims to improve the test coverage and development experience for
> Spark UI-related JavaScript code.
>
> This thread will be open for at least the next 72 hours.  Please vote
> accordingly,
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
>
> Thank you!
> Kent Yao
>
> [1] https://lists.apache.org/thread/5rqrho4ldgmqlc173y2229pfll5sgkff
> [2]
> https://docs.google.com/document/d/1hWl5Q2CNNOjN5Ubyoa28XmpJtDyD9BtGtiEG2TT94rg/edit?usp=sharing
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] SPIP: Testing Framework for Spark UI Javascript files

2023-11-21 Thread Mridul Muralidharan
This should be a very good addition!

Regards,
Mridul

On Tue, Nov 21, 2023 at 7:46 PM Dongjoon Hyun 
wrote:

> Thank you for proposing a new UI test framework for Apache Spark 4.0.
>
> It looks very useful.
>
> Thanks,
> Dongjoon.
>
>
> On Tue, Nov 21, 2023 at 1:51 AM Kent Yao  wrote:
>
>> Hi Spark Dev,
>>
>> This is a call to discuss a new SPIP: Testing Framework for
>> Spark UI Javascript files [1]. The SPIP aims to improve the test
>> coverage and develop experience for Spark UI-related javascript
>> codes.
>> The Jest [2], a JavaScript Testing Framework licensed under MIT, will
>> be used to build this dev and test-only module.
>> There is also a W.I.P. pull request [3] to show what it would be like.
>>
>> This thread will be open for at least the next 72 hours. Suggestions
>> are welcome. If there is no veto, I will close this thread after
>> 2023-11-24 18:00 (+08:00) and raise a new thread for voting.
>>
>> Thanks,
>> Kent Yao
>>
>> [1]
>> https://docs.google.com/document/d/1hWl5Q2CNNOjN5Ubyoa28XmpJtDyD9BtGtiEG2TT94rg/edit?usp=sharing
>> [2] https://github.com/jestjs/jest
>> [3] https://github.com/apache/spark/pull/43903
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Tue, Nov 14, 2023 at 12:45 PM Holden Karau  wrote:

> +1
>
> On Tue, Nov 14, 2023 at 10:21 AM DB Tsai  wrote:
>
>> +1
>>
>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>
>> On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov <
>> vakaris.bashki...@gmail.com> wrote:
>>
>> +1 (non-binding)
>>
>>
>> On Tue, Nov 14, 2023 at 8:03 PM Chao Sun  wrote:
>>
>>> +1
>>>
>>> On Tue, Nov 14, 2023 at 9:52 AM L. C. Hsieh  wrote:
>>> >
>>> > +1
>>> >
>>> > On Tue, Nov 14, 2023 at 9:46 AM Ye Zhou  wrote:
>>> > >
>>> > > +1(Non-binding)
>>> > >
>>> > > On Tue, Nov 14, 2023 at 9:42 AM L. C. Hsieh 
>>> wrote:
>>> > >>
>>> > >> Hi all,
>>> > >>
>>> > >> I’d like to start a vote for SPIP: An Official Kubernetes Operator
>>> for
>>> > >> Apache Spark.
>>> > >>
>>> > >> The proposal is to develop an official Java-based Kubernetes
>>> operator
>>> > >> for Apache Spark to automate the deployment and simplify the
>>> lifecycle
>>> > >> management and orchestration of Spark applications and Spark
>>> clusters
>>> > >> on k8s at prod scale.
>>> > >>
>>> > >> This aims to reduce the learning curve and operation overhead for
>>> > >> Spark users so they can concentrate on core Spark logic.
>>> > >>
>>> > >> Please also refer to:
>>> > >>
>>> > >>- Discussion thread:
>>> > >> https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz
>>> > >>- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-45923
>>> > >>- SPIP doc:
>>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>>> > >>
>>> > >>
>>> > >> Please vote on the SPIP for the next 72 hours:
>>> > >>
>>> > >> [ ] +1: Accept the proposal as an official SPIP
>>> > >> [ ] +0
>>> > >> [ ] -1: I don’t think this is a good idea because …
>>> > >>
>>> > >>
>>> > >> Thank you!
>>> > >>
>>> > >> Liang-Chi Hsieh
>>> > >>
>>> > >>
>>> -
>>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >>
>>> > >
>>> > >
>>> > > --
>>> > >
>>> > > Zhou, Ye  周晔
>>> >
>>> > -
>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Mridul Muralidharan
  Congratulations!
Looking forward to more exciting contributions :-)

Regards,
Mridul

On Tue, Oct 3, 2023 at 2:51 AM Hussein Awala  wrote:

> Congrats to all of you!
>
> On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:
>
>> Congratulations! Well deserved!
>>
>> -Rui
>>
>>
>> On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang  wrote:
>>
>>> Congratulations to all! Well deserved!
>>>
>>> On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:
>>>
 Hi all,

 The Spark PMC is delighted to announce that we have voted to add one
 new committer and two new PMC members. These individuals have consistently
 contributed to the project and have clearly demonstrated their expertise.

 New Committer:
 - Jiaan Geng (focusing on Spark Connect and Spark SQL)

 New PMCs:
 - Yuanjian Li
 - Yikun Jiang

 Please join us in extending a warm welcome to them in their new roles!

 Sincerely,
 The Spark PMC

>>>


Re: Migrating the Junit framework used in Apache Spark 4.0 from 4.x to 5.x

2023-09-26 Thread Mridul Muralidharan
+1 for moving to a newer version.
Thanks for driving this, Jie Yang!

Regards,
Mridul


On Mon, Sep 25, 2023 at 10:15 AM 杨杰  wrote:

> Hi all,
>
> In SPARK-44170 (apache/spark#43074 [1]), I’m trying to migrate the JUnit
> test framework used in Spark 4.0 from JUnit 4 to JUnit 5.
>
>
> Although this involves a fair amount of code modification, given that
> JUnit 4 is still developed against Java 6 source code and hasn't
> released a new version in over two years (the JUnit 4.13.2 that Spark is
> currently using was released on February 14, 2021), I personally believe
> it's worth it.
>
> Feel free to comment if you have any concerns.
>
> [1] https://github.com/apache/spark/pull/43074
>
> Thanks,
> Jie Yang
>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-10 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul

On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li  wrote:

> Please vote on releasing the following candidate(RC5) as Apache Spark
> version 3.5.0.
>
> The vote is open until 11:59pm Pacific time Sep 11th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.5.0-rc5 (commit
> ce5ddad990373636e94071e7cef2f31021add07b):
>
> https://github.com/apache/spark/tree/v3.5.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1449
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>
> The list of bug fixes going into 3.5.0 can be found at the following URL:
>
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>
> This release is using the release script of the tag v3.5.0-rc5.
>
>
> FAQ
>
> =
>
> How can I help test this release?
>
> =
>
> If you are a Spark user, you can help us test this release by taking
>
> an existing Spark workload and running on this release candidate, then
>
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
>
> the current RC and see if anything important breaks, in the Java/Scala
>
> you can add the staging repository to your project's resolvers and test
>
> with the RC (make sure to clean up the artifact cache before/after so
>
> you don't end up building with an out of date RC going forward).
>
> ===
>
> What should happen to JIRA tickets still targeting 3.5.0?
>
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
>
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.5.0
>
> Committers should look at those and triage. Extremely important bug
>
> fixes, documentation, and API tweaks that impact compatibility should
>
> be worked on immediately. Everything else please retarget to an
>
> appropriate release.
>
> ==
>
> But my bug isn't fixed?
>
> ==
>
> In order to make timely releases, we will typically not hold the
>
> release unless the bug in question is a regression from the previous
>
> release. That being said, if there is something which is a regression
>
> that has not been correctly targeted please ping me or a committer to
>
> help target the issue.
>
> Thanks,
>
> Yuanjian Li
>


Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-30 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul


On Wed, Aug 30, 2023 at 6:10 AM yangjie01 
wrote:

> Hi, Sean
>
>
>
> I have performed testing with Java 17 and Scala 2.13 using maven (`mvn
> clean install` and `mvn package test`), and have not encountered the issue
> you mentioned.
>
>
>
> The tests for the connect module depend on the `spark-protobuf` module
> having completed `package`; was that successful? Or could you provide the
> test command for me to verify?
>
>
>
> Thanks,
>
> Jie Yang
>
>
>
> *From:* Dipayan Dev 
> *Date:* Wednesday, August 30, 2023 17:01
> *To:* Sean Owen 
> *Cc:* Yuanjian Li , Spark dev list <
> dev@spark.apache.org>
> *Subject:* Re: [VOTE] Release Apache Spark 3.5.0 (RC3)
>
>
>
> Can we fix this bug in Spark 3.5.0?
>
> https://issues.apache.org/jira/browse/SPARK-44884
> 
>
>
>
>
> On Wed, Aug 30, 2023 at 11:51 AM Sean Owen  wrote:
>
> It looks good, except that I'm getting errors running the Spark Connect
> tests at the end (Java 17, Scala 2.13). It looks like I missed something
> necessary to build; is anyone else getting this?
>
>
>
> [ERROR] [Error]
> /tmp/spark-3.5.0/connector/connect/server/target/generated-test-sources/protobuf/java/org/apache/spark/sql/protobuf/protos/TestProto.java:9:46:
>  error: package org.sparkproject.spark_protobuf.protobuf does not exist
>
>
>
> On Tue, Aug 29, 2023 at 11:25 AM Yuanjian Li 
> wrote:
>
> Please vote on releasing the following candidate(RC3) as Apache Spark
> version 3.5.0.
>
>
>
> The vote is open until 11:59pm Pacific time *Aug 31st* and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
>
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
> 
>
>
>
> The tag to be voted on is v3.5.0-rc3 (commit
> 9f137aa4dc43398aafa0c3e035ed3174182d7d6c):
>
> https://github.com/apache/spark/tree/v3.5.0-rc3
> 
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc3-bin/
> 
>
>
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
> 
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1447
> 
>
>
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc3-docs/
> 
>
>
>
> The list of bug fixes going into 3.5.0 can be found at the following URL:
>
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
> 
>
>
>
> This release is using the release script of the tag v3.5.0-rc3.
>
>
>
> FAQ
>
>
>
> =
>
> How can I help test this release?
>
> =
>
> If you are a Spark user, you can help us test this release by taking
>
> an existing Spark workload and running on this release candidate, then
>
> reporting any regressions.
>
>
>
> If you're working in PySpark you can set up a virtual env and install
>
> the current RC and see if anything important breaks, in the Java/Scala
>
> you can add the staging repository to your project's resolvers and test
>
> with the RC (make sure to clean up the artifact cache before/after so
>
> you don't end up building with an out of date RC going forward).
>
>
>
> ===
>
> What should happen to JIRA tickets still targeting 3.5.0?
>
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
>
> https://issues.apache.org/jira/projects/SPARK
> 
>  and
> search for "Target Version/s" = 3.5.0
>
>
>
> Committers should look at those and triage. 

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-11 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul


On Fri, Aug 11, 2023 at 2:00 AM Cheng Pan  wrote:

> +1 (non-binding)
>
> Passed integration test with Apache Kyuubi.
>
> Thanks for driving this release.
>
> Thanks,
> Cheng Pan
>
>
> > On Aug 11, 2023, at 06:36, L. C. Hsieh  wrote:
> >
> > +1
> >
> > Thanks Yuming.
> >
> > On Thu, Aug 10, 2023 at 3:24 PM Dongjoon Hyun 
> wrote:
> >>
> >> +1
> >>
> >> Dongjoon
> >>
> >> On 2023/08/10 07:14:07 yangjie01 wrote:
> >>> +1
> >>> Thanks, Jie Yang
> >>>
> >>>
> >>> From: Yuming Wang 
> >>> Date: Thursday, August 10, 2023 13:33
> >>> To: Dongjoon Hyun 
> >>> Cc: dev 
> >>> Subject: Re: [VOTE] Release Apache Spark 3.3.3 (RC1)
> >>>
> >>> +1 myself.
> >>>
> >>> On Tue, Aug 8, 2023 at 12:41 AM Dongjoon Hyun wrote:
> >>> Thank you, Yuming.
> >>>
> >>> Dongjoon.
> >>>
> >>> On Mon, Aug 7, 2023 at 9:30 AM yangjie01 <yangji...@baidu.com> wrote:
> >>> Hi, Dongjoon and Yuming
> >>>
> >>> I submitted a PR a few days ago to try to fix this issue:
> >>> https://github.com/apache/spark/pull/42167
> The reason for the failure is that the branch daily test and the master use
> the same yml file.
> >>>
> >>> Jie Yang
> >>>
> >>> From: Dongjoon Hyun <dongjoon.h...@gmail.com>
> >>> Date: Tuesday, August 8, 2023 00:18
> >>> To: Yuming Wang <yumw...@apache.org>
> >>> Cc: dev <dev@spark.apache.org>
> >>> Subject: Re: [VOTE] Release Apache Spark 3.3.3 (RC1)
> >>>
> >>> Hi, Yuming.
> >>>
> >>> One of the community GitHub Actions test pipelines is consistently
> unhealthy due to the Python mypy linter.
> >>>
> >>> https://github.com/apache/spark/actions/workflows/build_branch33.yml
> >>>
> >>> It seems due to a pipeline difference, since the same Python mypy
> linter already passes in the commit build.
> >>>
> >>> Dongjoon.
> >>>
> >>>
> >>> On Fri, Aug 4, 2023 at 8:09 PM Yuming Wang <yumw...@apache.org> wrote:
> >>> Please vote on releasing the following candidate as Apache Spark
> version 3.3.3.
> >>>
> >>> The vote is open until 11:59pm Pacific time August 10th and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 3.3.3
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see https://spark.apache.org
> >>>
> >>> The tag to be voted on is v3.3.3-rc1 (commit
> 8c2b3319c6734250ff9d72f3d7e5cab56b142195):
> >>> https://github.com/apache/spark/tree/v3.3.3-rc1
> >>>
> >>> The release files, including signatures, digests, etc. can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.3-rc1-bin
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>> https://repository.apache.org/content/repositories/orgapachespark-1445
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.3-rc1-docs
> >>>
> >>> The list of bug fixes going into 3.3.3 can be found at the following
> URL:
> >>> https://s.apache.org/rjci4
> >>>
> >>> This release is using the release script of the tag v3.3.3-rc1.
> >>>
> >>>
> >>> FAQ
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>> If you are a Spark user, you can help us test this release by taking
> >>> an existing Spark workload and running on this release candidate, then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and install
> >>> the current RC and see if anything important breaks, in the Java/Scala
> >>> you can add the staging repository to your project's resolvers and test
> 

Re: [ANNOUNCE] Apache Spark 3.4.1 released

2023-06-23 Thread Mridul Muralidharan
Thanks, Dongjoon!

Regards,
Mridul

On Fri, Jun 23, 2023 at 6:58 PM Dongjoon Hyun  wrote:

> We are happy to announce the availability of Apache Spark 3.4.1!
>
> Spark 3.4.1 is a maintenance release containing stability fixes. This
> release is based on the branch-3.4 maintenance branch of Spark. We strongly
> recommend all 3.4 users to upgrade to this stable release.
>
> To download Spark 3.4.1, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-4-1.html
>
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
>
>
> Dongjoon Hyun
>


Re: [VOTE][RESULT] Release Spark 3.4.1 (RC1)

2023-06-23 Thread Mridul Muralidharan
A late +1 from me too … forgot to send this yesterday :-)

Regards,
Mridul

On Fri, Jun 23, 2023 at 3:20 AM Dongjoon Hyun  wrote:

> The vote passes with 15 +1s (10 binding +1s).
> Thanks to all who helped with the release!
>
> (* = binding)
> +1:
> - Jia Fan
> - Dongjoon Hyun *
> - Liang-Chi Hsieh *
> - Yang Jie
> - Hyukjin Kwon *
> - Huaxin Gao *
> - Ruifeng Zheng *
> - Peter Toth
> - Xinrong Meng *
> - Jacek Laskowski
> - Yuming Wang *
> - Chao Sun *
> - Fokko Driesprong
> - Gengliang Wang *
> - Thomas Graves *
>
> +0: None
>
> -1: None
>


Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Mridul Muralidharan
I agree with Holden: we should have some understanding of what we are
targeting for 4.0, given it is a major version bump, and work from there on
the release date.

Regards,
Mridul

On Mon, Jun 12, 2023 at 8:53 PM Jia Fan  wrote:

> By the way, like Holden said, what are the big features for 4.0.0? I think
> a major version change always brings something different.
>
> Jia Fan  于2023年6月13日周二 08:25写道:
>
>> +1
>>
>> 
>>
>> Jia Fan
>>
>>
>>
>> 2023年6月13日 03:51,Chao Sun  写道:
>>
>> +1
>>
>> On Mon, Jun 12, 2023 at 12:50 PM kazuyuki tanimura
>>  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Thank you!
>>> Kazu
>>>
>>>
>>> On Jun 12, 2023, at 11:32 AM, Holden Karau  wrote:
>>>
>>> -0
>>>
>>> I'd like to see more of a doc around what we're planning on for a 4.0
>>> before we pick a target release date etc. (feels like cart before the
>>> horse).
>>>
>>> But it's a weak preference.
>>>
>>> On Mon, Jun 12, 2023 at 11:24 AM Xiao Li  wrote:
>>>
 Thanks for starting the vote.

 I do have a concern about the target release date of Spark 4.0.

 L. C. Hsieh  于2023年6月12日周一 11:09写道:

> +1
>
> On Mon, Jun 12, 2023 at 11:06 AM huaxin gao 
> wrote:
> >
> > +1
> >
> > On Mon, Jun 12, 2023 at 11:05 AM Dongjoon Hyun 
> wrote:
> >>
> >> +1
> >>
> >> Dongjoon
> >>
> >> On 2023/06/12 18:00:38 Dongjoon Hyun wrote:
> >> > Please vote on the release plan for Apache Spark 4.0.0.
> >> >
> >> > The vote is open until June 16th 1AM (PST) and passes if a
> majority +1 PMC
> >> > votes are cast, with a minimum of 3 +1 votes.
> >> >
> >> > [ ] +1 Have a release plan for Apache Spark 4.0.0 (June 2024)
> >> > [ ] -1 Do not have a plan for Apache Spark 4.0.0 because ...
> >> >
> >> > ===
> >> > Apache Spark 4.0.0 Release Plan
> >> > ===
> >> >
> >> > 1. After creating `branch-3.5`, set "4.0.0-SNAPSHOT" in master
> branch.
> >> >
> >> > 2. Creating `branch-4.0` on April 1st, 2024.
> >> >
> >> > 3. Apache Spark 4.0.0 RC1 on May 1st, 2024.
> >> >
> >> > 4. Apache Spark 4.0.0 Release in June, 2024.
> >> >
> >>
> >>
> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>>
>>


Re: Apache Spark 3.4.1 Release?

2023-06-09 Thread Mridul Muralidharan
+1, thanks Dongjoon!

Regards,
Mridul

On Thu, Jun 8, 2023 at 7:16 PM Jia Fan  wrote:

> +1
>
> 
>
>
> Jia Fan
>
>
>
> 2023年6月9日 08:00,Yuming Wang  写道:
>
> +1.
>
> On Fri, Jun 9, 2023 at 7:14 AM Chao Sun  wrote:
>
>> +1 too
>>
>> On Thu, Jun 8, 2023 at 2:34 PM kazuyuki tanimura
>>  wrote:
>> >
>> > +1 (non-binding), Thank you Dongjoon
>> >
>> > Kazu
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-10 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul


On Mon, Apr 10, 2023 at 10:34 AM huaxin gao  wrote:

> +1
>
> On Mon, Apr 10, 2023 at 8:17 AM Chao Sun  wrote:
>
>> +1 (non-binding)
>>
>> On Mon, Apr 10, 2023 at 7:07 AM yangjie01  wrote:
>>
>>> +1 (non-binding)
>>>
>>>
>>>
>>> *From:* Sean Owen 
>>> *Date:* Monday, April 10, 2023 21:19
>>> *To:* Dongjoon Hyun 
>>> *Cc:* "dev@spark.apache.org" 
>>> *Subject:* Re: [VOTE] Release Apache Spark 3.2.4 (RC1)
>>>
>>>
>>>
>>> +1 from me
>>>
>>>
>>>
>>> On Sun, Apr 9, 2023 at 7:19 PM Dongjoon Hyun 
>>> wrote:
>>>
>>> I'll start with my +1.
>>>
>>> I verified the checksums, signatures of the artifacts, and documentation.
>>> I also ran the tests with the YARN and K8s modules.
>>>
>>> Dongjoon.
>>>
>>> On 2023/04/09 23:46:10 Dongjoon Hyun wrote:
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version
>>> > 3.2.4.
>>> >
>>> > The vote is open until April 13th 1AM (PST) and passes if a majority
>>> +1 PMC
>>> > votes are cast, with a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 3.2.4
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see https://spark.apache.org/
>>> 
>>> >
>>> > The tag to be voted on is v3.2.4-rc1 (commit
>>> > 0ae10ac18298d1792828f1d59b652ef17462d76e)
>>> > https://github.com/apache/spark/tree/v3.2.4-rc1
>>> 
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.2.4-rc1-bin/
>>> 
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> 
>>> >
>>> > The staging repository for this release can be found at:
>>> >
>>> https://repository.apache.org/content/repositories/orgapachespark-1442/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.2.4-rc1-docs/
>>> >
>>> > The list of bug fixes going into 3.2.4 can be found at the following
>>> URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12352607
>>> >
>>> > This release is using the release script of the tag v3.2.4-rc1.
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks; in Java/Scala
>>> > you can add the staging repository to your project's resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out of date RC going forward).
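
A minimal sketch of the PySpark path just described (the pyspark artifact URL
is assumed from the RC's -bin/ listing):

```
# Isolated virtual env with the RC's PySpark installed straight from the staging area.
python3 -m venv rc-venv && source rc-venv/bin/activate
pip install https://dist.apache.org/repos/dist/dev/spark/v3.2.4-rc1-bin/pyspark-3.2.4.tar.gz

# Smoke test: run a trivial local job against the RC.
python -c "
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local[2]').getOrCreate()
print(spark.range(100).count())
spark.stop()"
```

For the Java/Scala side, the staging URL from the email goes into the
project's resolvers (an extra repository entry in a POM, or a `resolvers +=`
line in sbt), and wiping the local artifact cache (~/.m2, ~/.ivy2) before and
after keeps a stale RC from leaking into later builds.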
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 3.2.4?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 3.2.4 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> > Version/s" = 3.2.4
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted 

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-08 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul


On Sat, Apr 8, 2023 at 12:13 PM L. C. Hsieh  wrote:

> +1
>
> Thanks Xinrong.
>
> On Sat, Apr 8, 2023 at 8:23 AM yangjie01  wrote:
> >
> > +1
> >
> >
> >
> > From: Sean Owen 
> > Date: Saturday, April 8, 2023, 20:27
> > To: Xinrong Meng 
> > Cc: dev 
> > Subject: Re: [VOTE] Release Apache Spark 3.4.0 (RC7)
> >
> >
> >
> > +1 form me, same result as last time.
> >
> >
> >
> > On Fri, Apr 7, 2023 at 6:30 PM Xinrong Meng 
> wrote:
> >
> > Please vote on releasing the following candidate (RC7) as Apache Spark
> version 3.4.0.
> >
> > The vote is open until 11:59pm Pacific time April 12th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.4.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v3.4.0-rc7 (commit
> 87a5442f7ed96b11051d8a9333476d080054e5a0):
> > https://github.com/apache/spark/tree/v3.4.0-rc7
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1441
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-docs/
> >
> > The list of bug fixes going into 3.4.0 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12351465
> >
> > This release is using the release script of the tag v3.4.0-rc7.
> >
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks; in Java/Scala
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out of date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.4.0?
> > ===
> > The current list of open tickets targeted at 3.4.0 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.4.0
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> > Thanks,
> > Xinrong Meng
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Apache Spark 3.2.4 EOL Release?

2023-04-04 Thread Mridul Muralidharan
+1
Sounds good to me.

Thanks,
Mridul


On Tue, Apr 4, 2023 at 1:39 PM huaxin gao  wrote:

> +1
>
> On Tue, Apr 4, 2023 at 11:17 AM Chao Sun  wrote:
>
>> +1
>>
>> On Tue, Apr 4, 2023 at 11:12 AM Holden Karau 
>> wrote:
>>
>>> +1
>>>
>>> On Tue, Apr 4, 2023 at 11:04 AM L. C. Hsieh  wrote:
>>>
 +1

 Sounds good and thanks Dongjoon for driving this.

 On 2023/04/04 17:24:54 Dongjoon Hyun wrote:
 > Hi, All.
 >
 > Since Apache Spark 3.2.0 passed RC7 vote on October 12, 2021,
 branch-3.2
 > has been maintained and served well until now.
 >
 > - https://github.com/apache/spark/releases/tag/v3.2.0 (tagged on Oct
 6,
 > 2021)
 > - https://lists.apache.org/thread/jslhkh9sb5czvdsn7nz4t40xoyvznlc7
 >
 > As of today, branch-3.2 has 62 additional patches after v3.2.3 and
 reaches
 > the end-of-life this month according to the Apache Spark release
 cadence. (
 > https://spark.apache.org/versioning-policy.html)
 >
 > $ git log --oneline v3.2.3..HEAD | wc -l
 > 62
 >
 > With the upcoming Apache Spark 3.4, I hope the users can get a chance
 to
 > have these last bits of Apache Spark 3.2.x, and I'd like to propose
 to have
 > Apache Spark 3.2.4 EOL Release next week and volunteer as the release
 > manager. WDTY? Please let me know if you need more patches on
 branch-3.2.
 >
 > Thanks,
 > Dongjoon.
 >

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

 --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>


Re: Slack for PySpark users

2023-03-30 Thread Mridul Muralidharan
Thanks for flagging the concern Dongjoon, I was not aware of the discussion
- but I can understand the concern.
Would be great if you or Matei could update the thread on the result of
deliberations, once it reaches a logical consensus: before we set up
official policy around it.

Regards,
Mridul


On Thu, Mar 30, 2023 at 4:23 PM Bjørn Jørgensen 
wrote:

> I like the idea of having a talk channel. It can make it easier for
> everyone to say hello. Or to dare to ask about small or big matters that
> you would not have dared to ask about before on mailing lists.
> But then there is the price and what is the best for an open source
> project.
>
> Slack's pricing is expensive.
> Right now, for those that have joined the Spark Slack:
> $8.75 USD per member per month x 72 members = $630 USD per month
>
> https://app.slack.com/plans/T04URTRBZ1R/checkout/form?entry_point=hero_banner_upgrade_cta=2
>
> And Slack does not have an option for open source projects.
>
> There seems to be some alternatives for open source software. I have not
> tried it.
> Like https://www.rocket.chat/blog/slack-open-source-alternatives
>
> [image: image.png]
>
> rocket chat is open source https://github.com/RocketChat/Rocket.Chat
>
> tor. 30. mar. 2023 kl. 18:54 skrev Mich Talebzadeh <
> mich.talebza...@gmail.com>:
>
>> Hi Dongjoon
>>
>> to your points if I may
>>
>> - Do you have any reference from other official ASF-related Slack
>> channels?
>>    No, I don't have any reference from other official ASF-related Slack
>> channels because I don't think that matters. However, I stand corrected
>> - To be clear, I intentionally didn't refer to any specific mailing list
>> because we didn't set up any rule here yet.
>>    fair enough
>>
>> going back to your original point
>>
>> ..There is a concern expressed by ASF board because recent Slack
>> activities created an isolated silo outside of ASF mailing list archive...
>> Well, there are activities on Spark and indeed other open source software
>> everywhere. One way or another, they do help get the community (inside the
>> user groups and beyond) interested and involved. Slack happens to be
>> one of them.
>> I am of the opinion that creating such silos is already a reality and we
>> ought to be pragmatic. Unless there is an overriding reason, we should
>> embrace it as slack can co-exist with the other mailing lists and channels
>> like linkedin etc.
>>
>> Hope this clarifies my position
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead
>> Palantir Technologies Limited
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Thu, 30 Mar 2023 at 17:28, Dongjoon Hyun 
>> wrote:
>>
>>> To Mich.
>>> - Do you have any reference from other official ASF-related Slack
>>> channels?
>>> - To be clear, I intentionally didn't refer to any specific mailing list
>>> because we didn't set up any rule here yet.
>>>
>>> To Xiao. I understand what you mean. That's the reason why I added Matei
>>> from your side.
>>> > I did not see an objection from the ASF board.
>>>
>>> There is on-going discussion about the communication channels outside
>>> ASF email which is specifically concerning Slack.
>>> Please hold on any official action for this topic. We will know how to
>>> support it seamlessly.
>>>
>>> Dongjoon.
>>>
>>>
>>> On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:
>>>
 Hi, Dongjoon,

 The other communities (e.g., Pinot, Druid, Flink) created their own
 Slack workspaces last year. I did not see an objection from the ASF board.
 At the same time, Slack workspaces are very popular and useful in most
 non-ASF open source communities. TBH, we are kind of late. I think we can
 do the same in our community?

 We can follow the guide when the ASF has an official process for ASF
 archiving. Since our PMC are the owner of the slack workspace, we can make
 a change based on the policy. WDYT?

 Xiao


 Dongjoon Hyun  于2023年3月30日周四 09:03写道:

> Hi, Xiao and all.
>
> (cc Matei)
>
> Please hold on the vote.
>
> There is a concern expressed by ASF board because recent Slack
> activities created an isolated silo outside of ASF mailing list archive.
>
> We need to establish a way to embrace it back to ASF archive before
> starting anything official.
>
> Bests,
> Dongjoon.
>
>
>
> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:
>
>> +1
>>
>> + @dev@spark.apache.org 
>>
>> This is a good 

Re: Ammonite as REPL for Spark Connect

2023-03-23 Thread Mridul Muralidharan
Sounds good, thanks for clarifying !

Regards,
Mridul

On Thu, Mar 23, 2023 at 9:09 AM Herman van Hovell 
wrote:

> The goal of adding this, is to make it easy for a user to connect a scala
> REPL to a Spark Connect server. Just like Spark shell makes it easy to work
> with a regular Spark environment.
>
> It is not meant as a Spark shell replacement. They represent two different
> modes of working with Spark, and they have very different API surfaces
> (Connect being a subset of what regular Spark has to offer). I do think we
> should consider using ammonite for Spark shell at some point, since this
> has better UX and does not require us to fork a REPL. That discussion is
> for another day though.
>
> I guess you can use it as an example of building an integration. In itself
> I wouldn't call it that because I think this is a key part of getting started
> with connect, and/or doing debugging.
>
> On Thu, Mar 23, 2023 at 4:00 AM Mridul Muralidharan 
> wrote:
>
>>
>> What is unclear to me is why we are introducing this integration, how
>> users will leverage it.
>>
>> * Are we replacing spark-shell with it ?
>> Given the existing gaps, this is not the case.
>>
>> * Is it an example to showcase how to build an integration ?
>> That could be interesting, and we can add it to external/
>>
>> Anything else I am missing ?
>>
>> Regards,
>> Mridul
>>
>>
>>
>> On Wed, Mar 22, 2023 at 6:58 PM Herman van Hovell 
>> wrote:
>>
>>> Ammonite is maintained externally by Li Haoyi et al. We are including it
>>> as a 'provided' dependency. The integration bits and pieces (1 file) are
>>> included in Apache Spark.
>>>
>>> On Wed, Mar 22, 2023 at 7:53 PM Mridul Muralidharan 
>>> wrote:
>>>
>>>>
>>>> Will this be maintained externally or included into Apache Spark ?
>>>>
>>>> Regards ,
>>>> Mridul
>>>>
>>>>
>>>>
>>>> On Wed, Mar 22, 2023 at 6:50 PM Herman van Hovell
>>>>  wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> For Spark Connect Scala Client we are working on making the REPL
>>>>> experience a bit nicer <https://github.com/apache/spark/pull/40515>.
>>>>> In a nutshell we want to give users a turn key scala REPL, that works even
>>>>> if you don't have a Spark distribution on your machine (through
>>>>> coursier <https://get-coursier.io/>). We are using Ammonite
>>>>> <https://ammonite.io/> instead of the standard scala REPL for this,
>>>>> the main reason for going with Ammonite is that it is easier to customize,
>>>>> and IMO has a superior user experience.
>>>>>
>>>>> Does anyone object to doing this?
>>>>>
>>>>> Kind regards,
>>>>> Herman
>>>>>
>>>>>


Re: Ammonite as REPL for Spark Connect

2023-03-23 Thread Mridul Muralidharan
What is unclear to me is why we are introducing this integration, how users
will leverage it.

* Are we replacing spark-shell with it ?
Given the existing gaps, this is not the case.

* Is it an example to showcase how to build an integration ?
That could be interesting, and we can add it to external/

Anything else I am missing ?

Regards,
Mridul



On Wed, Mar 22, 2023 at 6:58 PM Herman van Hovell 
wrote:

> Ammonite is maintained externally by Li Haoyi et al. We are including it
> as a 'provided' dependency. The integration bits and pieces (1 file) are
> included in Apache Spark.
>
> On Wed, Mar 22, 2023 at 7:53 PM Mridul Muralidharan 
> wrote:
>
>>
>> Will this be maintained externally or included into Apache Spark ?
>>
>> Regards ,
>> Mridul
>>
>>
>>
>> On Wed, Mar 22, 2023 at 6:50 PM Herman van Hovell
>>  wrote:
>>
>>> Hi All,
>>>
>>> For Spark Connect Scala Client we are working on making the REPL
>>> experience a bit nicer <https://github.com/apache/spark/pull/40515>. In
>>> a nutshell we want to give users a turn key scala REPL, that works even if
>>> you don't have a Spark distribution on your machine (through coursier
>>> <https://get-coursier.io/>). We are using Ammonite
>>> <https://ammonite.io/> instead of the standard scala REPL for this, the
>>> main reason for going with Ammonite is that it is easier to customize, and
>>> IMO has a superior user experience.
>>>
>>> Does anyone object to doing this?
>>>
>>> Kind regards,
>>> Herman
>>>
>>>


Re: Ammonite as REPL for Spark Connect

2023-03-22 Thread Mridul Muralidharan
Will this be maintained externally or included into Apache Spark ?

Regards ,
Mridul



On Wed, Mar 22, 2023 at 6:50 PM Herman van Hovell
 wrote:

> Hi All,
>
> For Spark Connect Scala Client we are working on making the REPL
> experience a bit nicer. In a
> nutshell we want to give users a turn key scala REPL, that works even if
> you don't have a Spark distribution on your machine (through coursier
> ). We are using Ammonite 
> instead of the standard scala REPL for this, the main reason for going with
> Ammonite is that it is easier to customize, and IMO has a superior user
> experience.
>
> Does anyone object to doing this?
>
> Kind regards,
> Herman
>
>
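
For a feel of the turn-key flow Herman describes, a sketch of the coursier
route (assuming the `cs` launcher is already installed; the Spark Connect
wiring itself is what the linked PR adds, and the client coordinates below
are assumptions):

```
# Start an Ammonite REPL for a given Scala version with no local Spark distribution;
# coursier resolves and caches everything on first run.
cs launch ammonite --scala 2.12.17

# Inside the REPL, dependencies can then be pulled in ad hoc, e.g.:
#   import $ivy.`org.apache.spark::spark-connect-client-jvm:3.4.0`  // coordinates assumed
```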


Re: [VOTE] Release Apache Spark 3.4.0 (RC3)

2023-03-10 Thread Mridul Muralidharan
Other than the tag issue, the sigs/artifacts/build/etc worked for me.
So the next RC candidate looks promising !

Regards,
Mridul


On Thu, Mar 9, 2023 at 5:07 PM Xinrong Meng 
wrote:

> Thank you Hyukjin! :)
>
> I would prefer to cut v3.4.0-rc4 now if there are no objections.
>
> On Fri, Mar 10, 2023 at 7:01 AM Hyukjin Kwon  wrote:
>
>> BTW doing another RC isn't a very big deal (compared to what I did before
>> :-) ) since it's not a canonical release yet.
>>
>> On Fri, Mar 10, 2023 at 7:58 AM Hyukjin Kwon  wrote:
>>
>>> Directly tagging is fine too, I guess.
>>> I don't mind cutting the RC4 right away either if that's what you prefer.
>>>
>>> On Fri, Mar 10, 2023 at 7:06 AM Xinrong Meng 
>>> wrote:
>>>
 Hi All,

 Thank you all for catching that. Unfortunately, the release script
 failed to push the release tag v3.4.0-rc3 to branch-3.4. Sorry about the
 issue.

 Shall we cut v3.4.0-rc4 immediately or wait until March 14th?

 On Fri, Mar 10, 2023 at 5:34 AM Sean Owen  wrote:

> If the issue were just tags, then you can simply delete the tag and
> re-tag the right commit. That doesn't change a commit log.
> But is the issue that the relevant commits aren't in branch-3.4? Like
> I don't see the usual release commits in
> https://github.com/apache/spark/commits/branch-3.4
> Yeah OK that needs a re-do.
>
> We can still test this release.
> It works for me, except that I still get the weird
> infinite-compile-loop issue that doesn't seem to be related to Spark. The
> Spark Connect parts seem to work.
>
> On Thu, Mar 9, 2023 at 3:25 PM Dongjoon Hyun 
> wrote:
>
>> No~ We cannot, given the as-is commit log status, because it's already
>> screwed up, as Emil wrote.
>> Did you check the branch-3.2 commit log, Sean?
>>
>> Dongjoon.
>>
>>
>> On Thu, Mar 9, 2023 at 11:42 AM Sean Owen  wrote:
>>
>>> We can just push the tags onto the branches as needed right? No need
>>> to roll a new release
>>>
>>> On Thu, Mar 9, 2023, 1:36 PM Dongjoon Hyun 
>>> wrote:
>>>
 Yes, I also confirmed that the v3.4.0-rc3 tag is invalid.

 I guess we need RC4.

 Dongjoon.

 On Thu, Mar 9, 2023 at 7:13 AM Emil Ejbyfeldt
  wrote:

> It might being caused by the v3.4.0-rc3 tag not being part of the
> 3.4
> branch branch-3.4:
>
> $ git log --pretty='format:%d %h' --graph origin/branch-3.4
> v3.4.0-rc3
> | head -n 10
> *  (HEAD, origin/branch-3.4) e38e619946
> *  f3e69a1fe2
> *  74cf1a32b0
> *  0191a5bde0
> *  afced91348
> | *  (tag: v3.4.0-rc3) b9be9ce15a
> |/
> *  006e838ede
> *  fc29b07a31
> *  8655dfe66d
>
>
> Best,
> Emil
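
The same ancestry check can be done directly, without eyeballing the graph;
git's built-in test exits 0 only when the tag's commit is contained in the
branch:

```
git fetch origin branch-3.4 --tags
git merge-base --is-ancestor v3.4.0-rc3 origin/branch-3.4 \
  && echo "v3.4.0-rc3 is on branch-3.4" \
  || echo "v3.4.0-rc3 is NOT on branch-3.4"
```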
>
> On 09/03/2023 15:50, yangjie01 wrote:
> > Hi, all
> >
> > I can't git check out the tag of v3.4.0-rc3. At the same time,
> there is
> > the following information on the Github page.
> >
> > Does anyone else have the same problem?
> >
> > Yang Jie
> >
> > *From:* Xinrong Meng 
> > *Date:* Thursday, March 9, 2023, 20:05
> > *To:* dev 
> > *Subject:* [VOTE] Release Apache Spark 3.4.0 (RC3)
> >
> > Please vote on releasing the following candidate (RC3) as Apache
> Spark
> > version 3.4.0.
> >
> > The vote is open until 11:59pm Pacific time *March 14th* and
> passes if a
> > majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.4.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> http://spark.apache.org/
> >
> > The tag to be voted on is *v3.4.0-rc3* (commit
> > b9be9ce15a82b18cca080ee365d308c0820a29a9):
> > https://github.com/apache/spark/tree/v3.4.0-rc3
> >
> > The release files, including signatures, digests, etc. can be
> found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc3-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-22 Thread Mridul Muralidharan
Signatures, digests, etc check out fine - thanks for updating them !
Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes


The test ClientE2ETestSuite.simple udf failed [1] in "Connect Client "
module ... yet to test "Spark Protobuf" module due to the failure.


Regards,
Mridul

[1]

- simple udf *** FAILED ***

  io.grpc.StatusRuntimeException: INTERNAL:
org.apache.spark.sql.ClientE2ETestSuite

  at io.grpc.Status.asRuntimeException(Status.java:535)

  at
io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)

  at org.apache.spark.sql.connect.client.SparkResult.org
$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:50)

  at
org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:95)

  at
org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:112)

  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2037)

  at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2267)

  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2036)

  at
org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$5(ClientE2ETestSuite.scala:65)

  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

  ...





On Wed, Feb 22, 2023 at 2:07 AM Mridul Muralidharan 
wrote:

>
> Thanks Xinrong !
> The signature verifications are fine now ... will continue with testing
> the release.
>
>
> Regards,
> Mridul
>
>
> On Wed, Feb 22, 2023 at 1:27 AM Xinrong Meng 
> wrote:
>
>> Hi Mridul,
>>
>> Would you please try that again? It should work now.
>>
>> On Wed, Feb 22, 2023 at 2:04 PM Mridul Muralidharan 
>> wrote:
>>
>>>
>>> Hi Xinrong,
>>>
>>>   Was it signed with the same key as present in KEYS [1] ?
>>> I am seeing errors with gpg when validating. For example:
>>>
>>>
>>> $ gpg --verify pyspark-3.4.0.tar.gz.asc
>>>
>>> gpg: assuming signed data in 'pyspark-3.4.0.tar.gz'
>>>
>>> gpg: Signature made Tue 21 Feb 2023 05:56:05 AM CST
>>>
>>> gpg:                using RSA key CC68B3D16FE33A766705160BA7E57908C7A4E1B1
>>>
>>> gpg:                issuer "xinr...@apache.org"
>>>
>>> gpg: Can't check signature: No public key
>>>
>>>
>>>
>>> Regards,
>>> Mridul
>>>
>>> [1] https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>>
>>> On Tue, Feb 21, 2023 at 10:36 PM Xinrong Meng 
>>> wrote:
>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 3.4.0.
>>>>
>>>> The vote is open until 11:59pm Pacific time *February 27th* and passes
>>>> if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 3.4.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is *v3.4.0-rc1* (commit
>>>> e2484f626bb338274665a49078b528365ea18c3b):
>>>> https://github.com/apache/spark/tree/v3.4.0-rc1
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1435
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-docs/
>>>>
>>>> The list of bug fixes going into 3.4.0 can be found at the following
>>>> URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>>>>
>>>> This release is using the release script of the tag v3.4.0-rc1.
>>>>
>>>>
>>>> FAQ
>>>>
>>>> =
>>>> How can I help test this release?
>>>> =
>>>> If you are a Spark user, you can help us test this release by taking
>>>> an existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> If you're working in PySpark yo

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-22 Thread Mridul Muralidharan
Thanks Xinrong !
The signature verifications are fine now ... will continue with testing the
release.


Regards,
Mridul


On Wed, Feb 22, 2023 at 1:27 AM Xinrong Meng 
wrote:

> Hi Mridul,
>
> Would you please try that again? It should work now.
>
> On Wed, Feb 22, 2023 at 2:04 PM Mridul Muralidharan 
> wrote:
>
>>
>> Hi Xinrong,
>>
>>   Was it signed with the same key as present in KEYS [1] ?
>> I am seeing errors with gpg when validating. For example:
>>
>>
>> $ gpg --verify pyspark-3.4.0.tar.gz.asc
>>
>> gpg: assuming signed data in 'pyspark-3.4.0.tar.gz'
>>
>> gpg: Signature made Tue 21 Feb 2023 05:56:05 AM CST
>>
>> gpg:                using RSA key CC68B3D16FE33A766705160BA7E57908C7A4E1B1
>>
>> gpg:                issuer "xinr...@apache.org"
>>
>> gpg: Can't check signature: No public key
>>
>>
>>
>> Regards,
>> Mridul
>>
>> [1] https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>>
>> On Tue, Feb 21, 2023 at 10:36 PM Xinrong Meng 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.4.0.
>>>
>>> The vote is open until 11:59pm Pacific time *February 27th* and passes
>>> if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.4.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is *v3.4.0-rc1* (commit
>>> e2484f626bb338274665a49078b528365ea18c3b):
>>> https://github.com/apache/spark/tree/v3.4.0-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1435
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-docs/
>>>
>>> The list of bug fixes going into 3.4.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>>>
>>> This release is using the release script of the tag v3.4.0-rc1.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; in Java/Scala
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.4.0?
>>> ===
>>> The current list of open tickets targeted at 3.4.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.4.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>> Thanks,
>>> Xinrong Meng
>>>
>>


Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-21 Thread Mridul Muralidharan
Hi Xinrong,

  Was it signed with the same key as present in KEYS [1] ?
I am seeing errors with gpg when validating. For example:


$ gpg --verify pyspark-3.4.0.tar.gz.asc

gpg: assuming signed data in 'pyspark-3.4.0.tar.gz'

gpg: Signature made Tue 21 Feb 2023 05:56:05 AM CST

gpg:                using RSA key CC68B3D16FE33A766705160BA7E57908C7A4E1B1

gpg:                issuer "xinr...@apache.org"

gpg: Can't check signature: No public key



Regards,
Mridul

[1] https://dist.apache.org/repos/dist/dev/spark/KEYS
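
For reference, the flow that avoids this error, assuming the signer's key has
in fact been added to KEYS, is to import that file before verifying:

```
# Import all release-manager keys, then re-run the verification.
curl -s https://dist.apache.org/repos/dist/dev/spark/KEYS | gpg --import
gpg --verify pyspark-3.4.0.tar.gz.asc pyspark-3.4.0.tar.gz
```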


On Tue, Feb 21, 2023 at 10:36 PM Xinrong Meng 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.4.0.
>
> The vote is open until 11:59pm Pacific time *February 27th* and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v3.4.0-rc1* (commit
> e2484f626bb338274665a49078b528365ea18c3b):
> https://github.com/apache/spark/tree/v3.4.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1435
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-docs/
>
> The list of bug fixes going into 3.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>
> This release is using the release script of the tag v3.4.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.4.0?
> ===
> The current list of open tickets targeted at 3.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
> Xinrong Meng
>


Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-11 Thread Mridul Muralidharan
Looks like it was an issue with wget not fetching all the artifacts, my bad
!

Looks good to me, +1 for release - thanks !


Regards,
Mridul
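
A recursive mirror of the staging directory avoids the partial-fetch problem;
a sketch (the --cut-dirs depth is a per-taste adjustment):

```
# -r: recurse; -np: never ascend above the starting directory;
# -nH/--cut-dirs: trim host and leading path components from saved files;
# -R "index.html*": skip the generated directory listings.
wget -r -np -nH --cut-dirs=3 -R "index.html*" \
  https://repository.apache.org/content/repositories/orgapachespark-1433/org/apache/spark/
```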


On Sat, Feb 11, 2023 at 12:11 PM L. C. Hsieh  wrote:

> Hi Mridul,
>
> Thanks for testing it.
>
> I can see the artifact in
>
> https://repository.apache.org/content/repositories/orgapachespark-1433/org/apache/spark/spark-mllib-local_2.13/3.3.2/
> .
> Did I miss something?
>
> Liang-Chi
>
> On Sat, Feb 11, 2023 at 10:08 AM Mridul Muralidharan 
> wrote:
> >
> >
> > Hi,
> >
> > The following file is missing in the staging repository - there is a
> corresponding asc sig file, without the artifact.
> > *
> org/apache/spark/spark-mllib-local_2.13/3.3.2/spark-mllib-local_2.13-3.3.2-test-sources.jar
> > Can we have this fixed please ?
> >
> > Rest of the signatures, digests, etc check out fine.
> >
> > Built and tested with "-Phive -Pyarn -Pmesos -Pkubernetes".
> >
> > Regards,
> > Mridul
> >
> >
> >
> >
> > On Fri, Feb 10, 2023 at 11:01 PM L. C. Hsieh  wrote:
> >>
> >> Please vote on releasing the following candidate as Apache Spark
> version 3.3.2.
> >>
> >> The vote is open until Feb 15th 9AM (PST) and passes if a majority +1
> >> PMC votes are cast, with a minimum of 3 +1 votes.
> >>
> >> [ ] +1 Release this package as Apache Spark 3.3.2
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see https://spark.apache.org/
> >>
> >> The tag to be voted on is v3.3.2-rc1 (commit
> >> 5103e00c4ce5fcc4264ca9c4df12295d42557af6):
> >> https://github.com/apache/spark/tree/v3.3.2-rc1
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.2-rc1-bin/
> >>
> >> Signatures used for Spark RCs can be found in this file:
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1433/
> >>
> >> The documentation corresponding to this release can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.2-rc1-docs/
> >>
> >> The list of bug fixes going into 3.3.2 can be found at the following
> URL:
> >> https://issues.apache.org/jira/projects/SPARK/versions/12352299
> >>
> >> This release is using the release script of the tag v3.3.2-rc1.
> >>
> >> FAQ
> >>
> >> =
> >> How can I help test this release?
> >> =
> >>
> >> If you are a Spark user, you can help us test this release by taking
> >> an existing Spark workload and running on this release candidate, then
> >> reporting any regressions.
> >>
> >> If you're working in PySpark you can set up a virtual env and install
> >> the current RC and see if anything important breaks; in Java/Scala
> >> you can add the staging repository to your project's resolvers and test
> >> with the RC (make sure to clean up the artifact cache before/after so
> >> you don't end up building with an out of date RC going forward).
> >>
> >> ===
> >> What should happen to JIRA tickets still targeting 3.3.2?
> >> ===
> >>
> >> The current list of open tickets targeted at 3.3.2 can be found at:
> >> https://issues.apache.org/jira/projects/SPARK and search for "Target
> >> Version/s" = 3.3.2
> >>
> >> Committers should look at those and triage. Extremely important bug
> >> fixes, documentation, and API tweaks that impact compatibility should
> >> be worked on immediately. Everything else please retarget to an
> >> appropriate release.
> >>
> >> ==
> >> But my bug isn't fixed?
> >> ==
> >>
> >> In order to make timely releases, we will typically not hold the
> >> release unless the bug in question is a regression from the previous
> >> release. That being said, if there is something which is a regression
> >> that has not been correctly targeted please ping me or a committer to
> >> help target the issue.
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>


Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-11 Thread Mridul Muralidharan
Hi,

The following file is missing in the staging repository - there is a
corresponding asc sig file, without the artifact.
*
org/apache/spark/spark-mllib-local_2.13/3.3.2/spark-mllib-local_2.13-3.3.2-test-sources.jar
Can we have this fixed please ?

Rest of the signatures, digests, etc check out fine.

Built and tested with "-Phive -Pyarn -Pmesos -Pkubernetes".

Regards,
Mridul




On Fri, Feb 10, 2023 at 11:01 PM L. C. Hsieh  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.3.2.
>
> The vote is open until Feb 15th 9AM (PST) and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.3.2-rc1 (commit
> 5103e00c4ce5fcc4264ca9c4df12295d42557af6):
> https://github.com/apache/spark/tree/v3.3.2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.2-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1433/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.2-rc1-docs/
>
> The list of bug fixes going into 3.3.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12352299
>
> This release is using the release script of the tag v3.3.2-rc1.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.2?
> ===
>
> The current list of open tickets targeted at 3.3.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.2
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Time for Spark 3.4.0 release?

2023-01-04 Thread Mridul Muralidharan
+1, Thanks !

Regards,
Mridul

On Wed, Jan 4, 2023 at 2:20 AM Gengliang Wang  wrote:

> +1, thanks for driving the release!
>
>
> Gengliang
>
> On Tue, Jan 3, 2023 at 10:55 PM Dongjoon Hyun 
> wrote:
>
>> +1
>>
>> Thank you!
>>
>> Dongjoon
>>
>> On Tue, Jan 3, 2023 at 9:44 PM Rui Wang  wrote:
>>
>>> +1 to cut the branch starting from a workday!
>>>
>>> Great to see this is happening!
>>>
>>> Thanks Xinrong!
>>>
>>> -Rui
>>>
>>> On Tue, Jan 3, 2023 at 9:21 PM 416161...@qq.com 
>>> wrote:
>>>
 +1, thank you Xinrong for driving this release!

 --
 Ruifeng Zheng
 ruife...@foxmail.com

 



 -- Original --
 *From:* "Hyukjin Kwon" ;
 *Date:* Wed, Jan 4, 2023 01:15 PM
 *To:* "Xinrong Meng";
 *Cc:* "dev";
 *Subject:* Re: Time for Spark 3.4.0 release?

 SGTM +1

 On Wed, Jan 4, 2023 at 2:13 PM Xinrong Meng 
 wrote:

> Hi All,
>
> Shall we cut *branch-3.4* on *January 16th, 2023*? We proposed
> January 15th per
> https://spark.apache.org/versioning-policy.html, but I would suggest
> we postpone one day since January 15th is a Sunday.
>
> I would like to volunteer as the release manager for *Apache Spark
> 3.4.0*.
>
> Thanks,
>
> Xinrong Meng
>
>


Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Wed, Nov 30, 2022 at 8:55 PM Xingbo Jiang  wrote:

> +1
>
> On Wed, Nov 30, 2022 at 5:59 PM Jungtaek Lim 
> wrote:
>
>> Starting with +1 from me.
>>
>> On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Asynchronous Offset Management in
>>> Structured Streaming.
>>>
>>> The high level summary of the SPIP is that we propose a couple of
>>> improvements to offset management in microbatch execution to lower
>>> processing latency, which would help certain types of workloads.
>>>
>>> References:
>>>
>>>- JIRA ticket 
>>>- SPIP doc
>>>
>>> 
>>>- Discussion thread
>>>
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks!
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>


Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Mridul Muralidharan
Thanks for all the clarifications and details Jerry, Jungtaek :-)
This looks like an exciting improvement to Structured Streaming - looking
forward to it becoming part of Apache Spark !

Regards,
Mridul


On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng 
wrote:

> Hi all,
>
> I will add my two cents.  Improving the Microbatch execution engine does
> not prevent us from working/improving on the continuous execution engine in
> the future.  These are orthogonal issues.  This new mode I am proposing in
> the microbatch execution engine intends to lower latency of this execution
> engine that most people use today.  We can view it as an incremental
> improvement on the existing engine. I see the continuous execution engine
> as a partially completed re-write of Spark Streaming that may serve as the
> "future" engine powering Spark Streaming.  Improving the "current" engine
> does not mean we cannot work on a "future" engine.  These two are not
> mutually exclusive. I would like to focus the discussion on the merits of
> this feature in regards to the current micro-batch execution engine and not
> a discussion on the future of continuous execution engine.
>
> Best,
>
> Jerry
>
>
> On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim 
> wrote:
>
>> Hi Mridul,
>>
>> I'd like to make this clear to avoid any misunderstanding - the decision was
>> not led by me. (I'm just one of the engineers on the team. Not even the TL.) As
>> you see the direction, there was an internal consensus to not revisit the
>> continuous mode. There are various reasons, which I think we know already.
>> You seem to remember I have raised concerns about continuous mode, but that
>> was over 2 years ago. I still see no traction
>> around the project. The main reason I abandoned the discussion was the
>> promising effort to integrate push-based shuffle into continuous mode to
>> support shuffle there, but no progress has been made so far.
>>
>> The goal of this SPIP is to have an alternative approach dealing with the
>> same workload, given that we no longer have confidence in the success of
>> continuous mode. But I also want to make clear that deprecating and
>> eventually retiring continuous mode is not a goal of this project. If that
>> happens eventually, that would be a side-effect. Someone may have concerns
>> that we have two different projects aiming for a similar thing, but I'd
>> rather see both projects competing. Anyone willing to improve
>> continuous mode can start making the effort right now. This SPIP does not
>> block it.
>>
>>
>> On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan 
>> wrote:
>>
>>>
>>> Hi Jungtaek,
>>>
>>>   Given that the goal of the SPIP is reducing latency for stateless apps,
>>> which should reasonably fit continuous mode's design goals, it feels odd to
>>> not support it in the proposal.
>>>
>>> I know you have raised concerns about continuous mode in the past as well
>>> on the dev@ list, and we are further ignoring it in this proposal (and
>>> possibly other enhancements in the past few releases).
>>>
>>> Do you want to revisit the discussion to support it and propose a vote
>>> on that ? And move it to deprecated ?
>>>
>>> I am much more comfortable not supporting this SPIP for CM if it was
>>> deprecated.
>>>
>>> Thoughts ?
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>>
>>>
>>> On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng 
>>> wrote:
>>>
>>>> Jungtaek,
>>>>
>>>> Thanks for taking up the role to shepard this SPIP!  Thank you for also
>>>> chiming in on your thoughts concerning the continuous mode!
>>>>
>>>> Best,
>>>>
>>>> Jerry
>>>>
>>>> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Just FYI, I'm shepherding this SPIP project.
>>>>>
>>>>> I think the major meta question would be, "why don't we spend
>>>>> effort on continuous mode rather than initiating another feature aiming 
>>>>> for
>>>>> the same workload?". Jerry already updated the doc to answer the question,
>>>>> but I can also share my thoughts about it.
>>>>>
>>>>> I feel like the current "continuous mode" is a niche solution. (It's
>>>>> not to blame. If you have to deal with such work

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-23 Thread Mridul Muralidharan
Hi Jungtaek,

  Given that the goal of the SPIP is reducing latency for stateless apps,
which should reasonably fit continuous mode's design goals, it feels odd to
not support it in the proposal.

I know you have raised concerns about continuous mode in the past as well on
the dev@ list, and we are further ignoring it in this proposal (and possibly
other enhancements in the past few releases).

Do you want to revisit the discussion to support it and propose a vote on
that ? And move it to deprecated ?

I am much more comfortable not supporting this SPIP for CM if it was
deprecated.

Thoughts ?

Regards,
Mridul




On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng 
wrote:

> Jungtaek,
>
> Thanks for taking up the role to shepard this SPIP!  Thank you for also
> chiming in on your thoughts concerning the continuous mode!
>
> Best,
>
> Jerry
>
> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim 
> wrote:
>
>> Just FYI, I'm shepherding this SPIP project.
>>
>> I think the major meta question would be, "why don't we spend effort on
>> continuous mode rather than initiating another feature aiming for the
>> same workload?". Jerry already updated the doc to answer the question, but
>> I can also share my thoughts about it.
>>
>> I feel like the current "continuous mode" is a niche solution. (It's not
>> to blame. If you have to deal with such workload but can't rewrite the
>> underlying engine from scratch, then there are really few options.)
>> Since the implementation went with a workaround to implement which the
>> architecture does not support natively e.g. distributed snapshot, it gets
>> quite tricky on maintaining and expanding the project. It also requires 3rd
>> parties to implement a separate source and sink implementation, which I'm
>> not sure how many 3rd parties actually followed so far.
>>
>> Eventually, "continuous mode" becomes an area no one in the active
>> community knows the details and has willingness to maintain. I wouldn't say
>> we are confident to remove the tag on "experimental", although the feature
>> has been shipped for years. It was introduced in Spark 2.3, surprisingly
>> enough?
>>
>> We went back and thought about the approach from scratch. Jerry came up
>> with an idea which leverages the existing microbatch execution, hence it is
>> relatively stable and does not require 3rd parties to support another
>> mode. It adds complexity to microbatch execution but it's a lot less
>> complicated compared to the existing continuous mode. Definitely quite less
>> than creating a new record-to-record engine from scratch.
>>
>> That said, we want to propose and move forward with the new approach.
>>
>> ps. Eventually we could probably discuss retiring continuous mode if the
>> new approach gets accepted and eventually considered as a stable one after
>> several minor releases. That's just me.
>>
>> On Wed, Nov 23, 2022 at 5:16 AM Jerry Peng 
>> wrote:
>>
>>> Hi all,
>>>
>>> I would like to start the discussion for a SPIP, Asynchronous Offset
>>> Management in Structured Streaming.  The high level summary of the SPIP is
>>> that currently in Structured Streaming we perform a couple of offset
>>> management operations for progress tracking purposes synchronously on the
>>> critical path which can contribute significantly to processing latency.  If
>>> we were to make these operations asynchronous and less frequent we can
>>> dramatically improve latency for certain types of workloads.
>>>
>>> I have put together a SPIP to implement such a mechanism.  Please take a
>>> look!
>>>
>>> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-39591
>>>
>>> SPIP doc:
>>> https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
>>>
>>>
>>> Best,
>>>
>>> Jerry
>>>
>>
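
To make the mechanism in Jerry's opening message concrete, a hypothetical
usage sketch is below. The per-query option names follow the SPIP's direction
but are assumptions until the feature actually ships, and per the proposal the
mode targets stateless queries only:

```
# Hypothetical sketch - option names are assumptions from the SPIP, not a shipped API.
cat > /tmp/async_progress_demo.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
query = (spark.readStream.format("rate").load()          # stateless source-to-sink query
         .writeStream.format("console")
         .option("checkpointLocation", "/tmp/async-demo-ckpt")
         # Opt in: write offset/commit logs asynchronously, flushing every 10s
         # instead of once per microbatch on the critical path.
         .option("asyncProgressTrackingEnabled", "true")
         .option("asyncProgressTrackingCheckpointIntervalMs", "10000")
         .start())
query.awaitTermination(30)
EOF
bin/spark-submit --master 'local[2]' /tmp/async_progress_demo.py
```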


Re: [VOTE][RESULT] Release Spark 3.2.3, RC1

2022-11-18 Thread Mridul Muralidharan
This vote result is missing Sean Owen's vote.

- Mridul



On Fri, Nov 18, 2022 at 11:51 AM Chao Sun  wrote:

> The vote passes with 11 +1s (5 binding +1s).
> Thanks to all who helped with the release!
>
> (* = binding)
> +1:
> - Dongjoon Hyun (*)
> - L. C. Hsieh (*)
> - Huaxin Gao (*)
> - Kazuyuki Tanimura
> - Mridul Muralidharan (*)
> - Yuming Wang
> - Chris Nauroth
> - Yang Jie
> - Wenchen Fan (*)
> - Ruifeng Zheng
> - Chao Sun
>
> +0: None
>
> -1: None
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Mridul Muralidharan
+1

Would be great to see history server performance improvements and lower
resource utilization at driver !

Regards,
Mridul

On Wed, Nov 16, 2022 at 2:38 AM Kent Yao  wrote:

> +1, non-binding
>
> Gengliang Wang wrote on Wednesday, November 16, 2022 at 16:36:
> >
> > Hi all,
> >
> > I’d like to start a vote for SPIP: "Better Spark UI scalability and
> Driver stability for large applications"
> >
> > The goal of the SPIP is to improve the Driver's stability by supporting
> storing Spark's UI data on RocksDB. Furthermore, to speed up the read and
> write operations on RocksDB, it introduces a new Protobuf serializer.
> >
> > Please also refer to the following:
> >
> > Previous discussion in the dev mailing list: [DISCUSS] SPIP: Better
> Spark UI scalability and Driver stability for large applications
> > Design Doc: Better Spark UI scalability and Driver stability for large
> applications
> > JIRA: SPARK-41053
> >
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Kind Regards,
> > Gengliang
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
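
In operational terms, the SPIP's direction would look roughly like the
following; both config names are taken from the design doc's direction and
should be treated as assumptions rather than a shipped interface:

```
# Hypothetical sketch - config names assumed from the SPIP, not verified against a release.
# Live driver UI: spill UI data to a local RocksDB store instead of keeping it on the heap.
bin/spark-submit --master 'local[2]' \
  --conf spark.ui.store.path=/tmp/spark-ui-store \
  examples/src/main/python/pi.py 100

# History server: back the hybrid store with RocksDB rather than memory/LevelDB.
echo "spark.history.store.hybridStore.enabled true"        >> conf/spark-defaults.conf
echo "spark.history.store.hybridStore.diskBackend ROCKSDB" >> conf/spark-defaults.conf
sbin/start-history-server.sh
```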


Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-15 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul


On Tue, Nov 15, 2022 at 1:00 PM kazuyuki tanimura
 wrote:

> +1 (non-binding)
>
> Thank you Chao
>
> Kazu
>
>
>  | Kazuyuki Tanimura | ktanim...@apple.com | +1-408-207-7176
>
> Apple Confidential and Proprietary Information
>
> This email and any attachments is privileged and contains confidential
> information intended only for the recipient(s) named above. Any
> other distribution, forwarding, copying or disclosure of this message is
> strictly prohibited. If you have received this email in error, please
> notify me immediately by telephone or return email, and delete this message
> from your system.
>
> On Nov 15, 2022, at 10:04 AM, Sean Owen  wrote:
>
> +1 from me, at least from my testing. Java 8 + Scala 2.12 and Java 8 +
> Scala 2.13 worked for me, and I didn't see a test hang. I am testing with
> Python 3.10 FWIW.
>
> On Tue, Nov 15, 2022 at 6:37 AM Yang,Jie(INF)  wrote:
>
>> Hi, all
>>
>>
>>
>> I test v3.2.3 with following command:
>>
>>
>>
>> ```
>>
>> dev/change-scala-version.sh 2.13
>>
>> build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn
>> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
>> -Pscala-2.13 -fn
>>
>> ```
>>
>>
>>
>> The testing environment is:
>>
>>
>>
>> OS: CentOS 6u3 Final
>>
>> Java: zulu 11.0.17
>>
>> Python: 3.9.7
>>
>> Scala: 2.13
>>
>>
>>
>> The above test command has been executed twice, and both times it hung in the
>> following stack:
>>
>>
>>
>> ```
>>
>> "ScalaTest-main-running-JoinSuite" #1 prio=5 os_prio=0 cpu=312870.06ms
>> elapsed=1552.65s tid=0x7f2ddc02d000 nid=0x7132 waiting on condition
>> [0x7f2de3929000]
>>
>>java.lang.Thread.State: WAITING (parking)
>>
>>at jdk.internal.misc.Unsafe.park(java.base@11.0.17/Native Method)
>>
>>- parking to wait for  <0x000790d00050> (a
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>
>>at java.util.concurrent.locks.LockSupport.park(java.base@11.0.17
>> /LockSupport.java:194)
>>
>>at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.17
>> /AbstractQueuedSynchronizer.java:2081)
>>
>>at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.17
>> /LinkedBlockingQueue.java:433)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:275)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$$Lambda$9429/0x000802269840.apply(Unknown
>> Source)
>>
>>at
>> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:228)
>>
>>- locked <0x000790d00208> (a java.lang.Object)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:370)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.doExecute(AdaptiveSparkPlanExec.scala:355)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan$$Lambda$8573/0x000801f99c40.apply(Unknown
>> Source)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan$$Lambda$8574/0x000801f9a040.apply(Unknown
>> Source)
>>
>>at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
>>
>>at
>> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:172)
>>
>>- locked <0x000790d00218> (a
>> org.apache.spark.sql.execution.QueryExecution)
>>
>>at
>> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:171)
>>
>>at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3247)
>>
>>- locked <0x000790d002d8> (a org.apache.spark.sql.Dataset)
>>
>>at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3245)
>>
>>at
>> org.apache.spark.sql.QueryTest$.$anonfun$getErrorMessageInCheckAnswer$1(QueryTest.scala:265)
>>
>>at
>> org.apache.spark.sql.QueryTest$$$Lambda$8564/0x000801f94440.apply$mcJ$sp(Unknown
>> Source)
>>
>>at
>> scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
>>
>>at
>> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>>
>>at
>> org.apache.spark.sql.QueryTest$.getErrorMessageInCheckAnswer(QueryTest.scala:265)
>>

Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-21 Thread Mridul Muralidharan
My desktop is running Ubuntu 22.04.1 LTS, with JAVA_HOME pointing to
jdk1.8.0_341
I ran build with '-Pyarn -Pmesos -Pkubernetes' profiles [1] and with
$HOME/.m2 cleaned up.

Regards,
Mridul

[1] ARGS="-Pyarn -Pmesos -Pkubernetes"; ./build/mvn $ARGS clean &&
./build/mvn -DskipTests $ARGS package 2>&1 | tee build_output.txt  &&
./build/mvn  $ARGS package 2>&1 | tee test_output.txt

On Fri, Oct 21, 2022 at 11:17 AM Dongjoon Hyun 
wrote:

> Could you provide your environment and test profile? Both community CIs
> look fine to me.
>
> GitHub Action:
> https://github.com/apache/spark/actions?query=branch%3Abranch-3.3
> Apple Silicon Jenkins Farm:
> https://apache-spark.s3.fr-par.scw.cloud/BRANCH-3.3.html
>
> Dongjoon.
>
>
> On Fri, Oct 21, 2022 at 8:48 AM Mridul Muralidharan 
> wrote:
>
>> Hi,
>>
>>   I saw a couple of test failures I have not observed before:
>>
>> a) FsHistoryProviderSuite -  "SPARK-33146: don't let one bad rolling log
>> folder prevent loading other applications"
>> b) MesosClusterSchedulerSuite - "accept/decline offers with driver
>> constraints"
>>
>> I ended up ignoring them (via ScalaTest's ignore) to make the build pass,
>> but did anything change to cause them to fail or become flaky?
>>
>> Rest of the validation and build went fine.
>>
>> Regards,
>> Mridul
>>
>>
>>
>>
>>
>> On Tue, Oct 18, 2022 at 10:28 PM Cheng Pan  wrote:
>>
>>> +1 (non-binding)
>>>
>>> - Passed Apache Kyuubi (Incubating) integration tests[1]
>>> - Run some jobs on our internal K8s cluster
>>>
>>> [1] https://github.com/apache/incubator-kyuubi/pull/3507
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>> On Wed, Oct 19, 2022 at 9:13 AM Yikun Jiang  wrote:
>>> >
>>> > +1, also test passed with spark-docker workflow (downloading rc4 tgz,
>>> extract, build image, run K8s IT)
>>> >
>>> > [1] https://github.com/Yikun/spark-docker/pull/9
>>> >
>>> > Regards,
>>> > Yikun
>>> >
>>> > On Wed, Oct 19, 2022 at 8:59 AM Wenchen Fan 
>>> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> On Wed, Oct 19, 2022 at 4:59 AM Chao Sun  wrote:
>>> >>>
>>> >>> +1. Thanks Yuming!
>>> >>>
>>> >>> Chao
>>> >>>
>>> >>> On Tue, Oct 18, 2022 at 1:18 PM Thomas graves 
>>> wrote:
>>> >>> >
>>> >>> > +1. Ran internal test suite.
>>> >>> >
>>> >>> > Tom
>>> >>> >
>>> >>> > On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang 
>>> wrote:
>>> >>> > >
>>> >>> > > Please vote on releasing the following candidate as Apache Spark
>>> version 3.3.1.
>>> >>> > >
>>> >>> > > The vote is open until 11:59pm Pacific time October 21st and
>>> passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >>> > >
>>> >>> > > [ ] +1 Release this package as Apache Spark 3.3.1
>>> >>> > > [ ] -1 Do not release this package because ...
>>> >>> > >
>>> >>> > > To learn more about Apache Spark, please see
>>> https://spark.apache.org
>>> >>> > >
>>> >>> > > The tag to be voted on is v3.3.1-rc4 (commit
>>> fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
>>> >>> > > https://github.com/apache/spark/tree/v3.3.1-rc4
>>> >>> > >
>>> >>> > > The release files, including signatures, digests, etc. can be
>>> found at:
>>> >>> > > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
>>> >>> > >
>>> >>> > > Signatures used for Spark RCs can be found in this file:
>>> >>> > > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >>> > >
>>> >>> > > The staging repository for this release can be found at:
>>> >>> > >
>>> https://repository.apache.org/content/repositories/orgapachespark-1430
>>> >>> > >
>>> >>> > > The documentation corresponding to this release can be found at:
>>> >>> > > https://dis

Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-21 Thread Mridul Muralidharan
Hi,

  I saw a couple of test failures I have not observed before:

a) FsHistoryProviderSuite -  "SPARK-33146: don't let one bad rolling log
folder prevent loading other applications"
b) MesosClusterSchedulerSuite - "accept/decline offers with driver
constraints"

I ended up ignoring them (via ScalaTest's ignore) to make the build pass,
but did anything change to cause them to fail or become flaky?
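
(As a sketch, a suspect suite can be re-run in isolation with the
wildcardSuites switch from the Spark developer docs; -pl core narrows the run
to the module owning FsHistoryProviderSuite and assumes the sibling modules
are already installed locally:)

```bash
# Re-run just the flaky-looking suite a few times to gauge flakiness
for i in 1 2 3; do
  build/mvn -pl core -Dtest=none \
    -DwildcardSuites=org.apache.spark.deploy.history.FsHistoryProviderSuite \
    test
done
```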

Rest of the validation and build went fine.

Regards,
Mridul





On Tue, Oct 18, 2022 at 10:28 PM Cheng Pan  wrote:

> +1 (non-binding)
>
> - Passed Apache Kyuubi (Incubating) integration tests[1]
> - Run some jobs on our internal K8s cluster
>
> [1] https://github.com/apache/incubator-kyuubi/pull/3507
>
> Thanks,
> Cheng Pan
>
> On Wed, Oct 19, 2022 at 9:13 AM Yikun Jiang  wrote:
> >
> > +1, also test passed with spark-docker workflow (downloading rc4 tgz,
> extract, build image, run K8s IT)
> >
> > [1] https://github.com/Yikun/spark-docker/pull/9
> >
> > Regards,
> > Yikun
> >
> > On Wed, Oct 19, 2022 at 8:59 AM Wenchen Fan  wrote:
> >>
> >> +1
> >>
> >> On Wed, Oct 19, 2022 at 4:59 AM Chao Sun  wrote:
> >>>
> >>> +1. Thanks Yuming!
> >>>
> >>> Chao
> >>>
> >>> On Tue, Oct 18, 2022 at 1:18 PM Thomas graves 
> wrote:
> >>> >
> >>> > +1. Ran internal test suite.
> >>> >
> >>> > Tom
> >>> >
> >>> > On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang 
> wrote:
> >>> > >
> >>> > > Please vote on releasing the following candidate as Apache Spark
> version 3.3.1.
> >>> > >
> >>> > > The vote is open until 11:59pm Pacific time October 21st and
> passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>> > >
> >>> > > [ ] +1 Release this package as Apache Spark 3.3.1
> >>> > > [ ] -1 Do not release this package because ...
> >>> > >
> >>> > > To learn more about Apache Spark, please see
> https://spark.apache.org
> >>> > >
> >>> > > The tag to be voted on is v3.3.1-rc4 (commit
> fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
> >>> > > https://github.com/apache/spark/tree/v3.3.1-rc4
> >>> > >
> >>> > > The release files, including signatures, digests, etc. can be
> found at:
> >>> > > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
> >>> > >
> >>> > > Signatures used for Spark RCs can be found in this file:
> >>> > > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>> > >
> >>> > > The staging repository for this release can be found at:
> >>> > >
> https://repository.apache.org/content/repositories/orgapachespark-1430
> >>> > >
> >>> > > The documentation corresponding to this release can be found at:
> >>> > > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs
> >>> > >
> >>> > > The list of bug fixes going into 3.3.1 can be found at the
> following URL:
> >>> > > https://s.apache.org/ttgz6
> >>> > >
> >>> > > This release is using the release script of the tag v3.3.1-rc4.
> >>> > >
> >>> > >
> >>> > > FAQ
> >>> > >
> >>> > > ==
> >>> > > What happened to v3.3.1-rc3?
> >>> > > ==
> >>> > > A performance regression(SPARK-40703) was found after tagging
> v3.3.1-rc3, which the Iceberg community hopes Spark 3.3.1 could fix.
> >>> > > So we skipped the vote on v3.3.1-rc3.
> >>> > >
> >>> > > =
> >>> > > How can I help test this release?
> >>> > > =
> >>> > > If you are a Spark user, you can help us test this release by taking
> >>> > > an existing Spark workload and running it on this release candidate,
> >>> > > then reporting any regressions.
> >>> > >
> >>> > > If you're working in PySpark, you can set up a virtual env, install
> >>> > > the current RC, and see if anything important breaks; in Java/Scala,
> >>> > > you can add the staging repository to your project's resolvers and
> >>> > > test with the RC (make sure to clean up the artifact cache
> >>> > > before/after so you don't end up building with an out-of-date RC
> >>> > > going forward).
> >>> > >
> >>> > > ===
> >>> > > What should happen to JIRA tickets still targeting 3.3.1?
> >>> > > ===
> >>> > > The current list of open tickets targeted at 3.3.1 can be found at:
> >>> > > https://issues.apache.org/jira/projects/SPARK and search for
> "Target Version/s" = 3.3.1
> >>> > >
> >>> > > Committers should look at those and triage. Extremely important bug
> >>> > > fixes, documentation, and API tweaks that impact compatibility
> should
> >>> > > be worked on immediately. Everything else please retarget to an
> >>> > > appropriate release.
> >>> > >
> >>> > > ==
> >>> > > But my bug isn't fixed?
> >>> > > ==
> >>> > > In order to make timely releases, we will typically not hold the
> >>> > > release unless the bug in question is a regression from the
> previous
> >>> > > release. That being said, if there is something which is a
> regression
> >>> > > that has not been correctly targeted please ping me or a committer
> to
> >>> > > help target the issue.
> >>> > 

Re: Welcome Yikun Jiang as a Spark committer

2022-10-08 Thread Mridul Muralidharan
Congratulations !

Regards,
Mridul

On Sat, Oct 8, 2022 at 12:19 AM Yuming Wang  wrote:

> Congratulations Yikun!
>
> On Sat, Oct 8, 2022 at 12:40 PM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> The Spark PMC recently added Yikun Jiang as a committer on the project.
>> Yikun is a major contributor to the infrastructure and GitHub Actions
>> in Apache Spark, as well as to Kubernetes and PySpark.
>> He has put a lot of effort into stabilizing and optimizing the builds
>> so we can all work together in Apache Spark more
>> efficiently and effectively. He's also driving the SPIP for an official
>> Docker image for Apache Spark, for both users and developers.
>> Please join me in welcoming Yikun!
>>
>>


Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-03 Thread Mridul Muralidharan
+1 from me, with a few comments.

I saw the following failures - are these known issues/flaky tests?

* PersistenceEngineSuite.ZooKeeperPersistenceEngine
Looks like a port conflict from a quick look at the logs (a clash when
starting the admin port at 8080) - is this expected behavior for the test?
I worked around it by shutting down the process that was using the port,
though I did not investigate deeply.
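
(Finding and freeing the conflicting listener is usually enough. A minimal
sketch, assuming Linux with lsof available; <pid> is a placeholder:)

```bash
lsof -iTCP:8080 -sTCP:LISTEN   # identify the process holding the port
kill <pid>                     # free it, then re-run the suite
```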

* org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite was aborted
It is expecting these artifacts in $HOME/.m2/repository

1. tomcat#jasper-compiler;5.5.23!jasper-compiler.jar
2. tomcat#jasper-runtime;5.5.23!jasper-runtime.jar
3. commons-el#commons-el;1.0!commons-el.jar
4. org.apache.hive#hive-exec;2.3.7!hive-exec.jar

I worked around it by adding them locally explicitly - should we perhaps
add them as test dependencies?
Not sure if this changed in this release, though (I had cleaned my local .m2
recently).
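
(One way to pre-populate them, as a sketch: mvn dependency:get resolves each
coordinate into the local repository; the coordinates are taken from the list
above:)

```bash
for gav in tomcat:jasper-compiler:5.5.23 \
           tomcat:jasper-runtime:5.5.23 \
           commons-el:commons-el:1.0 \
           org.apache.hive:hive-exec:2.3.7; do
  build/mvn dependency:get -Dartifact="$gav"
done
```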

Other than this, rest looks good to me.

Regards,
Mridul


On Wed, Sep 28, 2022 at 2:56 PM Sean Owen  wrote:

> +1 from me, same result as last RC.
>
> On Wed, Sep 28, 2022 at 12:21 AM Yuming Wang  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version 
>> 3.3.1.
>>
>> The vote is open until 11:59pm Pacific time October 3th and passes if a 
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.3.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see https://spark.apache.org
>>
>> The tag to be voted on is v3.3.1-rc2 (commit 
>> 1d3b8f7cb15283a1e37ecada6d751e17f30647ce):
>> https://github.com/apache/spark/tree/v3.3.1-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-bin
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1421
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-docs
>>
>> The list of bug fixes going into 3.3.1 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12351710
>>
>> This release is using the release script of the tag v3.3.1-rc2.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running it on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env, install
>> the current RC, and see if anything important breaks; in Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.3.1?
>> ===
>> The current list of open tickets targeted at 3.3.1 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> Version/s" = 3.3.1
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>
>>
>>


Re: How to set platform-level defaults for array-like configs?

2022-08-11 Thread Mridul Muralidharan
Hi,

  Wenchen, it would be great if you could chime in with your thoughts, given
the feedback you originally had on the PR.
It would also be great to hear feedback from others on this, particularly
folks managing Spark deployments: how is this mitigated/avoided in your
case, and what other pain points with configs exist in this context?


Regards,
Mridul

On Wed, Jul 27, 2022 at 12:28 PM Erik Krogen  wrote:

> I find there's substantial value in being able to set defaults, and I
> think we can see that the community finds value in it as well, given the
> handful of "default"-like configs that exist today as mentioned in
> Shardul's email. The mismatch of conventions used today (suffix with
> ".defaultList", change "extra" to "default", ...) is confusing and
> inconsistent, plus requires one-off additions for each config.
>
> My proposal here would be:
>
>- Define a clear convention, e.g. a suffix of ".default" that enables
>a default to be set and merged
>- Document this convention in configuration.md so that we can avoid
>separately documenting each default-config, and instead just add a note in
>the docs for the normal config.
>- Adjust the withPrepended method
>
> 
>added in #24804  to
>leverage this convention instead of each usage instance re-defining the
>additional config name
>- Do a comprehensive review of applicable configs and enable them all
>to use the newly updated withPrepended method
>
> Wenchen, you expressed some concerns with adding more default configs in
> #34856 , would this proposal
> address those concerns?
>
> Thanks,
> Erik
>
> On Wed, Jul 13, 2022 at 11:54 PM Shardul Mahadik <
> shardulsmaha...@gmail.com> wrote:
>
>> Hi Spark devs,
>>
>> Spark contains a bunch of array-like configs (comma separated lists).
>> Some examples include `spark.sql.extensions`,
>> `spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
>> `spark.extraListeners`, `spark.driver.extraClassPath` and so on (there are
>> a dozen or so more). As owners of the Spark platform in our organization,
>> we would like to set platform-level defaults, e.g. custom SQL extension and
>> listeners, and we use some of the above mentioned properties to do so. At
>> the same time, we have power users writing their own listeners, setting the
>> same Spark confs and thus unintentionally overriding our platform defaults.
>> This leads to a loss of functionality within our platform.
>>
>> Previously, Spark has introduced "default" confs for a few of these
>> array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins`,
>> `spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
>> These properties are meant to only be set by cluster admins thus allowing
>> separation between platform default and user configs. However, as discussed
>> in https://github.com/apache/spark/pull/34856, these configs are still
>> client-side and can still be overridden, while also not being a scalable
>> solution as we cannot introduce 1 new "default" config for every array-like
>> config.
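
(For context, a minimal illustration of the existing prepended-default
convention in spark-defaults.conf; the class names and options here are
hypothetical:)

```
# Set by cluster admins; prepended to the user-supplied value, not replaced
spark.plugins.defaultList         com.example.PlatformPlugin
spark.driver.defaultJavaOptions   -XX:+UseG1GC

# Set by a user job; the admin defaults above still apply
spark.plugins                     com.example.UserPlugin
spark.driver.extraJavaOptions     -Xss4m
```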
>>
>> I wanted to know if others have experienced this issue and what systems
>> have been implemented to tackle this. Are there any existing solutions for
>> this; either client-side or server-side? (e.g. at job submission server).
>> Even though we cannot easily enforce this at the client-side, the
>> simplicity of a solution may make it more appealing.
>>
>> Thanks,
>> Shardul
>>
>


Re: Welcoming three new PMC members

2022-08-09 Thread Mridul Muralidharan
Congratulations !
Great to have you join the PMC !!

Regards,
Mridul

On Tue, Aug 9, 2022 at 11:57 AM vaquar khan  wrote:

> Congratulations
>
> On Tue, Aug 9, 2022, 11:40 AM Xiao Li  wrote:
>
>> Hi all,
>>
>> The Spark PMC recently voted to add three new PMC members. Join me in
>> welcoming them to their new roles!
>>
>> New PMC members: Huaxin Gao, Gengliang Wang and Maxim Gekk
>>
>> The Spark PMC
>>
>


Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Mridul Muralidharan
Congratulations Xinrong !

Regards,
Mridul

On Tue, Aug 9, 2022 at 3:13 AM Hyukjin Kwon  wrote:

> Hi all,
>
> The Spark PMC recently added Xinrong Meng as a committer on the project.
> Xinrong is the major contributor of PySpark especially Pandas API on Spark.
> She has guided a lot of new contributors enthusiastically. Please join me
> in welcoming Xinrong!
>
>


Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-12 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with "-Pyarn -Pmesos -Pkubernetes"

As always, the test "SPARK-33084: Add jar support Ivy URI in SQL" in
sql.SQLQuerySuite fails in my env, but other than that the rest looks good.

Regards,
Mridul


On Tue, Jul 12, 2022 at 3:17 AM Maxim Gekk
 wrote:

> +1
>
> On Tue, Jul 12, 2022 at 11:05 AM Yang,Jie(INF) 
> wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> Yang Jie
>>
>>
>>
>>
>>
>> *From:* Dongjoon Hyun 
>> *Date:* Tuesday, July 12, 2022, 16:03
>> *To:* dev 
>> *Cc:* Cheng Su , "Yang,Jie(INF)" <
>> yangji...@baidu.com>, Sean Owen 
>> *Subject:* Re: [VOTE] Release Spark 3.2.2 (RC1)
>>
>>
>>
>> +1
>>
>>
>>
>> Dongjoon.
>>
>>
>>
>> On Mon, Jul 11, 2022 at 11:34 PM Cheng Su  wrote:
>>
>> +1 (non-binding). Built from source, and ran some scala unit tests on M1
>> mac, with OpenJDK 8 and Scala 2.12.
>>
>>
>>
>> Thanks,
>>
>> Cheng Su
>>
>>
>>
>> On Mon, Jul 11, 2022 at 10:31 PM Yang,Jie(INF) 
>> wrote:
>>
>> Does this happen when running all UTs? I ran this suite several times
>> alone using OpenJDK(zulu) 8u322-b06 on my Mac, but no similar error
>> occurred.
>>
>>
>>
>> *From:* Sean Owen 
>> *Date:* Tuesday, July 12, 2022, 10:45
>> *To:* Dongjoon Hyun 
>> *Cc:* dev 
>> *Subject:* Re: [VOTE] Release Spark 3.2.2 (RC1)
>>
>>
>>
>> Is anyone seeing this error? I'm on OpenJDK 8 on a Mac:
>>
>>
>>
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x000101ca8ace, pid=11962,
>> tid=0x1603
>> #
>> # JRE version: OpenJDK Runtime Environment (8.0_322) (build
>> 1.8.0_322-bre_2022_02_28_15_01-b00)
>> # Java VM: OpenJDK 64-Bit Server VM (25.322-b00 mixed mode bsd-amd64
>> compressed oops)
>> # Problematic frame:
>> # V  [libjvm.dylib+0x549ace]
>> #
>> # Failed to write core dump. Core dumps have been disabled. To enable
>> core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /private/tmp/spark-3.2.2/sql/core/hs_err_pid11962.log
>> ColumnVectorSuite:
>> - boolean
>> - byte
>> Compiled method (nm)  885897 75403 n 0
>> sun.misc.Unsafe::putShort (native)
>>  total in heap  [0x000102fdaa10,0x000102fdad48] = 824
>>  relocation [0x000102fdab38,0x000102fdab78] = 64
>>  main code  [0x000102fdab80,0x000102fdad48] = 456
>> Compiled method (nm)  885897 75403 n 0
>> sun.misc.Unsafe::putShort (native)
>>  total in heap  [0x000102fdaa10,0x000102fdad48] = 824
>>  relocation [0x000102fdab38,0x000102fdab78] = 64
>>  main code  [0x000102fdab80,0x000102fdad48] = 456
>>
>>
>>
>> On Mon, Jul 11, 2022 at 4:58 PM Dongjoon Hyun 
>> wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.2.
>>
>> The vote is open until July 15th 1AM (PST) and passes if a majority +1
>> PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.2
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>> 
>>
>> The tag to be voted on is v3.2.2-rc1 (commit
>> 78a5825fe266c0884d2dd18cbca9625fa258d7f7):
>> https://github.com/apache/spark/tree/v3.2.2-rc1
>> 
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/
>> 
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> 
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1409/
>> 
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/
>> 
>>
>> The list of bug fixes going into 3.2.2 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12351232
>> 
>>
>> This release is using the release script of the tag v3.2.2-rc1.
>>
>> FAQ
>>
>> =
>> How 

Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Mridul Muralidharan
+1

Thanks for driving this Dongjoon !

Regards,
Mridul

On Thu, Jul 7, 2022 at 12:36 AM Gengliang Wang  wrote:

> +1.
> Thank you, Dongjoon.
>
> On Wed, Jul 6, 2022 at 10:21 PM Wenchen Fan  wrote:
>
>> +1
>>
>> On Thu, Jul 7, 2022 at 10:41 AM Xinrong Meng
>>  wrote:
>>
>>> +1
>>>
>>> Thanks!
>>>
>>>
>>> Xinrong Meng
>>>
>>> Software Engineer
>>>
>>> Databricks
>>>
>>>
>>> On Wed, Jul 6, 2022 at 7:25 PM Xiao Li  wrote:
>>>
 +1

 Xiao

 Cheng Su  wrote on Wed, Jul 6, 2022 at 19:16:

> +1 (non-binding)
>
> Thanks,
> Cheng Su
>
> On Wed, Jul 6, 2022 at 6:01 PM Yuming Wang  wrote:
>
>> +1
>>
>> On Thu, Jul 7, 2022 at 5:53 AM Maxim Gekk
>>  wrote:
>>
>>> +1
>>>
>>> On Thu, Jul 7, 2022 at 12:26 AM John Zhuge 
>>> wrote:
>>>
 +1  Thanks for the effort!

 On Wed, Jul 6, 2022 at 2:23 PM Bjørn Jørgensen <
 bjornjorgen...@gmail.com> wrote:

> +1
>
> On Wed, Jul 6, 2022 at 23:05, Hyukjin Kwon  wrote:
>
>> Yeah +1
>>
>> On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>>
>>> Hi, All.
>>>
>>> Since the Apache Spark 3.2.1 tag was created (Jan 19), 197 new patches,
>>> including 11 correctness patches, have arrived at branch-3.2.
>>>
>>> Shall we make a new release, Apache Spark 3.2.2, as the third release
>>> in the 3.2 line? I'd like to volunteer as the release manager for
>>> Apache Spark 3.2.2. I'm thinking about starting the first RC next week.
>>>
>>> $ git log --oneline v3.2.1..HEAD | wc -l
>>>  197
>>>
>>> # Correctness issues
>>>
>>> SPARK-38075 Hive script transform with order by and limit will
>>> return fake rows
>>> SPARK-38204 All state operators are at a risk of inconsistency
>>> between state partitioning and operator partitioning
>>> SPARK-38309 SHS has incorrect percentiles for shuffle read bytes
>>> and shuffle total blocks metrics
>>> SPARK-38320 (flat)MapGroupsWithState can timeout groups which just
>>> received inputs in the same microbatch
>>> SPARK-38614 After Spark update, df.show() shows incorrect
>>> F.percent_rank results
>>> SPARK-38655 OffsetWindowFunctionFrameBase cannot find the offset
>>> row whose input is not null
>>> SPARK-38684 Stream-stream outer join has a possible correctness
>>> issue due to weakly read consistent on outer iterators
>>> SPARK-39061 Incorrect results or NPE when using Inline function
>>> against an array of dynamically created structs
>>> SPARK-39107 Silent change in regexp_replace's handling of
>>> empty strings
>>> SPARK-39259 Timestamps returned by now() and equivalent functions
>>> are not consistent in subqueries
>>> SPARK-39293 The accumulator of ArrayAggregate should copy the
>>> intermediate result if string, struct, array, or map
>>>
>>> Best,
>>> Dongjoon.
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>> --
 John Zhuge

>>>


Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes

The test "SPARK-33084: Add jar support Ivy URI in SQL" in sql.SQLQuerySuite
fails, but other than that the rest looks good.

Regards,
Mridul



On Mon, Jun 13, 2022 at 4:25 PM Tom Graves 
wrote:

> +1
>
> Tom
>
> On Thursday, June 9, 2022, 11:27:50 PM CDT, Maxim Gekk
>  wrote:
>
>
> Please vote on releasing the following candidate as
> Apache Spark version 3.3.0.
>
> The vote is open until 11:59pm Pacific time June 14th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.3.0-rc6 (commit
> f74867bddfbcdd4d08076db36851e88b15e66556):
> https://github.com/apache/spark/tree/v3.3.0-rc6
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1407
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/
>
> The list of bug fixes going into 3.3.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>
> This release is using the release script of the tag v3.3.0-rc6.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
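
(A minimal PySpark check, as a sketch; the exact tarball name under the -bin/
directory is an assumption:)

```bash
python3 -m venv /tmp/spark-rc && source /tmp/spark-rc/bin/activate
pip install \
  "https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/pyspark-3.3.0.tar.gz"
python -c "import pyspark; print(pyspark.__version__)"   # expect 3.3.0
```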
>
> ===
> What should happen to JIRA tickets still targeting 3.3.0?
> ===
> The current list of open tickets targeted at 3.3.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>


Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-06 Thread Mridul Muralidharan
I will also try to get a PR out to fix the first test failure that Sean
reported - I should have it ready by EOD.

Regards,
Mridul


On Fri, May 6, 2022 at 10:31 AM Gengliang Wang  wrote:

> Hi Maxim,
>
> Thanks for the work!
> There is a bug fix from Bruce merged on branch-3.3 right after the RC1 is
> cut:
> SPARK-39093: Dividing interval by integral can result in codegen
> compilation error
> 
>
> So -1 from me. We should have RC2 to include the fix.
>
> Thanks
> Gengliang
>
> On Fri, May 6, 2022 at 6:15 PM Maxim Gekk
>  wrote:
>
>> Hi Dongjoon,
>>
>>  > https://issues.apache.org/jira/projects/SPARK/versions/12350369
>> > Since RC1 has started, could you move them out of the 3.3.0 milestone?
>>
>> I have removed the 3.3.0 label from Fix version(s). Thank you, Dongjoon.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Fri, May 6, 2022 at 11:06 AM Dongjoon Hyun 
>> wrote:
>>
>>> Hi, Sean.
>>> It's interesting. I didn't see those failures from my side.
>>>
>>> Hi, Maxim.
>>> In the following link, there are 17 in-progress and 6 to-do JIRA issues
>>> which look irrelevant to this RC1 vote.
>>>
>>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>>
>>> Since RC1 has started, could you move them out of the 3.3.0 milestone?
>>> Otherwise, we cannot distinguish new real blocker issues from those
>>> obsolete JIRA issues.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>>
>>> On Thu, May 5, 2022 at 11:46 AM Adam Binford  wrote:
>>>
 I looked back at the first one (SPARK-37618): it assumes a 0022 umask to
 test the behavior correctly. I'm not sure how to keep it from failing, or
 have it ignored, under a more permissive umask.
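
(To reproduce the suite's expectation, the mask can be set for the test shell
first; a minimal sketch:)

```bash
umask        # print the current mask
umask 0022   # match what the SPARK-37618 test assumes, then re-run the suite
```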

 On Thu, May 5, 2022 at 1:56 PM Sean Owen  wrote:

> I'm seeing test failures; is anyone seeing ones like this? This is
> Java 8 / Scala 2.12 / Ubuntu 22.04:
>
> - SPARK-37618: Sub dirs are group writable when removing from shuffle
> service enabled *** FAILED ***
>   [OWNER_WRITE, GROUP_READ, GROUP_WRITE, GROUP_EXECUTE, OTHERS_READ,
> OWNER_READ, OTHERS_EXECUTE, OWNER_EXECUTE] contained GROUP_WRITE
> (DiskBlockManagerSuite.scala:155)
>
> - Check schemas for expression examples *** FAILED ***
>   396 did not equal 398 Expected 396 blocks in result file but got
> 398. Try regenerating the result files. (ExpressionsSchemaSuite.scala:161)
>
>  Function 'bloom_filter_agg', Expression class
> 'org.apache.spark.sql.catalyst.expressions.aggregate.BloomFilterAggregate'
> "" did not start with "
>   Examples:
>   " (ExpressionInfoSuite.scala:142)
>
> On Thu, May 5, 2022 at 6:01 AM Maxim Gekk
>  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark
>>  version 3.3.0.
>>
>> The vote is open until 11:59pm Pacific time May 10th and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.3.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.3.0-rc1 (commit
>> 482b7d54b522c4d1e25f3e84eabbc78126f22a3d):
>> https://github.com/apache/spark/tree/v3.3.0-rc1
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1402
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc1-docs/
>>
>> The list of bug fixes going into 3.3.0 can be found at the following
>> URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>
>> This release is using the release script of the tag v3.3.0-rc1.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running it on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env, install
>> the current RC, and see if anything important breaks; in Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> 

Re: Apache Spark 3.3 Release

2022-03-03 Thread Mridul Muralidharan
Agree with Sean, code freeze by mid March sounds good.

Regards,
Mridul

On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:

> I think it's fine to pursue the existing plan - code freeze in two weeks
> and try to close off key remaining issues. Final release pending on how
> those go, and testing, but fine to get the ball rolling.
>
> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>  wrote:
>
>> Hello All,
>>
>> I would like to bring up the topic of the new Spark 3.3 release.
>> According to the public schedule at
>> https://spark.apache.org/versioning-policy.html, we planned to start the
>> code freeze and release branch cut on March 15th, 2022. Since this date is
>> coming soon, I would like to draw your attention to the topic and gather
>> any objections that you might have.
>>
>> Below is the list of ongoing and active SPIPs:
>>
>> Spark SQL:
>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>> - [SPARK-35801] Row-level operations in Data Source V2
>> - [SPARK-37166] Storage Partitioned Join
>>
>> Spark Core:
>> - [SPARK-20624] Add better handling for node shutdown
>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>
>> PySpark:
>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>
>> Kubernetes:
>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>
>> We should probably finish any remaining work for Spark
>> 3.3, switch to QA mode, cut a branch, and keep everything on track. I
>> would like to volunteer to help drive this process.
>>
>> Best regards,
>> Max Gekk
>>
>


Re: [VOTE] Spark 3.1.3 RC4

2022-02-16 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
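
(For reference, the signature/digest check amounts to roughly this; a sketch
where the tarball name is an assumption based on the -bin/ layout:)

```bash
curl -LO https://dist.apache.org/repos/dist/dev/spark/KEYS
gpg --import KEYS
gpg --verify spark-3.1.3-bin-hadoop3.2.tgz.asc spark-3.1.3-bin-hadoop3.2.tgz
# Apache .sha512 files are gpg-formatted digests; compare against a local one
gpg --print-md SHA512 spark-3.1.3-bin-hadoop3.2.tgz \
  | diff - spark-3.1.3-bin-hadoop3.2.tgz.sha512
```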

Regards,
Mridul


On Wed, Feb 16, 2022 at 8:32 AM Thomas graves  wrote:

> +1
>
> Tom
>
> On Mon, Feb 14, 2022 at 2:55 PM Holden Karau  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.1.3.
> >
> > The vote is open until Feb. 18th at 1 PM pacific (9 PM GMT) and passes
> if a majority
> > +1 PMC votes are cast, with a minimum of 3 + 1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.1.3
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > There are currently no open issues targeting 3.1.3 in Spark's JIRA
> https://issues.apache.org/jira/browse
> > (try project = SPARK AND "Target Version/s" = "3.1.3" AND status in
> (Open, Reopened, "In Progress"))
> > at https://s.apache.org/n79dw
> >
> >
> >
> > The tag to be voted on is v3.1.3-rc4 (commit
> > d1f8a503a26bcfb4e466d9accc5fa241a7933667):
> > https://github.com/apache/spark/tree/v3.1.3-rc4
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at
> > https://repository.apache.org/content/repositories/orgapachespark-1401
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-docs/
> >
> > The list of bug fixes going into 3.1.3 can be found at the following URL:
> > https://s.apache.org/x0q9b
> >
> > This release is using the release script from 3.1.3
> > The release docker container was rebuilt since the previous version
> didn't have the necessary components to build the R documentation.
> >
> > FAQ
> >
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running it on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark, you can set up a virtual env, install
> > the current RC, and see if anything important breaks; in Java/Scala,
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
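
(On the Java/Scala side, a single artifact can be resolved straight from the
staging repository as a sanity check; a sketch using mvn dependency:get, with
the repository URL as listed above:)

```bash
mvn dependency:get \
  -Dartifact=org.apache.spark:spark-sql_2.12:3.1.3 \
  -DremoteRepositories=https://repository.apache.org/content/repositories/orgapachespark-1401
```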
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.1.3?
> > ===
> >
> > The current list of open tickets targeted at 3.1.3 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > Version/s" = 3.1.3
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something that is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> > Note: I added an extra day to the vote since I know some folks are
> likely busy on the 14th with partner(s).
> >
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Spark 3.1.3 RC3

2022-02-02 Thread Mridul Muralidharan
Hi,

  Minor nit: the tag mentioned under [1] looks like a typo - I used
"v3.1.3-rc3" for my vote (3.2.1 is mentioned in a couple of places; treat
them as 3.1.3 instead).

+1
Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul

[1] "The tag to be voted on is v3.2.1-rc1" - the commit hash and git url
are correct.


On Wed, Feb 2, 2022 at 9:30 AM Mridul Muralidharan  wrote:

>
> Thanks Tom !
> I missed [1] (or probably forgot) the 3.1 part of the discussion given it
> centered around 3.2 ...
>
>
> Regards,
> Mridul
>
> [1] https://www.mail-archive.com/dev@spark.apache.org/msg28484.html
>
> On Wed, Feb 2, 2022 at 8:55 AM Thomas Graves  wrote:
>
>> It was discussed doing all the maintenance lines back at beginning of
>> December (Dec 6) when we were talking about release 3.2.1.
>>
>> Tom
>>
>> On Wed, Feb 2, 2022 at 2:07 AM Mridul Muralidharan 
>> wrote:
>> >
>> > Hi Holden,
>> >
>> >   Not that I am against releasing 3.1.3 (given the fixes that have
>> already gone in), but did we discuss releasing it ? I might have missed the
>> thread ...
>> >
>> > Regards,
>> > Mridul
>> >
>> > On Tue, Feb 1, 2022 at 7:12 PM Holden Karau 
>> wrote:
>> >>
>> >> Please vote on releasing the following candidate as Apache Spark
>> version 3.1.3.
>> >>
>> >> The vote is open until Feb. 4th at 5 PM PST (1 AM UTC + 1 day) and
>> passes if a majority
>> >> +1 PMC votes are cast, with a minimum of 3 + 1 votes.
>> >>
>> >> [ ] +1 Release this package as Apache Spark 3.1.3
>> >> [ ] -1 Do not release this package because ...
>> >>
>> >> To learn more about Apache Spark, please see http://spark.apache.org/
>> >>
>> >> There are currently no open issues targeting 3.1.3 in Spark's JIRA
>> https://issues.apache.org/jira/browse
>> >> (try project = SPARK AND "Target Version/s" = "3.1.3" AND status in
>> (Open, Reopened, "In Progress"))
>> >> at https://s.apache.org/n79dw
>> >>
>> >>
>> >>
>> >> The tag to be voted on is v3.2.1-rc1 (commit
>> >> b8c0799a8cef22c56132d94033759c9f82b0cc86):
>> >> https://github.com/apache/spark/tree/v3.1.3-rc3
>> >>
>> >> The release files, including signatures, digests, etc. can be found at:
>> >> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc3-bin/
>> >>
>> >> Signatures used for Spark RCs can be found in this file:
>> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>
>> >> The staging repository for this release can be found at
>> >> :
>> https://repository.apache.org/content/repositories/orgapachespark-1400/
>> >>
>> >> The documentation corresponding to this release can be found at:
>> >> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc3-docs/
>> >>
>> >> The list of bug fixes going into 3.1.3 can be found at the following
>> URL:
>> >> https://s.apache.org/x0q9b
>> >>
>> >> This release is using the release script in master as of
>> ddc77fb906cb3ce1567d277c2d0850104c89ac25
>> >> The release docker container was rebuilt since the previous version
>> didn't have the necessary components to build the R documentation.
>> >>
>> >> FAQ
>> >>
>> >>
>> >> =
>> >> How can I help test this release?
>> >> =
>> >>
>> >> If you are a Spark user, you can help us test this release by taking
>> >> an existing Spark workload and running it on this release candidate,
>> >> then reporting any regressions.
>> >>
>> >> If you're working in PySpark, you can set up a virtual env, install
>> >> the current RC, and see if anything important breaks; in Java/Scala,
>> >> you can add the staging repository to your project's resolvers and test
>> >> with the RC (make sure to clean up the artifact cache before/after so
>> >> you don't end up building with an out-of-date RC going forward).
>> >>
>> >> ===
>> >> What should happen to JIRA tickets still targeting 3.1.3?
>> >> ===
>> >>
>> >> The current list of open tickets t

Re: [VOTE] Spark 3.1.3 RC3

2022-02-02 Thread Mridul Muralidharan
Thanks Tom !
I missed [1] (or probably forgot) the 3.1 part of the discussion given it
centered around 3.2 ...


Regards,
Mridul

[1] https://www.mail-archive.com/dev@spark.apache.org/msg28484.html

On Wed, Feb 2, 2022 at 8:55 AM Thomas Graves  wrote:

> It was discussed doing all the maintenance lines back at beginning of
> December (Dec 6) when we were talking about release 3.2.1.
>
> Tom
>
> On Wed, Feb 2, 2022 at 2:07 AM Mridul Muralidharan 
> wrote:
> >
> > Hi Holden,
> >
> >   Not that I am against releasing 3.1.3 (given the fixes that have
> already gone in), but did we discuss releasing it ? I might have missed the
> thread ...
> >
> > Regards,
> > Mridul
> >
> > On Tue, Feb 1, 2022 at 7:12 PM Holden Karau 
> wrote:
> >>
> >> Please vote on releasing the following candidate as Apache Spark
> version 3.1.3.
> >>
> >> The vote is open until Feb. 4th at 5 PM PST (1 AM UTC + 1 day) and
> passes if a majority
> >> +1 PMC votes are cast, with a minimum of 3 + 1 votes.
> >>
> >> [ ] +1 Release this package as Apache Spark 3.1.3
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see http://spark.apache.org/
> >>
> >> There are currently no open issues targeting 3.1.3 in Spark's JIRA
> https://issues.apache.org/jira/browse
> >> (try project = SPARK AND "Target Version/s" = "3.1.3" AND status in
> (Open, Reopened, "In Progress"))
> >> at https://s.apache.org/n79dw
> >>
> >>
> >>
> >> The tag to be voted on is v3.2.1-rc1 (commit
> >> b8c0799a8cef22c56132d94033759c9f82b0cc86):
> >> https://github.com/apache/spark/tree/v3.1.3-rc3
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc3-bin/
> >>
> >> Signatures used for Spark RCs can be found in this file:
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >> The staging repository for this release can be found at
> >> :
> https://repository.apache.org/content/repositories/orgapachespark-1400/
> >>
> >> The documentation corresponding to this release can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc3-docs/
> >>
> >> The list of bug fixes going into 3.1.3 can be found at the following
> URL:
> >> https://s.apache.org/x0q9b
> >>
> >> This release is using the release script in master as of
> ddc77fb906cb3ce1567d277c2d0850104c89ac25
> >> The release docker container was rebuilt since the previous version
> didn't have the necessary components to build the R documentation.
> >>
> >> FAQ
> >>
> >>
> >> =
> >> How can I help test this release?
> >> =
> >>
> >> If you are a Spark user, you can help us test this release by taking
> >> an existing Spark workload and running it on this release candidate,
> >> then reporting any regressions.
> >>
> >> If you're working in PySpark, you can set up a virtual env, install
> >> the current RC, and see if anything important breaks; in Java/Scala,
> >> you can add the staging repository to your project's resolvers and test
> >> with the RC (make sure to clean up the artifact cache before/after so
> >> you don't end up building with an out-of-date RC going forward).
> >>
> >> ===
> >> What should happen to JIRA tickets still targeting 3.1.3?
> >> ===
> >>
> >> The current list of open tickets targeted at 3.2.1 can be found at:
> >> https://issues.apache.org/jira/projects/SPARK and search for "Target
> >> Version/s" = 3.1.3
> >>
> >> Committers should look at those and triage. Extremely important bug
> >> fixes, documentation, and API tweaks that impact compatibility should
> >> be worked on immediately. Everything else please retarget to an
> >> appropriate release.
> >>
> >> ==
> >> But my bug isn't fixed?
> >> ==
> >>
> >> In order to make timely releases, we will typically not hold the
> >> release unless the bug in question is a regression from the previous
> >> release. That being said, if there is something that is a regression
> >> that has not been correctly targeted please ping me or a committer to
> >> help target the issue.
> >>
> >> ==
> >> What happened to RC1 & RC2?
> >> ==
> >>
> >> When I first went to build RC1, the build process failed due to the
> >> lack of the R Markdown package in my local release-manager (RM)
> >> container. By the time I had time to debug and rebuild, there was
> >> already another bug-fix commit in branch-3.1, so I decided to skip
> >> ahead to RC2 and pick it up directly.
> >> When I went to send the RC2 vote e-mail, I noticed a correctness issue
> >> had been fixed in branch-3.1, so I rolled RC3 to contain the
> >> correctness fix.
> >>
> >> --
> >> Twitter: https://twitter.com/holdenkarau
> >> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [VOTE] Spark 3.1.3 RC3

2022-02-02 Thread Mridul Muralidharan
Hi Holden,

  Not that I am against releasing 3.1.3 (given the fixes that have already
gone in), but did we discuss releasing it ? I might have missed the thread
...

Regards,
Mridul

On Tue, Feb 1, 2022 at 7:12 PM Holden Karau  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.1.3.
>
> The vote is open until Feb. 4th at 5 PM PST (1 AM UTC + 1 day) and passes
> if a majority
> +1 PMC votes are cast, with a minimum of 3 + 1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> There are currently no open issues targeting 3.1.3 in Spark's JIRA
> https://issues.apache.org/jira/browse
> (try project = SPARK AND "Target Version/s" = "3.1.3" AND status in (Open,
> Reopened, "In Progress"))
> at https://s.apache.org/n79dw
>
>
>
> The tag to be voted on is v3.2.1-rc1 (commit
> b8c0799a8cef22c56132d94033759c9f82b0cc86):
> https://github.com/apache/spark/tree/v3.1.3-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at
> :https://repository.apache.org/content/repositories/orgapachespark-1400/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc3-docs/
>
> The list of bug fixes going into 3.1.3 can be found at the following URL:
> https://s.apache.org/x0q9b
>
> This release is using the release script in master as
> of ddc77fb906cb3ce1567d277c2d0850104c89ac25
> The release docker container was rebuilt since the previous version didn't
> have the necessary components to build the R documentation.
>
> FAQ
>
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.1.3?
> ===
>
> The current list of open tickets targeted at 3.2.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.1.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something that is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> ==
> What happened to RC1 & RC2?
> ==
>
> When I first went to build RC1, the build process failed due to the
> lack of the R Markdown package in my local release-manager (RM) container.
> By the time I had time to debug and rebuild, there was already another
> bug-fix commit in branch-3.1, so I decided to skip ahead to RC2 and pick
> it up directly.
> When I went to send the RC2 vote e-mail, I noticed a correctness issue had
> been fixed in branch-3.1, so I rolled RC3 to contain the correctness fix.
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-22 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul

On Fri, Jan 21, 2022 at 9:01 PM Sean Owen  wrote:

> +1 with same result as last time.
>
> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.1.
>>
>> The vote is open until 8:00pm Pacific time January 25 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.1-rc2 (commit
>> 4f25b3f71238a00508a356591553f2dfa89f8290):
>> https://github.com/apache/spark/tree/v3.2.1-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
>>
>> The list of bug fixes going into 3.2.1 can be found at the following URL:
>> https://s.apache.org/yu0cy
>>
>> This release is using the release script of the tag v3.2.1-rc2.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running it on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env, install
>> the current RC, and see if anything important breaks; in Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.2.1?
>> ===
>> The current list of open tickets targeted at 3.2.1 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.1
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>


Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-12 Thread Mridul Muralidharan
+1 (binding)
This should be a great improvement !

Regards,
Mridul

On Wed, Jan 12, 2022 at 4:04 AM Kent Yao  wrote:

> +1 (non-binding)
>
> Thomas Graves  wrote on Wed, Jan 12, 2022 at 11:52:
>
>> +1 (binding).
>>
>> One minor note, since I haven't had time to look at the implementation
>> details: please make sure resource-aware scheduling and stage-level
>> scheduling still work, or that any caveats are documented. Feel free
>> to ping me with questions in these areas.
>>
>> Tom
>>
>> On Wed, Jan 5, 2022 at 7:07 PM Yikun Jiang  wrote:
>> >
>> > Hi all,
>> >
>> > I’d like to start a vote for SPIP: "Support Customized Kubernetes
>> Schedulers Proposal"
>> >
>> > The SPIP is to support customized Kubernetes schedulers in Spark on
>> Kubernetes.
>> >
>> > Please also refer to:
>> >
>> > - Previous discussion in dev mailing list: [DISCUSSION] SPIP: Support
>> Volcano/Alternative Schedulers Proposal
>> > - Design doc: [SPIP] Spark-36057 Support Customized Kubernetes
>> Schedulers Proposal
>> > - JIRA: SPARK-36057
>> >
>> > Please vote on the SPIP:
>> >
>> > [ ] +1: Accept the proposal as an official SPIP
>> > [ ] +0
>> > [ ] -1: I don’t think this is a good idea because …
>> >
>> > Regards,
>> > Yikun
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: Time for Spark 3.2.1?

2021-12-07 Thread Mridul Muralidharan
+1 for maintenance release, and also +1 for doing this in Jan !

Thanks,
Mridul

On Tue, Dec 7, 2021 at 11:41 PM Gengliang Wang  wrote:

> +1 for new maintenance releases for all 3.x branches as well.
>
> On Wed, Dec 8, 2021 at 8:19 AM Hyukjin Kwon  wrote:
>
>> SGTM!
>>
>> On Wed, 8 Dec 2021 at 09:07, huaxin gao  wrote:
>>
>>> I prefer to start rolling the release in January if there is no need to
>>> publish it sooner :)
>>>
>>> On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon  wrote:
>>>
 Oh BTW, I realised that the holiday season, including Christmas and New
 Year, is coming up this month.
 Shall we maybe start rolling the release around next January? I would
 leave it to @huaxin gao  :-).

 On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun 
 wrote:

> +1 for new releases.
>
> Dongjoon.
>
> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan 
> wrote:
>
>> +1 to make new maintenance releases for all 3.x branches.
>>
>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>
>>> Always fine by me if someone wants to roll a release.
>>>
>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a
>>> new release of those wouldn't hurt either, if any of our release 
>>> managers
>>> have the time or inclination. 3.0.x is reaching unofficial end-of-life
>>> around now anyway.
>>>
>>>
>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon 
>>> wrote:
>>>
 Hi all,

 It's been two months since the Spark 3.2.0 release, and we have
 resolved many bugs and regressions. What do you guys think about
 rolling a Spark 3.2.1 release?

 cc @huaxin gao  FYI who I happened to
 overhear that is interested in rolling the maintenance release :-).

>>>


Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Mridul Muralidharan
Nice job !
There are some nice API's which should be interesting to explore with JDK
17 :-)

Regards.
Mridul

On Fri, Nov 12, 2021 at 7:08 PM Yuming Wang  wrote:

> Cool, thank you Dongjoon.
>
> On Sat, Nov 13, 2021 at 4:09 AM shane knapp ☠  wrote:
>
>> woot!  nice work everyone!  :)
>>
>> On Fri, Nov 12, 2021 at 11:37 AM Dongjoon Hyun 
>> wrote:
>>
>>> Hi, All.
>>>
>>> Apache Spark community has been working on Java 17 support under the
>>> following JIRA.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-33772
>>>
>>> As of today, Apache Spark starts to have daily Java 17 test coverage via
>>> GitHub Action jobs for Apache Spark 3.3.
>>>
>>>
>>> https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L38-L39
>>>
>>> Today's successful run is here.
>>>
>>> https://github.com/apache/spark/actions/runs/1453788012
>>>
>>> Please note that we are still working on some new Java 17 features like
>>>
>>> JEP 391: macOS/AArch64 Port
>>> https://bugs.openjdk.java.net/browse/JDK-8251280
>>>
>>> For example, Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 already
>>> support Apple Silicon natively, but some 3rd party libraries like
>>> RocksDB/LevelDB are not ready yet. Since Mac is one of the popular dev
>>> environments, we are going to keep monitoring and improving gradually for
>>> Apache Spark 3.3.
>>>
>>> Please test Java 17 and let us know your feedback.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>
>>
>> --
>> Shane Knapp
>> Computer Guy / Voice of Reason
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
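
For anyone who wants to try this locally before sending feedback, a rough
sketch of such a run (the JAVA_HOME path is an assumption about the local
JDK 17 install; build/mvn is Spark's bundled Maven wrapper):

# Build and test master with a Java 17 toolchain; failures are useful feedback.
export JAVA_HOME=/usr/lib/jvm/java-17    # adjust to your JDK 17 location
git clone https://github.com/apache/spark.git && cd spark
./build/mvn -DskipTests clean package
./build/mvn test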


Re: Update Spark 3.3 release window?

2021-10-28 Thread Mridul Muralidharan
+1 to EOL 2.x
Mid march sounds like a good placeholder for 3.3.

Regards,
Mridul

On Wed, Oct 27, 2021 at 10:38 PM Sean Owen  wrote:

> Seems fine to me - as good a placeholder as anything.
> Would that be about time to call 2.x end-of-life?
>
> On Wed, Oct 27, 2021 at 9:36 PM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> Spark 3.2. is out. Shall we update the release window
>> https://spark.apache.org/versioning-policy.html?
>> I am thinking of Mid March 2022 (5 months after the 3.2 release) for code
>> freeze and onward.
>>
>>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Mridul Muralidharan
Congratulations everyone !
And thanks Gengliang for shepherding the release out :-)

Regards,
Mridul

On Tue, Oct 19, 2021 at 9:25 AM Yuming Wang  wrote:

> Congrats and thanks!
>
> On Tue, Oct 19, 2021 at 10:17 PM Gengliang Wang  wrote:
>
>> Hi all,
>>
>> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
>> contribution from the open-source community, this release managed to
>> resolve in excess of 1,700 Jira tickets.
>>
>> We'd like to thank our contributors and users for their contributions and
>> early feedback to this release. This release would not have been possible
>> without you.
>>
>> To download Spark 3.2.0, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-2-0.html
>>
>


Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-07 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Phadoop-2.7 -Pyarn -Pmesos
-Pkubernetes.

Regards,
Mridul
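
For anyone who wants to reproduce this style of RC verification, a minimal
sketch follows (the tag and build profiles are taken from this thread; the
binary artifact name, GPG usage, and digest-file handling are illustrative
assumptions, not details from the mails):

# Verify signature and digest of a downloaded release artifact.
curl -sO https://dist.apache.org/repos/dist/dev/spark/KEYS
gpg --import KEYS
gpg --verify spark-3.2.0-bin-hadoop3.2.tgz.asc spark-3.2.0-bin-hadoop3.2.tgz
shasum -a 512 -c spark-3.2.0-bin-hadoop3.2.tgz.sha512   # digest check

# Check out the release tag and build/test with the same profiles.
git clone https://github.com/apache/spark.git && cd spark
git checkout v3.2.0-rc7
./build/mvn -Phadoop-2.7 -Pyarn -Pmesos -Pkubernetes -DskipTests clean package
./build/mvn -Phadoop-2.7 -Pyarn -Pmesos -Pkubernetes test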

On Wed, Oct 6, 2021 at 12:55 PM Michael Heuer  wrote:

> +1 (non-binding)
>
>michael
>
>
> On Oct 6, 2021, at 11:49 AM, Gengliang Wang  wrote:
>
> Starting with my +1(non-binding)
>
> Thanks,
> Gengliang
>
> On Thu, Oct 7, 2021 at 12:48 AM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time October 11 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc7 (commit
>> 5d45a415f3a29898d92380380cfd82bfc7f579ea):
>> https://github.com/apache/spark/tree/v3.2.0-rc7
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1394
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-docs/
>>
>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>
>> This release is using the release script of the tag v3.2.0-rc7.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.2.0?
>> ===
>> The current list of open tickets targeted at 3.2.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>
>
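
The testing FAQ above suggests two quick checks - a PySpark install into a
virtual env and a resolve against the staging repository. Minimal sketches of
both, assuming a POSIX shell (the staging URL is the one quoted above; the
pyspark tarball name and artifact coordinates are assumptions based on the
usual naming pattern):

# PySpark: fresh virtual env, install the RC tarball, run a trivial job.
python3 -m venv rc-test && . rc-test/bin/activate
pip install "https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz"
python -c "from pyspark.sql import SparkSession; s = SparkSession.builder.getOrCreate(); print(s.range(100).count())"

# Java/Scala: resolve an RC artifact straight from the staging repository
# (dependency:get is a standard Maven plugin goal).
mvn dependency:get \
  -Dartifact=org.apache.spark:spark-sql_2.12:3.2.0 \
  -DremoteRepositories=https://repository.apache.org/content/repositories/orgapachespark-1394

# Afterwards, purge the cached RC artifacts so later builds don't silently
# pick up the stale RC.
rm -rf ~/.m2/repository/org/apache/spark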


Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-29 Thread Mridul Muralidharan
Yi Wu helped identify an issue which causes
correctness problems (duplication) and hangs - waiting for validation to complete
before submitting a patch.

Regards,
Mridul

On Wed, Sep 29, 2021 at 11:34 AM Holden Karau  wrote:

> PySpark smoke tests pass, I'm going to do a last pass through the JIRAs
> before my vote though.
>
> On Wed, Sep 29, 2021 at 8:54 AM Sean Owen  wrote:
>
>> +1 looks good to me as before, now that a few recent issues are resolved.
>>
>>
>> On Tue, Sep 28, 2021 at 10:45 AM Gengliang Wang  wrote:
>>
>>> Please vote on releasing the following candidate as
>>> Apache Spark version 3.2.0.
>>>
>>> The vote is open until 11:59pm Pacific time September 30 and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.2.0-rc6 (commit
>>> dde73e2e1c7e55c8e740cb159872e081ddfa7ed6):
>>> https://github.com/apache/spark/tree/v3.2.0-rc6
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1393
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-docs/
>>>
>>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>
>>> This release is using the release script of the tag v3.2.0-rc6.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.2.0?
>>> ===
>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.2.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Mridul Muralidharan
The failure I observed looks the same as what Venkat mentioned, lz4 tests
in FileSuite in core were failing with hadoop-2.7 profile.

Regards,
Mridul

On Tue, Sep 21, 2021 at 7:44 PM Chao Sun  wrote:

> Hi Venkata, I'm not aware of the FileSuite test failures. In fact I just
> tried it locally on the master branch and the tests are all passing. Could
> you provide more details?
>
> The reason we want to disable the LZ4 test is because it requires the
> native LZ4 library when running with Hadoop 2.x, which the Spark CI doesn't
> have.
>
> On Tue, Sep 21, 2021 at 3:46 PM Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu> wrote:
>
>> Hi Chao,
>>
>> But there are tests in core as well failing. For
>> eg: org.apache.spark.FileSuite But these tests are passing in 3.1, why do
>> you think we should disable these tests for hadoop version < 3.x?
>>
>> Regards
>> Venkata krishnan
>>
>>
>> On Tue, Sep 21, 2021 at 3:33 PM Chao Sun  wrote:
>>
>>> I just created SPARK-36820 for the above LZ4 test issue. Will post a PR
>>> there soon.
>>>
>>> On Tue, Sep 21, 2021 at 2:05 PM Chao Sun  wrote:
>>>
>>>> Mridul, is the LZ4 failure about Parquet? I think Parquet currently
>>>> uses Hadoop compression codec while Hadoop 2.7 still depends on native lib
>>>> for the LZ4. Maybe we should run the test only for Hadoop 3.2 profile.
>>>>
>>>> On Tue, Sep 21, 2021 at 10:08 AM Mridul Muralidharan 
>>>> wrote:
>>>>
>>>>>
>>>>> Signatures, digests, etc check out fine.
>>>>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes,
>>>>> this worked fine.
>>>>>
>>>>> I found that including "-Phadoop-2.7" failed on lz4 tests ("native lz4
>>>>> library not available").
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> On Tue, Sep 21, 2021 at 10:18 AM Gengliang Wang 
>>>>> wrote:
>>>>>
>>>>>> To Stephen: Thanks for pointing that out. I agree with that.
>>>>>> To Sean: I made a PR (https://github.com/apache/spark/pull/34059) to
>>>>>> remove the test dependency so that we can start RC4 ASAP.
>>>>>>
>>>>>> Gengliang
>>>>>>
>>>>>> On Tue, Sep 21, 2021 at 8:14 PM Sean Owen  wrote:
>>>>>>
>>>>>>> Hm yeah I tend to agree. See
>>>>>>> https://github.com/apache/spark/pull/33912
>>>>>>> This _is_ a test-only dependency which makes it less of an issue.
>>>>>>> I'm guessing it's not in Maven as it's a small one-off utility; we
>>>>>>> _could_ just inline the ~100 lines of code in test code instead?
>>>>>>>
>>>>>>> On Tue, Sep 21, 2021 at 12:33 AM Stephen Coy
>>>>>>>  wrote:
>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> I was going to -1 this because of the
>>>>>>>> com.github.rdblue:brotli-codec:0.1.1 dependency, which is not 
>>>>>>>> available on
>>>>>>>> Maven Central, and therefore is not available from our repository 
>>>>>>>> manager
>>>>>>>> (Nexus).
>>>>>>>>
>>>>>>>> Historically, most places I have worked have avoided other public
>>>>>>>> maven repositories because they are not well curated, i.e. artifacts
>>>>>>>> with the same GAV have been known to change over time, which never
>>>>>>>> happens with Maven Central.
>>>>>>>>
>>>>>>>> I know that I can address this by changing my settings.xml file.
>>>>>>>>
>>>>>>>> Anyway, I can see this biting other people so I thought that I
>>>>>>>> would mention it.
>>>>>>>>
>>>>>>>> Steve C

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Mridul Muralidharan
Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes, this
worked fine.

I found that including "-Phadoop-2.7" failed on lz4 tests ("native lz4
library not available").

Regards,
Mridul

On Tue, Sep 21, 2021 at 10:18 AM Gengliang Wang  wrote:

> To Stephen: Thanks for pointing that out. I agree with that.
> To Sean: I made a PR  to
> remove the test dependency so that we can start RC4 ASAP.
>
> Gengliang
>
> On Tue, Sep 21, 2021 at 8:14 PM Sean Owen  wrote:
>
>> Hm yeah I tend to agree. See https://github.com/apache/spark/pull/33912
>> This _is_ a test-only dependency which makes it less of an issue.
>> I'm guessing it's not in Maven as it's a small one-off utility; we
>> _could_ just inline the ~100 lines of code in test code instead?
>>
>> On Tue, Sep 21, 2021 at 12:33 AM Stephen Coy
>>  wrote:
>>
>>> Hi there,
>>>
>>> I was going to -1 this because of the
>>> com.github.rdblue:brotli-codec:0.1.1 dependency, which is not available on
>>> Maven Central, and therefore is not available from our repository manager
>>> (Nexus).
>>>
>>> Historically, most places I have worked have avoided other public maven
>>> repositories because they are not well curated, i.e. artifacts with the same
>>> GAV have been known to change over time, which never happens with Maven
>>> Central.
>>>
>>> I know that I can address this by changing my settings.xml file.
>>>
>>> Anyway, I can see this biting other people so I thought that I would
>>> mention it.
>>>
>>> Steve C
>>>
>>> On 19 Sep 2021, at 1:18 pm, Gengliang Wang  wrote:
>>>
>>> Please vote on releasing the following candidate as
>>> Apache Spark version 3.2.0.
>>>
>>> The vote is open until 11:59pm Pacific time September 24 and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>> 
>>>
>>> The tag to be voted on is v3.2.0-rc3 (commit
>>> 96044e97353a079d3a7233ed3795ca82f3d9a101):
>>> https://github.com/apache/spark/tree/v3.2.0-rc3
>>> 
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/
>>> 
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> 
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1390
>>> 
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-docs/
>>> 

Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-09 Thread Mridul Muralidharan
I have filed a blocker, SPARK-36705, which will need to be
addressed.

Regards,
Mridul


On Sun, Sep 5, 2021 at 8:47 AM Gengliang Wang  wrote:

> Hi all,
>
> the voting fails.
> Liang-Chi reported a new block SPARK-36669
> . We will have RC3
> when the existing issues are resolved.
>
>
> On Thu, Sep 2, 2021 at 5:01 AM Sean Owen  wrote:
>
>> This RC looks OK to me too, understanding we may need to have RC3 for the
>> outstanding issues though.
>>
>> The issue with the Scala 2.13 POM is still there; I wasn't able to figure
>> it out (anyone?), though it may not affect 'normal' usage (and is
>> work-around-able in other uses, it seems), so may be sufficient if Scala
>> 2.13 support is experimental as of 3.2.0 anyway.
>>
>>
>> On Wed, Sep 1, 2021 at 2:08 AM Gengliang Wang  wrote:
>>
>>> Please vote on releasing the following candidate as
>>> Apache Spark version 3.2.0.
>>>
>>> The vote is open until 11:59pm Pacific time September 3 and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.2.0-rc2 (commit
>>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>> https://github.com/apache/spark/tree/v3.2.0-rc2
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1389
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>>>
>>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>
>>> This release is using the release script of the tag v3.2.0-rc2.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.2.0?
>>> ===
>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.2.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Mridul Muralidharan
Hi,

  Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Pmesos
-Pkubernetes

I am seeing test failures which are addressed by #33790
 - this is in branch-3.2, but
after the RC tag.
After updating to the head of branch-3.2, I can get that test to pass.

Given the failure, and as the fix is already in the branch, I will -1 the RC.

Regards,
Mridul


On Fri, Aug 20, 2021 at 12:05 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.2.0.
>
> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc1 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1388
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>


Re: -1s on committed but not released code?

2021-08-19 Thread Mridul Muralidharan
Hi Holden,

  In the past, I have seen discussions on the merged PR to thrash out the
details.
Usually it becomes clear whether to revert and reformulate the change, or the
concerns get addressed and possibly result in follow-up work.

This is usually helped by the fact that we typically are conservative and
don’t merge changes too quickly: giving folks sufficient time to review and
opine.

Regards,
Mridul

On Thu, Aug 19, 2021 at 1:36 PM Holden Karau  wrote:

> Hi Y'all,
>
> This just recently came up but I'm not super sure on how we want to handle
> this in general. If code was committed under the lazy consensus model and
> then a committer or PMC -1s it post merge, what do we want to do?
>
> I know we had some previous discussion around -1s, but that was largely
> focused on pre-commit -1s.
>
> Cheers,
>
> Holden :)
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: ASF board report draft for August

2021-08-09 Thread Mridul Muralidharan
Hi Matei,

  3.2 will also include support for pushed based shuffle (spip SPARK-30602).

Regards,
Mridul

On Mon, Aug 9, 2021 at 9:26 PM Hyukjin Kwon  wrote:

> > Which version of the Koalas project are you referring to? 1.8.1?
>
> Yes, the latest version 1.8.1.
>
> On Tue, Aug 10, 2021 at 11:07 AM Igor Costa wrote:
>
>> Hi Matei, nice update
>>
>>
>> Just one question, when you mention “ We are working on Spark 3.2.0 as
>> our next release, with a release candidate likely to come soon. Spark 3.2
>> includes a new Pandas API for Apache Spark based on the Koalas project”
>>
>>
>> Which version of the Koalas project are you referring to? 1.8.1?
>>
>>
>>
>> Cheers
>> Igor
>>
>> On Tue, 10 Aug 2021 at 13:31, Matei Zaharia 
>> wrote:
>>
>>> It’s time for our quarterly report to the ASF board, which we need to
>>> send out this Wednesday. I wrote the draft below based on community
>>> activity — let me know if you’d like to add or change anything:
>>>
>>> ==
>>>
>>> Description:
>>>
>>> Apache Spark is a fast and general engine for large-scale data
>>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>>> well as a rich set of libraries including stream processing, machine
>>> learning, and graph analytics.
>>>
>>> Issues for the board:
>>>
>>> - None
>>>
>>> Project status:
>>>
>>> - We made a number of maintenance releases in the past three months. We
>>> released Apache Spark 3.1.2 and 3.0.3 in June as maintenance releases for
>>> the 3.x branches. We also released Apache Spark 2.4.8 on May 17 as a bug
>>> fix release for the Spark 2.x line. This may be the last release on 2.x
>>> unless major new bugs are found.
>>>
>>> - We added three PMC members: Liang-Chi Hsieh, Kousuke Saruta and
>>> Takeshi Yamamuro.
>>>
>>> - We are working on Spark 3.2.0 as our next release, with a release
>>> candidate likely to come soon. Spark 3.2 includes a new Pandas API for
>>> Apache Spark based on the Koalas project, a RocksDB state store for
>>> Structured Streaming, native support for session windows, error message
>>> standardization, and significant improvements to Spark SQL, such as the use
>>> of adaptive query execution by default.
>>>
>>> Trademarks:
>>>
>>> - No changes since the last report.
>>>
>>> Latest releases:
>>>
>>> - Spark 3.1.2 was released on June 23rd, 2021.
>>> - Spark 3.0.3 was released on June 1st, 2021.
>>> - Spark 2.4.8 was released on May 17th, 2021.
>>>
>>> Committers and PMC:
>>>
>>> - The latest committers were added on March 11th, 2021 (Atilla Zsolt
>>> Piros, Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
>>> - The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).
>>>
>>>
>>>
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>> --
>> Sent from Gmail Mobile
>>
>


Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-19 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Pmesos
-Pkubernetes

Regards,
Mridul

PS: Might be related to some quirk of my local env - the first test run
(after clean + package) usually fails for me (typically for hive tests) -
with a second run succeeding: this is not specific to this RC though.

On Fri, Jun 18, 2021 at 6:14 PM Liang-Chi Hsieh  wrote:

> +1. Docs looks good. Binary looks good.
>
> Ran simple test and some tpcds queries.
>
> Thanks for working on this!
>
>
> wuyi wrote
> > Please vote on releasing the following candidate as Apache Spark version
> > 3.0.3.
> >
> > The vote is open until Jun 21th 3AM (PST) and passes if a majority +1 PMC
> > votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.0.3
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/
> >
> > The tag to be voted on is v3.0.3-rc1 (commit
> > 65ac1e75dc468f53fc778cd2ce1ba3f21067aab8):
> > https://github.com/apache/spark/tree/v3.0.3-rc1
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.0.3-rc1-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1386/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.0.3-rc1-docs/
> >
> > The list of bug fixes going into 3.0.3 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12349723
> >
> > This release is using the release script of the tag v3.0.3-rc1.
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks, in the Java/Scala
> > you can add the staging repository to your projects resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out of date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.0.3?
> > ===
> >
> > The current list of open tickets targeted at 3.0.3 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > Version/s" = 3.0.3
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
>
>
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Apache Spark 3.0.3 Release?

2021-06-08 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Tue, Jun 8, 2021 at 10:11 PM Hyukjin Kwon  wrote:

> Yeah, +1
>
> On Wed, Jun 9, 2021 at 12:06 PM Yi Wu wrote:
>
>> Hi, All.
>>
>> Since the Apache Spark 3.0.2 tag creation (Feb 16),
>> 119 new patches (92 issues
>> resolved) have arrived in branch-3.0.
>>
>> Shall we make a new release, Apache Spark 3.0.3, as the 3rd release at
>> the 3.0 line?
>> I'd like to volunteer as the release manager for Apache Spark 3.0.3.
>> I'm thinking about starting the first RC at the end of this week.
>>
>> $ git log --oneline v3.0.2..HEAD | wc -l
>>  119
>>
>> # Known correctness issues
>> SPARK-34534  New
>> protocol FetchShuffleBlocks in OneForOneBlockFetcher lead to data loss or
>> correctness
>> SPARK-34545 
>> PySpark Python UDF return inconsistent results when applying 2 UDFs with
>> different return type to 2 columns together
>> SPARK-34719  fail
>> if the view query has duplicated column names
>> SPARK-34794 
>> Nested higher-order functions broken in DSL
>>
>> # Notable user-facing changes
>> SPARK-32924  Web
>> UI sort on duration is wrong
>> SPARK-35405 
>>  Submitting Applications documentation has outdated information about K8s
>> client mode support
>>
>> Thanks,
>> Yi
>>
>


Re: Resolves too old JIRAs as incomplete

2021-05-20 Thread Mridul Muralidharan
+1, thanks Takeshi !

Regards,
Mridul

On Wed, May 19, 2021 at 8:48 PM Takeshi Yamamuro 
wrote:

> Hi, dev,
>
> As you know, we have too many open JIRAs now:
> # of open JIRAs=2698: JQL='project = SPARK AND status in (Open, "In
> Progress", Reopened)'
>
> We've recently released v2.4.8 (EOL), so I'd like to bulk-close very old
> JIRAs to keep the backlog manageable.
>
> As Hyukjin did the same action two years ago (for details, see:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Resolving-all-JIRAs-affecting-EOL-releases-td27838.html),
> I'm planning to use a similar JQL below to close them:
>
> project = SPARK AND status in (Open, "In Progress", Reopened) AND
> (affectedVersion = EMPTY OR NOT (affectedVersion in versionMatch("^3.*")))
> AND updated <= -52w
>
> The total number of matched JIRAs is 741.
> Or, we might be able to close them more aggressively by removing the
> version condition:
>
> project = SPARK AND status in (Open, "In Progress", Reopened) AND updated
> <= -52w
>
> The matched number is 1484 (almost half of the current open JIRAs).
>
> If there is no objection, I'd like to do it next week or later.
> Any thoughts?
>
> Bests,
> Takeshi
> --
> ---
> Takeshi Yamamuro
>
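
For anyone wanting to sanity-check those match counts before the bulk-close,
they can be reproduced read-only via JIRA's standard REST search endpoint - a
sketch (only the JQL itself comes from the mail above; the endpoint and
parameters are the stock Atlassian REST API):

# Count matches for the proposed JQL without modifying any issue
# (maxResults=0 returns only the total).
curl -s -G "https://issues.apache.org/jira/rest/api/2/search" \
  --data-urlencode 'jql=project = SPARK AND status in (Open, "In Progress", Reopened) AND updated <= -52w' \
  --data-urlencode "maxResults=0" \
  | python3 -c "import sys, json; print(json.load(sys.stdin)['total'])"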


Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested.

Regards,
Mridul

On Sun, May 9, 2021 at 4:22 PM Liang-Chi Hsieh  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.4.8.
>
> The vote is open until May 14th at 9AM PST and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.8
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> There are currently no issues targeting 2.4.8 (try project = SPARK AND
> "Target Version/s" = "2.4.8" AND status in (Open, Reopened, "In Progress"))
>
> The tag to be voted on is v2.4.8-rc4 (commit
> 163fbd2528a18bf062bddf7b7753631a12a369b5):
> https://github.com/apache/spark/tree/v2.4.8-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.8-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1383/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.8-rc4-docs/
>
> The list of bug fixes going into 2.4.8 can be found at the following URL:
> https://s.apache.org/spark-v2.4.8-rc4
>
> This release is using the release script of the tag v2.4.8-rc4.
>
> FAQ
>
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.8?
> ===
>
> The current list of open tickets targeted at 2.4.8 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.8
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Mridul Muralidharan
Do we have a fix for this in 3.x/master which can be backported without too
much surrounding change ?
Given we are expecting 2.4.8 to probably be the last release for 2.4, if we
can fix it, that would be great.

Regards,
Mridul

On Wed, Apr 7, 2021 at 9:31 PM Liang-Chi Hsieh  wrote:

> Thanks for voting.
>
> A while after I started running the release script to cut RC1, I found a
> nested column pruning bug, SPARK-34963, and unfortunately it exists in 2.4.7
> too. As RC1 is already cut, I will continue this vote.
>
> The bug looks like a corner case to me, and it has not been reported yet even
> though we have supported nested column pruning since 2.4. So maybe it is okay
> to not fix it in 2.4?
>
>
>
>
> cloud0fan wrote
> > +1
> >
> > On Thu, Apr 8, 2021 at 9:24 AM Sean Owen wrote:
> >
> >> Looks good to me testing on Java 8, Hadoop 2.7, Ubuntu, with about all
> >> profiles enabled.
> >> I still get an odd failure in the Hive versions suite, but I keep seeing
> >> that in my env and think it's something odd about my setup.
> >> +1
> >>
>
>
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-27 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Sat, Mar 27, 2021 at 6:09 PM Xiao Li  wrote:

> +1
>
> Xiao
>
>> On Fri, Mar 26, 2021 at 4:14 PM Takeshi Yamamuro wrote:
>
>> +1 (non-binding)
>>
>> On Sat, Mar 27, 2021 at 4:53 AM Liang-Chi Hsieh  wrote:
>>
>>> +1 (non-binding)
>>>
>>>
>>> rxin wrote
>>> > +1. Would open up a huge persona for Spark.
>>> >
>>> > On Fri, Mar 26, 2021 at 11:30 AM Bryan Cutler wrote:
>>> >
>>> >>
>>> >> +1 (non-binding)
>>> >>
>>> >>
>>> >> On Fri, Mar 26, 2021 at 9:49 AM Maciej <
>>>
>>> > mszymkiewicz@
>>>
>>> >  > wrote:
>>> >>
>>> >>
>>> >>> +1 (nonbinding)
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>


Re: Welcoming six new Apache Spark committers

2021-03-26 Thread Mridul Muralidharan
Congratulations, looking forward to more exciting contributions !

Regards,
Mridul

On Fri, Mar 26, 2021 at 8:21 PM Dongjoon Hyun 
wrote:

>
> Congratulations! :)
>
> Bests,
> Dongjoon.
>
> On Fri, Mar 26, 2021 at 5:55 PM angers zhu  wrote:
>
>> Congratulations
>>
>> On Sat, Mar 27, 2021 at 8:35 AM Prashant Sharma wrote:
>>
>>> Congratulations  all!!
>>>
>>> On Sat, Mar 27, 2021, 5:10 AM huaxin gao  wrote:
>>>
 Congratulations to you all!!

 On Fri, Mar 26, 2021 at 4:22 PM Yuming Wang  wrote:

> Congrats!
>
> On Sat, Mar 27, 2021 at 7:13 AM Takeshi Yamamuro <
> linguin@gmail.com> wrote:
>
>> Congrats, all~
>>
>> On Sat, Mar 27, 2021 at 7:46 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Congrats all!
>>>
>>> On Sat, Mar 27, 2021 at 6:56 AM Liang-Chi Hsieh wrote:
>>>
 Congrats! Welcome!


 Matei Zaharia wrote
 > Hi all,
 >
 > The Spark PMC recently voted to add several new committers.
 Please join me
 > in welcoming them to their new role! Our new committers are:
 >
 > - Maciej Szymkiewicz (contributor to PySpark)
 > - Max Gekk (contributor to Spark SQL)
 > - Kent Yao (contributor to Spark SQL)
 > - Attila Zsolt Piros (contributor to decommissioning and Spark on
 > Kubernetes)
 > - Yi Wu (contributor to Spark Core and SQL)
 > - Gabor Somogyi (contributor to Streaming and security)
 >
 > All six of them contributed to Spark 3.1 and we’re very excited
 to have
 > them join as committers.
 >
 > Matei and the Spark PMC
 >
 -
 > To unsubscribe e-mail:

 > dev-unsubscribe@.apache





 --
 Sent from:
 http://apache-spark-developers-list.1001551.n3.nabble.com/


 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>


Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-02 Thread Mridul Muralidharan
Thanks Hyukjin and congratulations everyone on the release !

Regards,
Mridul

On Tue, Mar 2, 2021 at 8:54 PM Yuming Wang  wrote:

> Great work, Hyukjin!
>
> On Wed, Mar 3, 2021 at 9:50 AM Hyukjin Kwon  wrote:
>
>> We are excited to announce Spark 3.1.1 today.
>>
>> Apache Spark 3.1.1 is the second release of the 3.x line. This release
>> adds
>> Python type annotations and Python dependency management support as part
>> of Project Zen.
>> Other major updates include improved ANSI SQL compliance support, history
>> server support
>> in structured streaming, the general availability (GA) of Kubernetes and
>> node decommissioning
>> in Kubernetes and Standalone. In addition, this release continues to
>> focus on usability, stability,
>> and polish while resolving around 1500 tickets.
>>
>> We'd like to thank our contributors and users for their contributions and
>> early feedback to
>> this release. This release would not have been possible without you.
>>
>> To download Spark 3.1.1, head over to the download page:
>> http://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-1-1.html
>>
>>


Re: Apache Spark 3.2 Expectation

2021-02-25 Thread Mridul Muralidharan
Nit: Java 17 -> should be available by Sept 2021 :-)
Adoption would also depend on some of our nontrivial dependencies
supporting it - it might be a stretch to get it in for Apache Spark 3.2 ?

Features:
Push based shuffle and disaggregated shuffle should also be in 3.2


Regards,
Mridul






On Thu, Feb 25, 2021 at 10:57 AM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Since we have been preparing Apache Spark 3.2.0 in master branch since
> December 2020, March seems to be a good time to share our thoughts and
> aspirations on Apache Spark 3.2.
>
> According to the progress on Apache Spark 3.1 release, Apache Spark 3.2
> seems to be the last minor release of this year. Given the timeframe, we
> might consider the following. (This is a small set. Please add your
> thoughts to this limited list.)
>
> # Languages
>
> - Scala 2.13 Support: This was expected on 3.1 via SPARK-25075 but slipped
> out. Currently, we are trying to use Scala 2.13.5 via SPARK-34505 and
> investigating the publishing issue. Thank you for your contributions and
> feedback on this.
>
> - Java 17 LTS Support: Java 17 LTS will arrive in September 2017. Like
> Java 11, we need lots of support from our dependencies. Let's see.
>
> - Python 3.6 Deprecation(?): Python 3.6 community support ends at
> 2021-12-23. So, the deprecation is not required yet, but we had better
> prepare it because we don't have an ETA of Apache Spark 3.3 in 2022.
>
> - SparkR CRAN publishing: As we know, it's discontinued so far. Resuming
> it depends on the success of Apache SparkR 3.1.1 CRAN publishing. If that
> succeeds and revives it, we can keep publishing. Otherwise, I believe we had
> better drop it from the releasing work item list officially.
>
> # Dependencies
>
> - Apache Hadoop 3.3.2: Hadoop 3.2.0 becomes the default Hadoop profile in
> Apache Spark 3.1. Currently, Spark master branch lives on Hadoop 3.2.2's
> shaded clients via SPARK-33212. So far, there is one on-going report at
> YARN environment. We hope it will be fixed soon at Spark 3.2 timeframe and
> we can move toward Hadoop 3.3.2.
>
> - Apache Hive 2.3.9: Spark 3.0 starts to use Hive 2.3.7 by default instead
> of old Hive 1.2 fork. Spark 3.1 removed hive-1.2 profile completely via
> SPARK-32981 and replaced the generated hive-service-rpc code with the
> official dependency via SPARK-32981. We are steadily improving this area
> and will consume Hive 2.3.9 if available.
>
> - K8s Client 4.13.2: During K8s GA activity, Spark 3.1 upgrades K8s client
> dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2 in order to support
> K8s model 1.19.
>
> - Kafka Client 2.8: To bring the client fixes, Spark 3.1 is using Kafka
> Client 2.6. For Spark 3.2, SPARK-33913 upgraded to Kafka 2.7 with Scala
> 2.12.13, but it was reverted later due to Scala 2.12.13 issue. Since
> KAFKA-12357 fixed the Scala requirement two days ago, Spark 3.2 will go
> with Kafka Client 2.8 hopefully.
>
> # Some Features
>
> - Data Source v2: Spark 3.2 will deliver much richer DSv2 with Apache
> Iceberg integration. Especially, we hope the on-going function catalog SPIP
> and up-coming storage partitioned join SPIP can be delivered as a part of
> Spark 3.2 and become an additional foundation.
>
> - Columnar Encryption: As of today, Apache Spark master branch supports
> columnar encryption via Apache ORC 1.6 and it's documented via SPARK-34036.
> Also, upcoming Apache Parquet 1.12 has a similar capability. Hopefully,
> Apache Spark 3.2 is going to be the first release to have this feature
> officially. Any feedback is welcome.
>
> - Improved ZStandard Support: Spark 3.2 will bring more benefits for
> ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer pool support
> for all IO operations, 2) SPARK-33978 makes ORC datasource support ZSTD
> compression, 3) SPARK-34503 sets ZSTD as the default codec for event log
> compression, 4) SPARK-34479 aims to support ZSTD at Avro data source. Also,
> the upcoming Parquet 1.12 supports ZSTD (and supports JNI buffer pool),
> too. I'm expecting more benefits.
>
> - Structure Streaming with RocksDB backend: According to the latest
> update, it looks active enough for merging to master branch in Spark 3.2.
>
> Please share your thoughts and let's build better Apache Spark 3.2
> together.
>
> Bests,
> Dongjoon.
>


Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-24 Thread Mridul Muralidharan
That is indeed cause for concern.
+1 on extending the voting deadline until we finish investigation of this.

Regards,
Mridul


On Wed, Feb 24, 2021 at 12:55 PM Xiao Li  wrote:

> -1 Could we extend the voting deadline?
>
> A few TPC-DS queries (q17, q18, q39a, q39b) are returning different
> results between Spark 3.0 and Spark 3.1. We need a few more days to
> understand whether these changes are expected.
>
> Xiao
>
>
> On Wed, Feb 24, 2021 at 10:41 AM Mridul Muralidharan wrote:
>
>>
>> Sounds good, thanks for clarifying Hyukjin !
>> +1 on release.
>>
>> Regards,
>> Mridul
>>
>>
>> On Wed, Feb 24, 2021 at 2:46 AM Hyukjin Kwon  wrote:
>>
>>> I remember HiveExternalCatalogVersionsSuite was flaky for a while which
>>> is fixed in
>>> https://github.com/apache/spark/commit/0d5d248bdc4cdc71627162a3d20c42ad19f24ef4
>>> and .. KafkaDelegationTokenSuite is flaky (
>>> https://issues.apache.org/jira/browse/SPARK-31250).
>>>
>>> On Wed, Feb 24, 2021 at 5:19 PM Mridul Muralidharan wrote:
>>>
>>>>
>>>> Signatures, digests, etc check out fine.
>>>> Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Phive
>>>> -Phive-thriftserver -Pmesos -Pkubernetes
>>>>
>>>> I keep getting test failures with
>>>> * org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
>>>> * org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.
>>>> (Note: I remove $HOME/.m2 and $HOME/.ivy2 paths before build)
>>>>
>>>> Removing these suites gets the build through though - does anyone have
>>>> suggestions on how to fix it ? I did not face this with RC1.
>>>>
>>>> Regards,
>>>> Mridul
>>>>
>>>>
>>>> On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon 
>>>> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 3.1.1.
>>>>>
>>>>> The vote is open until February 24th 11PM PST and passes if a majority
>>>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is v3.1.1-rc3 (commit
>>>>> 1d550c4e90275ab418b9161925049239227f3dc9):
>>>>> https://github.com/apache/spark/tree/v3.1.1-rc3
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/
>>>>>
>>>>> Signatures used for Spark RCs can be found in this file:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1367
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-docs/
>>>>>
>>>>> The list of bug fixes going into 3.1.1 can be found at the following
>>>>> URL:
>>>>> https://s.apache.org/41kf2
>>>>>
>>>>> This release is using the release script of the tag v3.1.1-rc3.
>>>>>
>>>>> FAQ
>>>>>
>>>>> ===
>>>>> What happened to 3.1.0?
>>>>> ===
>>>>>
>>>>> There was a technical issue during Apache Spark 3.1.0 preparation, and
>>>>> it was discussed and decided to skip 3.1.0.
>>>>> Please see
>>>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html for
>>>>> more details.
>>>>>
>>>>> =
>>>>> How can I help test this release?
>>>>> =
>>>>>
>>>>> If you are a Spark user, you can help us test this release by taking
>>>>> an existing Spark workload and running on this release candidate, then
>>>>> reporting any regressions.
>>>>>
>>>>> If you're working in 

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-24 Thread Mridul Muralidharan
Sounds good, thanks for clarifying Hyukjin !
+1 on release.

Regards,
Mridul


On Wed, Feb 24, 2021 at 2:46 AM Hyukjin Kwon  wrote:

> I remember HiveExternalCatalogVersionsSuite was flaky for a while which
> is fixed in
> https://github.com/apache/spark/commit/0d5d248bdc4cdc71627162a3d20c42ad19f24ef4
> and .. KafkaDelegationTokenSuite is flaky (
> https://issues.apache.org/jira/browse/SPARK-31250).
>
> On Wed, Feb 24, 2021 at 5:19 PM Mridul Muralidharan wrote:
>
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Phive
>> -Phive-thriftserver -Pmesos -Pkubernetes
>>
>> I keep getting test failures with
>> * org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
>> * org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.
>> (Note: I remove $HOME/.m2 and $HOME/.ivy2 paths before build)
>>
>> Removing these suites gets the build through though - does anyone have
>> suggestions on how to fix it ? I did not face this with RC1.
>>
>> Regards,
>> Mridul
>>
>>
>> On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.1.1.
>>>
>>> The vote is open until February 24th 11PM PST and passes if a majority
>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.1.1-rc3 (commit
>>> 1d550c4e90275ab418b9161925049239227f3dc9):
>>> https://github.com/apache/spark/tree/v3.1.1-rc3
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1367
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-docs/
>>>
>>> The list of bug fixes going into 3.1.1 can be found at the following URL:
>>> https://s.apache.org/41kf2
>>>
>>> This release is using the release script of the tag v3.1.1-rc3.
>>>
>>> FAQ
>>>
>>> ===
>>> What happened to 3.1.0?
>>> ===
>>>
>>> There was a technical issue during Apache Spark 3.1.0 preparation, and
>>> it was discussed and decided to skip 3.1.0.
>>> Please see
>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html for
>>> more details.
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC via "pip install
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/pyspark-3.1.1.tar.gz
>>> "
>>> and see if anything important breaks.
>>> In the Java/Scala, you can add the staging repository to your projects
>>> resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.1.1?
>>> ===
>>>
>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.1.1
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>>


Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-24 Thread Mridul Muralidharan
Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Phive
-Phive-thriftserver -Pmesos -Pkubernetes

I keep getting test failures with
* org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
* org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.
(Note: I remove $HOME/.m2 and $HOME/.ivy2 paths before build)

Removing these suites gets the build through though - does anyone have
suggestions on how to fix it ? I did not face this with RC1.

Regards,
Mridul
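
When chasing suite-specific failures like these, it can help to re-run just
the suspect suite in isolation with Spark's single-test Maven options; a
sketch (the module paths are assumptions about where these suites live in the
tree):

# Re-run one ScalaTest suite at a time to separate flakiness from real breakage.
./build/mvn -pl sql/hive -Dtest=none \
  -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite test
./build/mvn -pl external/kafka-0-10-sql -Dtest=none \
  -DwildcardSuites=org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite test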


On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.1.1.
>
> The vote is open until February 24th 11PM PST and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.1.1-rc3 (commit
> 1d550c4e90275ab418b9161925049239227f3dc9):
> https://github.com/apache/spark/tree/v3.1.1-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> 
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1367
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-docs/
>
> The list of bug fixes going into 3.1.1 can be found at the following URL:
> https://s.apache.org/41kf2
>
> This release is using the release script of the tag v3.1.1-rc3.
>
> FAQ
>
> ===
> What happened to 3.1.0?
> ===
>
> There was a technical issue during Apache Spark 3.1.0 preparation, and it
> was discussed and decided to skip 3.1.0.
> Please see
> https://spark.apache.org/news/next-official-release-spark-3.1.1.html for
> more details.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/pyspark-3.1.1.tar.gz
> "
> and see if anything important breaks.
> In Java/Scala, you can add the staging repository to your project's
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.1.1?
> ===
>
> The current list of open tickets targeted at 3.1.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.1.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
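
A minimal sketch of the PySpark check described in the FAQ above (the tarball
URL is from the vote email; the one-line smoke test at the end is illustrative):

    python3 -m venv /tmp/spark-3.1.1-rc3
    . /tmp/spark-3.1.1-rc3/bin/activate
    pip install https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/pyspark-3.1.1.tar.gz
    python -c "import pyspark; print(pyspark.__version__)"   # expect 3.1.1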


Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-18 Thread Mridul Muralidharan
  I agree - Assignee has been used primarily to give recognition to the
contributor who ended up submitting the patch that got merged.
Typically JIRAs remain unassigned - and even when one is assigned, that
conveys no ownership or ongoing work: IMO it is equivalent to
an unassigned JIRA.
There could of course be discussions on dev@ or comments in the JIRA, or
design docs or WIP PRs - but if a better approach comes along, or previous
work stalls, contributors can (and please, must!) contribute to the JIRA.
Of course, if there is active work going on as a PR under review or an
SPIP/proposal being discussed, taking part in that process would help, IMO.

Regards,
Mridul


On Thu, Feb 18, 2021 at 11:00 PM Sean Owen  wrote:

> I don't believe Assignee has ever been used for anything except to give a
> bit of informal credit to the person who drove most of the work on the
> issue, when it's resolved.
> If that's the question - does Assignee mean only that person can work on
> the issue? - then no, it has never meant that.
>
> You say you have an example, one that was resolved. Is this a single case
> or systemic? I don't think I recall seeing problems of this form.
>
> We _have_ had multiple incompatible PRs for a JIRA before, occasionally.
> We have also definitely had people file huge umbrella JIRAs, parts of
> which _nobody_ ever completes, for lack of any interest from the filer
> or anyone else.
>
> I think it's fair to give a person a reasonable shot at producing a
> solution if they propose a problem or feature.
> We have had instances where a new contributor files a relatively simple
> issue, and finds another contributor opened the obvious PR before they had
> a chance (maybe they needed a day to get the PR together). That seemed a
> bit discourteous.
>
>  If you need a solution as well, and one isn't forthcoming, just open a PR
> and propose your own? I don't hear that anyone told you not to, but I also
> don't know what this is about. You can always propose a PR as an
> alternative to compare with, to facilitate collaboration. Nothing wrong
> with that.
>
> On Thu, Feb 18, 2021 at 10:45 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>> (Actually, the real-world case was fixed somehow, and I don't want to
>> point out a fixed one. I just want to make sure that what I think is
>> correct and is considered the "consensus".)
>>
>> Consider a simple case - someone files two different JIRA
>> issues for new features and assigns them all to him/herself, without
>> sharing anything about the ongoing efforts he/she has made. (So you have
>> no idea whether someone has just filed two JIRA issues without "any"
>> progress and has them in a backlog.) The features are not novel ideas and
>> are likely something others could work on in parallel.
>>
>> That said, committers can explicitly signal "I'm working on this, so
>> please refrain from making redundant efforts" by assigning the issue,
>> which is similar in effect to the comment "I'm working on this".
>> Unfortunately, this works only when the feature is something that only the
>> filer of the JIRA issue is working on. The occasional opposite case isn't
>> always a matter of ignoring the signal of "I'm working on this". There are
>> also coincidences where two different individuals/teams are working on
>> exactly the same thing at the same time.
>>
>> My concern is that "assignment" might be considered much stronger
>> than just commenting "I'm working on this" - it's like "Regardless of your
>> current progress, I started working on this, so don't consider your effort
>> to be proposable. You should have filed the JIRA issue before I filed one."
>> Is it possible for contributors to do the same? I guess not.
>>
>> The other problem is multiple assignments in parallel. I don't want to
>> assume someone is over-using the power of assignment, but technically
>> it's simply possible for someone to file JIRA issues from his/her backlog
>> - work that could take a couple of months or so - while assigning them to
>> him/herself, which effectively blocks others from working on or proposing
>> the same. I consider this preemptive, which sounds bad and even unfair.
>>
>> On Fri, Feb 19, 2021 at 12:14 AM Sean Owen  wrote:
>>
>>> I think it's OK to raise particular instances. It's hard for me to
>>> evaluate further in the abstract.
>>>
>>> I don't think we use Assignee much at all, except to kinda give credit
>>> when something is done. No piece of code or work can be solely owned by one
>>> person; this is just ASF policy.
>>>
>>> I think we've seen the occasional opposite case too: someone starts
>>> working on an issue, and then someone else also starts working on it with a
>>> competing fix or change.
>>>
>>> These are ultimately issues of communication. If an issue is pretty
>>> stalled, and you have a proposal, there's nothing wrong with just going
>>> ahead with it. There may be no disagreement. It might result in the
>>> other person joining your PR. 

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-10 Thread Mridul Muralidharan
Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Pyarn -Phadoop-2.7 -Phive
-Phive-thriftserver -Pmesos -Pkubernetes

I keep getting test failures
with org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: removing this
suite gets the build through, though - does anyone have suggestions on how
to fix it?
Perhaps it is a local problem at my end?


Regards,
Mridul



On Mon, Feb 8, 2021 at 6:24 PM Hyukjin Kwon  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.1.1.
>
> The vote is open until February 15th 5PM PST and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> Note that it is 7 days this time because it is a holiday season in several
> countries, including South Korea (where I live), China, etc., and I would
> like to make sure people do not miss it.
>
> [ ] +1 Release this package as Apache Spark 3.1.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.1.1-rc2 (commit
> cf0115ac2d60070399af481b14566f33d22ec45e):
> https://github.com/apache/spark/tree/v3.1.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> 
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1365
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc2-docs/
>
> The list of bug fixes going into 3.1.1 can be found at the following URL:
> https://s.apache.org/41kf2
>
> This release is using the release script of the tag v3.1.1-rc2.
>
> FAQ
>
> ===
> What happened to 3.1.0?
> ===
>
> There was a technical issue during Apache Spark 3.1.0 preparation, and it
> was discussed and decided to skip 3.1.0.
> Please see
> https://spark.apache.org/news/next-official-release-spark-3.1.1.html for
> more details.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc2-bin/pyspark-3.1.1.tar.gz
> "
> and see if anything important breaks.
> In Java/Scala, you can add the staging repository to your project's
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.1.1?
> ===
>
> The current list of open tickets targeted at 3.1.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.1.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
>
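
The "release script" referenced in these vote emails lives under
dev/create-release/ in the Spark repo; a sketch of a dry run from the RC tag's
checkout (the -d working-directory and -n dry-run flags are assumptions based
on the script's usage text):

    ./dev/create-release/do-release-docker.sh -d /tmp/spark-release -n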


Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-20 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Pyarn -Phadoop-2.7 -Phive
-Phive-thriftserver -Pmesos -Pkubernetes

The sha512 signature for spark-3.1.1.tgz tripped up my scripts :-)


Regards,
Mridul
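
A sketch of the digest check that such scripts perform. The .sha512 files on
dist.apache.org are not always in `sha512sum -c` format (a likely cause of the
hiccup above), so comparing the digests directly is safer; the artifact name
is an assumption:

    sha512sum spark-3.1.1-bin-hadoop2.7.tgz
    cat spark-3.1.1-bin-hadoop2.7.tgz.sha512   # compare the two digests by hand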


On Wed, Jan 20, 2021 at 8:17 PM 郑瑞峰  wrote:

> +1 (non-binding)
>
> Thank you, Hyukjin
>
> Bests,
> Ruifeng
>
> -- Original Message --
> *From:* "Dongjoon Hyun" ;
> *Sent:* Wednesday, January 20, 2021, 1:57 PM
> *To:* "Holden Karau";
> *Cc:* "Sean Owen";"Hyukjin Kwon" >;"dev";
> *Subject:* Re: [VOTE] Release Spark 3.1.1 (RC1)
>
> +1
>
> I additionally
> - Ran JDBC integration test
> - Ran with AWS EKS 1.16
> - Ran unit tests with Python 3.9.1 combination (numpy 1.19.5, pandas
> 1.2.0, scipy 1.6.0)
>   (PyArrow is not tested because it's not supported in Python 3.9.x. This
> is documented via SPARK-34162)
>
> There exists some on-going work in the umbrella JIRA (SPARK-33507: Improve
> and fix cache behavior in v1 and v2).
> I believe it can be achieved at 3.2.0 and we can add some comments on the
> release note at 3.1.0.
>
> Thank you, Hyukjin and all.
>
> Bests,
> Dongjoon.
>
> On Tue, Jan 19, 2021 at 10:49 AM Holden Karau 
> wrote:
>
>> +1, pip installs on Python 3.8
>>
>> One potential thing we might want to consider if there ends up being
>> another RC is that the error message for installing with Python2 could be
>> clearer.
>>
>> Processing ./pyspark-3.1.1.tar.gz
>> ERROR: Command errored out with exit status 1:
>>  command: /tmp/py3.1/bin/python2 -c 'import sys, setuptools,
>> tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-lmlitE/setup.py'"'"';
>> __file__='"'"'/tmp/pip-req-build-lmlitE/setup.py'"'"';f=getattr(tokenize,
>> '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
>> '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
>> egg_info --egg-base /tmp/pip-pip-egg-info-W1BsIL
>>  cwd: /tmp/pip-req-build-lmlitE/
>> Complete output (6 lines):
>> Traceback (most recent call last):
>>   File "", line 1, in 
>>   File "/tmp/pip-req-build-lmlitE/setup.py", line 31
>> file=sys.stderr)
>> ^
>> SyntaxError: invalid syntax
>> 
>> ERROR: Command errored out with exit status 1: python setup.py egg_info
>> Check the logs for full command output.
>>
>>
>>
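The root cause above: pyspark-3.1.1's setup.py uses Python-3-only syntax
(the print function's file= keyword, visible in the traceback), so pip running
under Python 2 fails while merely parsing it. A sketch of the contrast (paths
illustrative):

    python2 -m pip install pyspark-3.1.1.tar.gz   # fails: SyntaxError in setup.py
    python3 -m pip install pyspark-3.1.1.tar.gz   # works; Spark 3.x requires Python 3
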
>> On Tue, Jan 19, 2021 at 10:26 AM Sean Owen  wrote:
>>
>>> +1 from me. Same results as in 3.1.0 testing.
>>>
>>> On Mon, Jan 18, 2021 at 6:06 AM Hyukjin Kwon 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 3.1.1.

 The vote is open until January 22nd 4PM PST and passes if a majority +1
 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.1.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.1.1-rc1 (commit
 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
 https://github.com/apache/spark/tree/v3.1.1-rc1

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1364

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/

 The list of bug fixes going into 3.1.1 can be found at the following
 URL:
 https://s.apache.org/41kf2

 This release is using the release script of the tag v3.1.1-rc1.

 FAQ

 ===
 What happened to 3.1.0?
 ===

 There was a technical issue during Apache Spark 3.1.0 preparation, and
 it was discussed and decided to skip 3.1.0.
 Please see
 https://spark.apache.org/news/next-official-release-spark-3.1.1.html
 for more details.

 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC via "pip install
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz
 "
 and see if anything important breaks.
 In Java/Scala, you can add the staging repository to your project's
 resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with an out of date RC going forward).

Re: Recovering SparkR on CRAN?

2020-12-22 Thread Mridul Muralidharan
I agree - is there something we can do to ensure CRAN publishing goes through
consistently and predictably?
If possible, it would be good to continue supporting it.

Regards,
Mridul

On Tue, Dec 22, 2020 at 7:48 PM Felix Cheung  wrote:

> Ok - it took many years to get it first published, so it was hard to get
> there.
>
>
> On Tue, Dec 22, 2020 at 5:45 PM Hyukjin Kwon  wrote:
>
>> Adding @Shivaram Venkataraman  and @Felix
>> Cheung  FYI
>>
>> On Wednesday, December 23, 2020 at 9:22 AM, Michael Heuer wrote:
>>
>>> Anecdotally, as a project downstream of Spark, we've been prevented from
>>> pushing to CRAN because of this
>>>
>>> https://github.com/bigdatagenomics/adam/issues/1851
>>>
>>> We've given up and marked as WontFix.
>>>
>>>michael
>>>
>>>
>>> On Dec 22, 2020, at 5:14 PM, Dongjoon Hyun 
>>> wrote:
>>>
>>> Given the current circumstance, I'm thinking of dropping it officially
>>> from the community release scope.
>>>
>>> It's because
>>>
>>> - It turns out that our CRAN check is insufficient to guarantee the
>>> availability of SparkR on CRAN.
>>>   Apache Spark 3.1.0 may not be available on CRAN either.
>>>
>>> - In daily CIs, CRAN check has been broken frequently due to both our
>>> side and CRAN side issues. Currently, branch-2.4 is broken.
>>>
>>> - It also has the side effect of delaying the official release
>>> announcement after an RC passes, because each release manager takes a look
>>> to see whether he/she can recover it for that release.
>>>
>>> If we are unable to support SparkR on CRAN in a sustainable way, what
>>> about dropping it officially instead?
>>>
>>> Then, it will alleviate the burden on release managers and improve daily
>>> CI stability by removing the CRAN check.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Mon, Dec 21, 2020 at 7:09 AM Dongjoon Hyun 
>>> wrote:
>>>
 Hi, All.

 The last `SparkR` package of Apache Spark in CRAN is `2.4.6`.


 https://cran-archive.r-project.org/web/checks/2020/2020-07-10_check_results_SparkR.html

 The latest three Apache Spark distributions (2.4.7/3.0.0/3.0.1) are not
 published to CRAN and the lack of SparkR on CRAN has been considered a
 non-release blocker.

 I'm wondering if we are aiming to recover it in Apache Spark 3.1.0.

 Bests,
 Dongjoon.

>>>
>>>
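
A sketch of the CRAN-style check discussed in this thread, run from a Spark
checkout (the repo also carries a helper script, R/check-cran.sh, wrapping
these steps; the tarball name depends on the SparkR version being built):

    R CMD build R/pkg
    R CMD check --as-cran SparkR_*.tar.gz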


Re: [DISCUSS] Review/merge phase, and post-review

2020-11-13 Thread Mridul Muralidharan
I try to follow the second option.
In general, when multiple reviewers are looking at the code, addressing
review comments can open up other avenues of
discussion/optimization/design: at least in core, I have seen
this happen often.

A day or so of delay is worth the increased scrutiny and the better
design/reduced bugs.

Regards,
Mridul

On Sat, Nov 14, 2020 at 1:47 AM Jungtaek Lim 
wrote:

> I see some voices saying it's not sufficient to understand the topic. Let
> me elaborate a bit more.
>
> 1. There're multiple reviewers reviewing the PR. (Say, A, B, C, D)
> 2. A and B leave review comments on the PR, but no one makes an explicit
> indication that these review comments are the final ones.
> 3. The author of the PR addresses the review comments.
> 4. C checks that the review comments from A and B are addressed, and
> merges the PR. In parallel (or a bit later), A tries to check whether
> the review comments are addressed (or might even provide more
> review comments afterwards), and realizes the PR is already merged.
>
> To say it again, there's "technically" nothing incorrect here. Let me give
> another example of what I called a "trade-off".
>
> 1. There're multiple reviewers reviewing the PR. (Say, A, B, C, D)
> 2. A and B leave review comments on the PR, but no one makes an explicit
> indication that these review comments are the final ones.
> 3. The author of the PR addresses the review comments.
> 4. C checks that the review comments from A and B are addressed, and asks
> A and B to confirm whether there are any further review comments, with the
> condition that the PR will be merged a few days later if there's no further
> feedback.
> 5. If A and B confirm, or don't provide new feedback in that period, C
> merges the PR. If A or B provides new feedback, go back to 3 and reset
> the clock.
>
> This is what we tend to comment as "@A @B I'll leave this open a few days
> more to see if anyone has further comments. Otherwise I'll merge this."
>
> I see both approaches used across various PRs, so this isn't something I
> want to blame anyone for. I just want us to think about what would be the
> ideal approach to prefer.
>
>
> On Sat, Nov 14, 2020 at 3:46 PM Jungtaek Lim 
> wrote:
>
>> Oh sorry, that one went up in flames (please just consider it my fault),
>> and I just removed all the comments.
>>
>> Btw, when I initiate discussions, I really prefer to start the discussion
>> "without" specific instances, which tend to turn into blaming each
>> other. I understand it's not easy to discuss without taking examples, but
>> I'll do my best to explain the situation instead. Please let me know if
>> there's anything ambiguous or unclear to think about.
>>
>> On Sat, Nov 14, 2020 at 3:41 PM Sean Owen  wrote:
>>
>>> I am sure you are referring to some specific instances but I have not
>>> followed enough to know what they are. Can you point them out? I think that
>>> is most productive for everyone to understand.
>>>
>>> On Fri, Nov 13, 2020 at 10:16 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Hi devs,

 I know this is a super sensitive topic and at risk of a flame war, but I'd
 just like to try this. My apologies first.
 Assuming we all know the ASF policy on code commits, and given that I don't
 see the Spark project having any explicit bylaws, it is technically possible
 for committers to do anything during merging.

 Sometimes this gets a bit depressing for reviewers, regardless of the
 intention, when the merger decides unilaterally to merge while the
 reviewers are still in the review phase. I have observed this practice used
 frequently, justified by the fact that we have post-review to address
 further comments later.

 I know about the concern that requiring the merger to gather consensus
 about the merge from reviewers can block things unintentionally,
 but we also have another practice of holding off on merging for a couple of
 days and asking reviewers whether they have further comments or not,
 which I think is a good trade-off.

 Excluding the cases where we're in release-blocker mode, would it hurt us
 too much to ask the merger to respect the practice of notifying
 reviewers that the merge will happen soon and waiting a day or so? I feel
 post-review opens the possibility for reviewers late to the party
 to review later, but it's over-used if it's leveraged as a judgement that
 the merger can merge at any time and reviewers can still continue reviewing.
 Reviewers would feel their flow broken - that is not the same experience as
 having more time to finalize a review before merging.

 Again, I know it's super hard to reconsider an ongoing practice when
 the project has come such a long way (10 years), but I just wanted to hear
 some voices about this.

 Thanks,
 Jungtaek Lim (HeartSaVioR)

>>>


Re: [VOTE] Standardize Spark Exception Messages SPIP

2020-11-04 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Wed, Nov 4, 2020 at 12:41 PM Xinyi Yu  wrote:

> Hi all,
>
> We had the discussion of SPIP: Standardize Spark Exception Messages at
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-SPIP-Standardize-Spark-Exception-Messages-td30341.html
>
> The SPIP document link is at
>
> https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit?usp=sharing
>
> We want to have the vote on this, for 72 hours.
>
> Please vote before November 7th at noon:
>
> [ ] +1: Accept this SPIP proposal
> [ ] -1: Do not agree to standardize Spark exception messages, because ...
>
>
> Thanks for your time and feedback!
>
> --
> Xinyi
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS][SPIP] Standardize Spark Exception Messages

2020-11-01 Thread Mridul Muralidharan
I like the idea of consistent messages; it makes understanding errors
easier.
Having said that, exception messages themselves are not part of the exposed
contract with users, and are subject to change.
We should leave that flexibility open to Spark developers ... I am
currently viewing this proposal as an internal standardization exercise
within the Spark codebase, and not as a public contract with users.
Is that aligned with the objectives? Or are we looking at this as a public
contract with users?
I am not in favour of the latter.

Regards,
Mridul


On Sun, Oct 25, 2020 at 7:05 PM Xinyi Yu  wrote:

> Hi all,
>
> We would like to post a SPIP on Standardizing Exception Messages in Spark.
> Here is the document link:
>
> https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit?usp=sharing
>
>
> This SPIP aims to standardize the exception messages in Spark. It has three
> major focuses:
> 1. Group exception messages in dedicated files for easy maintenance and
> auditing.
> 2. Establish an error message guideline for developers.
> 3. Improve error message quality.
>
> Thanks for your time and patience. Looking forward to your feedback!
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Mridul Muralidharan
+1 on pushing back the branch cut to give increased dev time, matching
previous releases.

Regards,
Mridul

On Sat, Oct 3, 2020 at 10:22 PM Xiao Li  wrote:

> Thank you for your updates.
>
> Spark 3.0 got released on Jun 18, 2020. If Nov 1st is the target date of
> the 3.1 branch cut, the feature development time window is less than 5
> months. This is shorter than what we did in Spark 2.3 and 2.4 releases.
>
> Below are three highly desirable pieces of feature work I am watching.
> Hopefully, we can finish them before the branch cut.
>
>- Support push-based shuffle to improve shuffle efficiency:
>https://issues.apache.org/jira/browse/SPARK-30602
>- Unify create table syntax:
>https://issues.apache.org/jira/browse/SPARK-31257
>- Bloom filter join: https://issues.apache.org/jira/browse/SPARK-32268
>
> Thanks,
>
> Xiao
>
>
> On Saturday, October 3, 2020 at 5:41 PM, Hyukjin Kwon  wrote:
>
>> Nice summary. Thanks Dongjoon. One minor correction -> I believe we
>> dropped R 3.5 and below at branch 2.4 as well.
>>
>> On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun,  wrote:
>>
>>> Hi, All.
>>>
>>> As of today, the master branch (Apache Spark 3.1.0) has resolved
>>> 852+ JIRA issues, and 606+ issues are 3.1.0-only patches.
>>> According to the 3.1.0 release window, branch-3.1 will be
>>> created on November 1st and enters QA period.
>>>
>>> Here are some notable updates I've been monitoring.
>>>
>>> *Language*
>>> 01. SPARK-25075 Support Scala 2.13
>>>   - Since SPARK-32926, Scala 2.13 build test has
>>> become a part of GitHub Action jobs.
>>>   - After SPARK-33044, Scala 2.13 test will be
>>> a part of Jenkins jobs.
>>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>>> 03. SPARK-32082 Project Zen: Improving Python usability
>>>   - 7 of 16 issues are resolved.
>>> 04. SPARK-32073 Drop R < 3.5 support
>>>   - This is done for Spark 3.0.1 and 3.1.0.
>>>
>>> *Dependency*
>>> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>>>   - This changes the default dist. for better cloud support
>>> 06. SPARK-32981 Remove hive-1.2 distribution
>>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>>   - This will remove Hive 1.2.1 from source code
>>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>>
>>> *Core*
>>> 09. SPARK-27495 Support Stage level resource conf and scheduling
>>>   - 11 of 15 issues are resolved
>>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>>   - 8 of 14 issues are resolved
>>>
>>> *Resource Manager*
>>> 11. SPARK-33005 Kubernetes GA preparation
>>>   - It is on the way and we are waiting for more feedback.
>>>
>>> *SQL*
>>> 12. SPARK-30648/SPARK-32346 Support filters pushdown
>>>   to JSON/Avro
>>> 13. SPARK-32948/SPARK-32958 Add Json expression optimizer
>>> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>>>   - 11 of 17 issues are resolved
>>> 15. SPARK-27589 DSv2 was mostly completed in 3.0
>>>   and added more features in 3.1, but we are still missing:
>>>   - All built-in DataSource v2 write paths are disabled
>>> and v1 write is used instead.
>>>   - Support partition pruning with subqueries
>>>   - Support bucketing
>>>
>>> We still have one month before the feature freeze
>>> and the start of QA. If you are working on 3.1,
>>> please consider the timeline and share your schedule
>>> with the Apache Spark community. Everything else
>>> can go into the 3.2 release scheduled for June 2021.
>>>
>>> Last but not least, I want to emphasize (7) once again.
>>> We need to remove the forked unofficial Hive eventually.
>>> Please let us know your reasons if you need to build
>>> from Apache Spark 3.1 source code for Hive 1.2.
>>>
>>> https://github.com/apache/spark/pull/29936
>>>
>>> As I wrote in the above PR description, for old releases,
>>> Apache Spark 2.4(LTS) and 3.0 (~2021.12) will provide
>>> Hive 1.2-based distribution.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>
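
For item 01 above, a sketch of a local Scala 2.13 build check (the helper
script and profile names are assumptions based on the Spark repo layout at
the time):

    ./dev/change-scala-version.sh 2.13
    ./build/mvn -Pscala-2.13 -DskipTests clean package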


[RESULT] [VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-18 Thread Mridul Muralidharan
Hi,

  The vote passed with 16 +1's (6 binding) and no -1's

+1s (* = binding):

Xingbo Jiang
Venkatakrishnan Sowrirajan
Tom Graves (*)
Chandni Singh
DB Tsai (*)
Xiao Li (*)
Angers Zhu
Joseph Torres
Kalyan
Dongjoon Hyun (*)
Wenchen Fan (*)
Yi Wu
叶先进 
郑瑞峰 
Takeshi Yamamuro
Mridul Muralidharan (*)

Thanks,
Mridul


Re: [VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-18 Thread Mridul Muralidharan
Adding my +1 as well, before closing the vote.

Regards,
Mridul

On Sun, Sep 13, 2020 at 9:59 PM Mridul Muralidharan 
wrote:

> Hi,
>
> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
> shuffle to improve shuffle efficiency.
> Please take a look at:
>
>- SPIP jira: https://issues.apache.org/jira/browse/SPARK-30602
>- SPIP doc:
>
> https://docs.google.com/document/d/1mYzKVZllA5Flw8AtoX7JUcXBOnNIDADWRbJ7GI6Y71Q/edit
>- POC against master and results summary :
>
> https://docs.google.com/document/d/1Q5m7YAp0HyG_TNFL4p_bjQgzzw33ik5i49Vr86UNZgg/edit
>
> Active discussions on the jira and SPIP document have settled.
>
> I will leave the vote open until Friday (the 18th September 2020), 5pm
> CST.
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don't think this is a good idea because ...
>
>
> Thanks,
> Mridul
>


[VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-13 Thread Mridul Muralidharan
Hi,

I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
shuffle to improve shuffle efficiency.
Please take a look at:

   - SPIP jira: https://issues.apache.org/jira/browse/SPARK-30602
   - SPIP doc:
   
https://docs.google.com/document/d/1mYzKVZllA5Flw8AtoX7JUcXBOnNIDADWRbJ7GI6Y71Q/edit
   - POC against master and results summary :
   
https://docs.google.com/document/d/1Q5m7YAp0HyG_TNFL4p_bjQgzzw33ik5i49Vr86UNZgg/edit

Active discussions on the jira and SPIP document have settled.

I will leave the vote open until Friday (the 18th September 2020), 5pm CST.

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don't think this is a good idea because ...


Thanks,
Mridul


Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-09 Thread Mridul Muralidharan
I imported our KEYS file locally [1] to validate ... did not use an external
keyserver.

Regards,
Mridul

[1] wget https://dist.apache.org/repos/dist/dev/spark/KEYS -O - | gpg
--import
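
Expanded into a fuller verification sketch (the KEYS import is from [1]; the
artifact name is an assumption, and the URLs are from the vote email below):

    wget https://dist.apache.org/repos/dist/dev/spark/KEYS -O - | gpg --import
    base=https://dist.apache.org/repos/dist/dev/spark/v2.4.7-rc3-bin
    wget $base/spark-2.4.7-bin-hadoop2.7.tgz \
         $base/spark-2.4.7-bin-hadoop2.7.tgz.asc \
         $base/spark-2.4.7-bin-hadoop2.7.tgz.sha512
    gpg --verify spark-2.4.7-bin-hadoop2.7.tgz.asc spark-2.4.7-bin-hadoop2.7.tgz
    sha512sum spark-2.4.7-bin-hadoop2.7.tgz   # compare against the .sha512 file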

On Wed, Sep 9, 2020 at 8:03 PM Wenchen Fan  wrote:

> I checked
> https://repository.apache.org/content/repositories/orgapachespark-1361/ ,
> it says the Signature Validation failed.
>
> Prashant, can you double-check your gpg key and make sure it's uploaded to
> public key servers like the following?
> http://pool.sks-keyservers.net:11371
> http://keyserver.ubuntu.com:11371
>
>
> On Wed, Sep 9, 2020 at 6:12 AM Mridul Muralidharan 
> wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc. check out fine.
>> Checked out the tag and built/tested with -Pyarn -Phadoop-2.7 -Phive
>> -Phive-thriftserver -Pmesos -Pkubernetes
>>
>> Thanks,
>> Mridul
>>
>>
>> On Tue, Sep 8, 2020 at 8:55 AM Prashant Sharma 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.4.7.
>>>
>>> The vote is open until Sep 11th at 9AM PST and passes if a majority +1
>>> PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.4.7
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> There are currently no issues targeting 2.4.7 (try project = SPARK AND
>>> "Target Version/s" = "2.4.7" AND status in (Open, Reopened, "In Progress"))
>>>
>>> The tag to be voted on is v2.4.7-rc3 (commit
>>> 14211a19f53bd0f413396582c8970e3e0a74281d):
>>> https://github.com/apache/spark/tree/v2.4.7-rc3
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.7-rc3-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1361/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.7-rc3-docs/
>>>
>>> The list of bug fixes going into 2.4.7 can be found at the following URL:
>>> https://s.apache.org/spark-v2.4.7-rc3
>>>
>>> This release is using the release script of the tag v2.4.7-rc3.
>>>
>>> FAQ
>>>
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks. In Java/Scala,
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.4.7?
>>> ===
>>>
>>> The current list of open tickets targeted at 2.4.7 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.4.7
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>
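
For the keyserver issue Wenchen raises in this thread, a sketch of publishing
and checking a signing key on the servers he lists (the key ID is a
placeholder):

    gpg --keyserver hkp://keyserver.ubuntu.com --send-keys <KEYID>
    gpg --keyserver hkp://pool.sks-keyservers.net --recv-keys <KEYID>   # confirm retrieval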


Re: [VOTE] Release Spark 2.4.7 (RC3)

2020-09-08 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Pyarn -Phadoop-2.7 -Phive
-Phive-thriftserver -Pmesos -Pkubernetes

Thanks,
Mridul


On Tue, Sep 8, 2020 at 8:55 AM Prashant Sharma  wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 2.4.7.
>
> The vote is open until Sep 11th at 9AM PST and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.7
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> There are currently no issues targeting 2.4.7 (try project = SPARK AND
> "Target Version/s" = "2.4.7" AND status in (Open, Reopened, "In Progress"))
>
> The tag to be voted on is v2.4.7-rc3 (commit
> 14211a19f53bd0f413396582c8970e3e0a74281d):
> https://github.com/apache/spark/tree/v2.4.7-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.7-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1361/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.7-rc3-docs/
>
> The list of bug fixes going into 2.4.7 can be found at the following URL:
> https://s.apache.org/spark-v2.4.7-rc3
>
> This release is using the release script of the tag v2.4.7-rc3.
>
> FAQ
>
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.7?
> ===
>
> The current list of open tickets targeted at 2.4.7 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.7
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


Re: Push-based shuffle SPIP

2020-08-24 Thread Mridul Muralidharan
Hi,

  Thanks for sending out the proposal, Min!
For the SPIP requirements, I am willing to act as the shepherd for this
proposal.

The jira + paper + proposal provide the high-level design and
implementation details.
The VLDB paper discusses the performance gains in detail for the in-house
deployment of push-based shuffle.

It would be great to get feedback from our community on this feature before
we go to a vote.


Regards,
Mridul



On Mon, Aug 24, 2020 at 4:32 PM mshen  wrote:

> We raised this SPIP ticket in
> https://issues.apache.org/jira/browse/SPARK-30602 earlier this year.
> Since then, we have made progress on multiple fronts, including:
>
> * Our work was published in VLDB 2020. The final version of the paper is
> attached to the SPIP ticket.
> * We have further enhanced and productionized this work at LinkedIn, and
> have enabled production flows adopting the new push-based shuffle
> mechanism,
> with good results.
> * We have recently also ported our push-based shuffle changes to the OSS
> Spark master branch, so other people can potentially try it out. Details of
> this branch are in this doc:
> https://docs.google.com/document/d/16yOfI8P_O3V6hx_FnWT22jeDIItgXuXfaDAV0fJDTqQ/edit#
> * The SPIP doc at
> https://docs.google.com/document/d/1mYzKVZllA5Flw8AtoX7JUcXBOnNIDADWRbJ7GI6Y71Q/edit
> has also been further updated, reflecting more recent designs.
> * We have also discussed this with multiple companies who share a similar
> interest in this work.
>
> We would like to resume the discussion of this SPIP in the community, and
> push for a vote on this.
>
>
>
>
> -
> Min Shen
> Staff Software Engineer
> LinkedIn
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

