Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread Yuming Wang
+1

On Tue, Feb 14, 2023 at 11:27 AM Prem Sahoo  wrote:

> +1
>
> On Mon, Feb 13, 2023 at 8:13 PM L. C. Hsieh  wrote:
>
>> +1
>>
>> On Mon, Feb 13, 2023 at 3:49 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> +1 for me
>>>
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Mon, 13 Feb 2023 at 23:18, huaxin gao  wrote:
>>>
 +1

 On Mon, Feb 13, 2023 at 3:09 PM Dongjoon Hyun 
 wrote:

> +1
>
> Dongjoon
>
> On 2023/02/13 22:52:59 "L. C. Hsieh" wrote:
> > Hi all,
> >
> > I'd like to start the vote for SPIP: Lazy Materialization for Parquet
> > Read Performance Improvement.
> >
> > The high summary of the SPIP is that it proposes an improvement to
> the
> > Parquet reader with lazy materialization which only materializes
> (i.e.
> > decompress, de-code, etc...) necessary values. For Spark-SQL filter
> > operations, evaluating the filters first and lazily materializing
> only
> > the used values can save computation wastes and improve the read
> > performance.
> >
> > References:
> >
> > JIRA ticket https://issues.apache.org/jira/browse/SPARK-42256
> > SPIP doc
> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
> > Discussion thread
> > https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Thank you!
> >
> > Liang-Chi Hsieh
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Executor tab missing information

2023-02-13 Thread Prem Sahoo
Hello All,
I am executing Spark jobs, but the Executors tab is missing information; I
can't see any data/info coming up. Please let me know what I am missing.


Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread Prem Sahoo
+1

On Mon, Feb 13, 2023 at 8:13 PM L. C. Hsieh  wrote:

> +1
>
> On Mon, Feb 13, 2023 at 3:49 PM Mich Talebzadeh 
> wrote:
>
>> +1 for me
>>
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Mon, 13 Feb 2023 at 23:18, huaxin gao  wrote:
>>
>>> +1
>>>
>>> On Mon, Feb 13, 2023 at 3:09 PM Dongjoon Hyun 
>>> wrote:
>>>
 +1

 Dongjoon

 On 2023/02/13 22:52:59 "L. C. Hsieh" wrote:
 > Hi all,
 >
 > I'd like to start the vote for SPIP: Lazy Materialization for Parquet
 > Read Performance Improvement.
 >
 > The high summary of the SPIP is that it proposes an improvement to the
 > Parquet reader with lazy materialization which only materializes (i.e.
 > decompress, de-code, etc...) necessary values. For Spark-SQL filter
 > operations, evaluating the filters first and lazily materializing only
 > the used values can save computation wastes and improve the read
 > performance.
 >
 > References:
 >
 > JIRA ticket https://issues.apache.org/jira/browse/SPARK-42256
 > SPIP doc
 https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
 > Discussion thread
 > https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6
 >
 > Please vote on the SPIP for the next 72 hours:
 >
 > [ ] +1: Accept the proposal as an official SPIP
 > [ ] +0
 > [ ] -1: I don’t think this is a good idea because …
 >
 > Thank you!
 >
 > Liang-Chi Hsieh
 >
 > -
 > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 >
 >

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




Re: Executor metrics are missing on Prometheus sink

2023-02-13 Thread Qian Sun
Hi Luca,

Thanks for your reply, which is very helpful for me :)

I am trying other metrics sinks with cAdvisor to see the effect. If it
works well, I will share it with the community.

On Fri, Feb 10, 2023 at 4:26 PM Luca Canali  wrote:

> Hi Qian,
>
>
>
> Indeed, the metrics available with the Prometheus servlet sink (which is
> still marked as experimental) are limited compared to the full
> instrumentation. This is due to the way it is implemented as a
> servlet, and it cannot be easily extended, from what I can see.
>
> You can use another supported metrics sink (see
> https://spark.apache.org/docs/latest/monitoring.html#metrics ) if you
> want to collect all the metrics that are exposed by Spark executors.
>
> For example, I use the graphite sink and then collect metrics into an
> InfluxDB instance (see https://github.com/cerndb/spark-dashboard )
>
> An additional comment is that there is room for having more sinks
> available for Apache Spark metrics, notably for InfluxDB and for Prometheus
> (gateway), if someone is interested in working on that.
>
>
>
> Best,
>
> Luca
>
>
>
>
>
> *From:* Qian Sun 
> *Sent:* Friday, February 10, 2023 05:05
> *To:* dev ; user.spark 
> *Subject:* Executor metrics are missing on prometheus sink
>
>
>
> Setting up prometheus sink in this way:
>
> -c spark.ui.prometheus.enabled=true
>
> -c spark.executor.processTreeMetrics.enabled=true
>
> -c spark.metrics.conf=/spark/conf/metric.properties
>
> metric.properties:
>
> *.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
>
> *.sink.prometheusServlet.path=/metrics/prometheus
>
> Result:
>
> Both of these endpoints have some metrics
>
> :4040/metrics/prometheus
>
> :4040/metrics/executors/prometheus
>
>
>
> But the executor one misses metrics under the executor namespace
> described here:
>
>
> https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor
>
>
>
> *How to expose executor metrics on Spark executor pods?*
>
>
>
> *Any help will be appreciated.*
>
> --
>
> Regards,
>
> Qian Sun
>


-- 
Regards,
Qian Sun
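
For reference, the Graphite sink Luca mentions is configured through
metrics.properties; a minimal sketch (the host, port, and period values
below are placeholders to adapt) could look like:

  # Graphite sink sketch; host and port are placeholders.
  *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
  *.sink.graphite.host=graphite.example.com
  *.sink.graphite.port=2003
  *.sink.graphite.period=10
  *.sink.graphite.unit=seconds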


Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread L. C. Hsieh
+1

On Mon, Feb 13, 2023 at 3:49 PM Mich Talebzadeh 
wrote:

> +1 for me
>
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Mon, 13 Feb 2023 at 23:18, huaxin gao  wrote:
>
>> +1
>>
>> On Mon, Feb 13, 2023 at 3:09 PM Dongjoon Hyun 
>> wrote:
>>
>>> +1
>>>
>>> Dongjoon
>>>
>>> On 2023/02/13 22:52:59 "L. C. Hsieh" wrote:
>>> > Hi all,
>>> >
>>> > I'd like to start the vote for SPIP: Lazy Materialization for Parquet
>>> > Read Performance Improvement.
>>> >
>>> > The high summary of the SPIP is that it proposes an improvement to the
>>> > Parquet reader with lazy materialization which only materializes (i.e.
>>> > decompress, de-code, etc...) necessary values. For Spark-SQL filter
>>> > operations, evaluating the filters first and lazily materializing only
>>> > the used values can save computation wastes and improve the read
>>> > performance.
>>> >
>>> > References:
>>> >
>>> > JIRA ticket https://issues.apache.org/jira/browse/SPARK-42256
>>> > SPIP doc
>>> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
>>> > Discussion thread
>>> > https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6
>>> >
>>> > Please vote on the SPIP for the next 72 hours:
>>> >
>>> > [ ] +1: Accept the proposal as an official SPIP
>>> > [ ] +0
>>> > [ ] -1: I don’t think this is a good idea because …
>>> >
>>> > Thank you!
>>> >
>>> > Liang-Chi Hsieh
>>> >
>>> > -
>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: Spark on Kube (virtua) coffee/tea/pop times

2023-02-13 Thread Mich Talebzadeh
Hi All,

First, thanks to Holden for organising this open discussion and exchange of
ideas. I must apologize for the problems with my microphone; hopefully it
will not happen again.

From my own commercial experience with K8s, mainly Google GKE, the main
concern used to be that Spark on GKE was a work in progress and not on par
with Spark on Hadoop/YARN, for example Spark on Google Dataproc. I don't
think this statement is true any longer, as Spark on K8s has since matured,
albeit the performance is not 100% there. The commercial motivation for
Spark on K8s is cost reduction: the assumption is that it would be cheaper
to run Spark on GKE without Dataproc. Not to forget that there are other
non-Spark applications and datastores running on K8s/GKE, so it makes sense
to improve Spark performance on K8s. Another motivation is to break down
monolithic applications into microservices, and from the ETL/ELT point of
view Spark plays a considerable role.

For those who are still using Spark on Hadoop/YARN, if I recall correctly,
Colin mentioned that Google considers Spark important enough to provide a
migration path from Spark on Dataproc to "Run a Spark job on Dataproc on
Google Kubernetes Engine". I am not sure whether other cloud vendors have
been through this journey; maybe some members can clarify this.


With regard to authentication, there is Workload Identity, which has
replaced the clumsy secrets file that compromised security and was
available to all nodes of the K8s cluster. I am not sure how Spark can
integrate with Workload Identity; the authentication is at the pod level
rather than the node level.
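
A rough sketch of how that integration might look (the names below are
invented, and this assumes Spark 3.1+ where both service-account settings
exist): the driver and executor pods run as a Kubernetes service account
that is bound to a Google service account via Workload Identity, and Spark
is only told which service account to use, e.g.

  # spark-sa is assumed to be annotated with
  # iam.gke.io/gcp-service-account=<gsa>@<project>.iam.gserviceaccount.com
  # and bound with roles/iam.workloadIdentityUser on the GCP side.
  spark-submit \
    --master k8s://https://<k8s-apiserver> \
    --conf spark.kubernetes.namespace=spark \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
    --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa \
    ...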




Thanks


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 13 Feb 2023 at 08:24, Holden Karau  wrote:

> Some general issues we found common ground around:
>
> - Inter-Pod security, istio + mTLS
> - Sidecar management
> - Docker Images
>   - Add links to more related images
>   - Helm links
> - Data Locality concerns
> - Upgrading Spark Versions
> - Performance issues
>
> Thanks to everyone who was able to make the informal coffee chat
>
> I'll try and schedule another one at a more European friendly time so that
> we can all get to chat as well.
>
> On Fri, Feb 10, 2023 at 1:08 PM Mich Talebzadeh 
> wrote:
>
>> Great looking forward to it
>>
>> Mich
>>
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Fri, 10 Feb 2023 at 18:58, Holden Karau  wrote:
>>
>>> Ok so the first iteration of this is booked:
>>>
>>>
>>> Spark on Kube Coffee Chats
>>> Sunday, Feb 12 · 6–7 PM pacific time
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/wge-tzzd-uyj
>>>
>>> Assuming that all goes well I’ll send out another Doodle poll after this
>>> one for the folks who could not make this one.
>>>
>>> Looking forward to catching up with y’all :) No prep work necessary, but
>>> if anyone wants to write down a brief, two-sentence blurb about their
>>> goals for Spark on Kube, I was thinking we might go around the virtual
>>> room, sharing that as our kicking-off point for this coffee meeting :)
>>>
>>>
>>> On Wed, Feb 8, 2023 at 12:27 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 That sounds like a good plan Holden!


 Let us go for it


view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh



 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.




 On Wed, 8 Feb 2023 at 20:12, Holden Karau  

Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread Mich Talebzadeh
+1 for me



   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 13 Feb 2023 at 23:18, huaxin gao  wrote:

> +1
>
> On Mon, Feb 13, 2023 at 3:09 PM Dongjoon Hyun  wrote:
>
>> +1
>>
>> Dongjoon
>>
>> On 2023/02/13 22:52:59 "L. C. Hsieh" wrote:
>> > Hi all,
>> >
>> > I'd like to start the vote for SPIP: Lazy Materialization for Parquet
>> > Read Performance Improvement.
>> >
>> > The high summary of the SPIP is that it proposes an improvement to the
>> > Parquet reader with lazy materialization which only materializes (i.e.
>> > decompress, de-code, etc...) necessary values. For Spark-SQL filter
>> > operations, evaluating the filters first and lazily materializing only
>> > the used values can save computation wastes and improve the read
>> > performance.
>> >
>> > References:
>> >
>> > JIRA ticket https://issues.apache.org/jira/browse/SPARK-42256
>> > SPIP doc
>> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
>> > Discussion thread
>> > https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6
>> >
>> > Please vote on the SPIP for the next 72 hours:
>> >
>> > [ ] +1: Accept the proposal as an official SPIP
>> > [ ] +0
>> > [ ] -1: I don’t think this is a good idea because …
>> >
>> > Thank you!
>> >
>> > Liang-Chi Hsieh
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread huaxin gao
+1

On Mon, Feb 13, 2023 at 3:09 PM Dongjoon Hyun  wrote:

> +1
>
> Dongjoon
>
> On 2023/02/13 22:52:59 "L. C. Hsieh" wrote:
> > Hi all,
> >
> > I'd like to start the vote for SPIP: Lazy Materialization for Parquet
> > Read Performance Improvement.
> >
> > The high summary of the SPIP is that it proposes an improvement to the
> > Parquet reader with lazy materialization which only materializes (i.e.
> > decompress, de-code, etc...) necessary values. For Spark-SQL filter
> > operations, evaluating the filters first and lazily materializing only
> > the used values can save computation wastes and improve the read
> > performance.
> >
> > References:
> >
> > JIRA ticket https://issues.apache.org/jira/browse/SPARK-42256
> > SPIP doc
> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
> > Discussion thread
> > https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Thank you!
> >
> > Liang-Chi Hsieh
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread Dongjoon Hyun
+1

Dongjoon

On 2023/02/13 22:52:59 "L. C. Hsieh" wrote:
> Hi all,
> 
> I'd like to start the vote for SPIP: Lazy Materialization for Parquet
> Read Performance Improvement.
> 
> The high summary of the SPIP is that it proposes an improvement to the
> Parquet reader with lazy materialization which only materializes (i.e.
> decompress, de-code, etc...) necessary values. For Spark-SQL filter
> operations, evaluating the filters first and lazily materializing only
> the used values can save computation wastes and improve the read
> performance.
> 
> References:
> 
> JIRA ticket https://issues.apache.org/jira/browse/SPARK-42256
> SPIP doc 
> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
> Discussion thread
> https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6
> 
> Please vote on the SPIP for the next 72 hours:
> 
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
> 
> Thank you!
> 
> Liang-Chi Hsieh
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread L. C. Hsieh
Hi all,

I'd like to start the vote for SPIP: Lazy Materialization for Parquet
Read Performance Improvement.

The high-level summary of the SPIP is that it proposes an improvement to the
Parquet reader with lazy materialization, which only materializes (i.e.
decompresses, decodes, etc.) the necessary values. For Spark SQL filter
operations, evaluating the filters first and lazily materializing only
the values that are actually used can avoid wasted computation and improve
read performance.
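
As a rough illustration (the table path and column names below are
invented), this is the kind of query that benefits: only the filter column
needs to be decoded for every row, while the other projected columns would
be materialized just for the rows that pass the filter.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("lazy-materialization-sketch")
    .getOrCreate()
  import spark.implicits._

  // Only `event_type` must be decompressed/decoded to evaluate the filter;
  // `user_id` and `payload` would be materialized lazily, only for the
  // rows where the predicate holds.
  spark.read.parquet("/data/events")
    .filter($"event_type" === "click")
    .select("user_id", "payload")
    .count()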

References:

JIRA ticket https://issues.apache.org/jira/browse/SPARK-42256
SPIP doc 
https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
Discussion thread
https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thank you!

Liang-Chi Hsieh

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread L. C. Hsieh
Hi Mich,

The title of this thread is "[DISCUSS]". We need to have a public
discussion on a SPIP proposal, collecting comments, before we can move
forward and call for a vote on it.


On Mon, Feb 13, 2023 at 2:35 PM Mich Talebzadeh 
wrote:

> Hi,
>
> I thought we already voted to go ahead with this proposal!
>
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Mon, 13 Feb 2023 at 20:41, kazuyuki tanimura 
> wrote:
>
>> Thank you Liang-Chi!
>>
>> Kazu
>>
>> On Feb 11, 2023, at 7:12 PM, L. C. Hsieh  wrote:
>>
>> Thanks all for your feedback.
>>
>> Given this positive feedback, if there is no other comments/discussion, I
>> will go to start a vote in the next few days.
>>
>> Thank you again!
>>
>> On Thu, Feb 2, 2023 at 10:12 AM kazuyuki tanimura <
>> ktanim...@apple.com.invalid> wrote:
>>
>>> Thank you all for +1s and reviewing the SPIP doc.
>>>
>>> Kazu
>>>
>>> On Feb 1, 2023, at 1:28 AM, Dongjoon Hyun 
>>> wrote:
>>>
>>> +1
>>>
>>> On Wed, Feb 1, 2023 at 12:52 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 +1


view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh


 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.




 On Wed, 1 Feb 2023 at 02:23, huaxin gao  wrote:

> +1
>
> On Tue, Jan 31, 2023 at 6:10 PM DB Tsai  wrote:
>
>> +1
>>
>> Sent from my iPhone
>>
>> On Jan 31, 2023, at 4:16 PM, Yuming Wang  wrote:
>>
>> 
>> +1.
>>
>> On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura <
>> ktanim...@apple.com.invalid> wrote:
>>
>>> Great! Much appreciated, Mitch!
>>>
>>> Kazu
>>>
>>> On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>> Thanks, Kazu.
>>>
>>> I followed that template link and indeed as you pointed out it is a
>>> common template. If it works then it is what it is.
>>>
>>> I will be going through your design proposals and hopefully we can
>>> review it.
>>>
>>> Regards,
>>>
>>> Mich
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>> for any loss, damage or destruction of data or any other property which 
>>> may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary 
>>> damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Tue, 31 Jan 2023 at 22:34, kazuyuki tanimura 
>>> wrote:
>>>
 Thank you Mich. I followed the instruction at
 https://spark.apache.org/improvement-proposals.html and used its
 template.
 While we are open to revise our design doc, it seems more like you
 are proposing the community to change the instruction per se?

 Kazu

 On Jan 31, 2023, at 11:24 AM, Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

 Hi,

 Thanks for these proposals. good suggestions. Is this style of
 breaking down your approach standard?

 My view would be that perhaps it makes more sense to follow the
 industry established approach of breaking down
 your technical proposal  into:


1. Background
2. Objective
3. Scope
4. Constraints
5. Assumptions
6. Reporting
7. Deliverables
8. Timelines
9. Appendix

 Your current approach using below

 Q1. What are you trying to do? Articulate your objectives using
 absolutely no jargon. What are you trying to achieve?
 Q2. What problem is this proposal NOT designed to solve? What
 issues the suggested proposal is not going to address
 Q3. How is it done today, and 

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread Mich Talebzadeh
Hi,

I thought we already voted to go ahead with this proposal!



   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 13 Feb 2023 at 20:41, kazuyuki tanimura  wrote:

> Thank you Liang-Chi!
>
> Kazu
>
> On Feb 11, 2023, at 7:12 PM, L. C. Hsieh  wrote:
>
> Thanks all for your feedback.
>
> Given this positive feedback, if there is no other comments/discussion, I
> will go to start a vote in the next few days.
>
> Thank you again!
>
> On Thu, Feb 2, 2023 at 10:12 AM kazuyuki tanimura <
> ktanim...@apple.com.invalid> wrote:
>
>> Thank you all for +1s and reviewing the SPIP doc.
>>
>> Kazu
>>
>> On Feb 1, 2023, at 1:28 AM, Dongjoon Hyun 
>> wrote:
>>
>> +1
>>
>> On Wed, Feb 1, 2023 at 12:52 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> +1
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Wed, 1 Feb 2023 at 02:23, huaxin gao  wrote:
>>>
 +1

 On Tue, Jan 31, 2023 at 6:10 PM DB Tsai  wrote:

> +1
>
> Sent from my iPhone
>
> On Jan 31, 2023, at 4:16 PM, Yuming Wang  wrote:
>
> 
> +1.
>
> On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura <
> ktanim...@apple.com.invalid> wrote:
>
>> Great! Much appreciated, Mitch!
>>
>> Kazu
>>
>> On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Thanks, Kazu.
>>
>> I followed that template link and indeed as you pointed out it is a
>> common template. If it works then it is what it is.
>>
>> I will be going through your design proposals and hopefully we can
>> review it.
>>
>> Regards,
>>
>> Mich
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>> for any loss, damage or destruction of data or any other property which 
>> may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 31 Jan 2023 at 22:34, kazuyuki tanimura 
>> wrote:
>>
>>> Thank you Mich. I followed the instruction at
>>> https://spark.apache.org/improvement-proposals.html and used its
>>> template.
>>> While we are open to revise our design doc, it seems more like you
>>> are proposing the community to change the instruction per se?
>>>
>>> Kazu
>>>
>>> On Jan 31, 2023, at 11:24 AM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for these proposals. good suggestions. Is this style of
>>> breaking down your approach standard?
>>>
>>> My view would be that perhaps it makes more sense to follow the
>>> industry established approach of breaking down
>>> your technical proposal  into:
>>>
>>>
>>>1. Background
>>>2. Objective
>>>3. Scope
>>>4. Constraints
>>>5. Assumptions
>>>6. Reporting
>>>7. Deliverables
>>>8. Timelines
>>>9. Appendix
>>>
>>> Your current approach using below
>>>
>>> Q1. What are you trying to do? Articulate your objectives using
>>> absolutely no jargon. What are you trying to achieve?
>>> Q2. What problem is this proposal NOT designed to solve? What
>>> issues the suggested proposal is not going to address
>>> Q3. How is it done today, and what are the limits of current
>>> practice?
>>> Q4. What is new in your approach approach and why do you think it
>>> will be successful succeed?
>>> Q5. Who cares? If you are successful, what difference will it make?
>>> If your proposal succeeds, what tangible benefits will it add?
>>> Q6. What are the risks?
>>> Q7. How long will it take?
>>> Q8. What are the midterm and final “exams” to 

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread kazuyuki tanimura
Thank you Liang-Chi!

Kazu

> On Feb 11, 2023, at 7:12 PM, L. C. Hsieh  wrote:
> 
> Thanks all for your feedback.
> 
> Given this positive feedback, if there is no other comments/discussion, I 
> will go to start a vote in the next few days.
> 
> Thank you again!
> 
> On Thu, Feb 2, 2023 at 10:12 AM kazuyuki tanimura 
>  wrote:
> Thank you all for +1s and reviewing the SPIP doc.
> 
> Kazu
> 
>> On Feb 1, 2023, at 1:28 AM, Dongjoon Hyun > > wrote:
>> 
>> +1
>> 
>> On Wed, Feb 1, 2023 at 12:52 AM Mich Talebzadeh > > wrote:
>> +1
>> 
>> 
>>view my Linkedin profile 
>> 
>> 
>>  https://en.everybodywiki.com/Mich_Talebzadeh 
>> 
>>  
>> Disclaimer: Use it at your own risk. Any and all responsibility for any 
>> loss, damage or destruction of data or any other property which may arise 
>> from relying on this email's technical content is explicitly disclaimed. The 
>> author will in no case be liable for any monetary damages arising from such 
>> loss, damage or destruction.
>>  
>> 
>> 
>> On Wed, 1 Feb 2023 at 02:23, huaxin gao > > wrote:
>> +1
>> 
>> On Tue, Jan 31, 2023 at 6:10 PM DB Tsai > > wrote:
>> +1
>> 
>> Sent from my iPhone
>> 
>>> On Jan 31, 2023, at 4:16 PM, Yuming Wang >> > wrote:
>>> 
>>> 
>>> +1.
>>> 
>>> On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura 
>>> mailto:ktanim...@apple.com.invalid>> wrote:
>>> Great! Much appreciated, Mitch!
>>> 
>>> Kazu
>>> 
 On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh >>> > wrote:
 
 Thanks, Kazu.
 
 I followed that template link and indeed as you pointed out it is a common 
 template. If it works then it is what it is.
 
 I will be going through your design proposals and hopefully we can review 
 it.
 
 Regards,
 
 Mich
 
 
view my Linkedin profile 
 
 
  https://en.everybodywiki.com/Mich_Talebzadeh 
 
  
 Disclaimer: Use it at your own risk. Any and all responsibility for any 
 loss, damage or destruction of data or any other property which may arise 
 from relying on this email's technical content is explicitly disclaimed. 
 The author will in no case be liable for any monetary damages arising from 
 such loss, damage or destruction.
  
 
 
 On Tue, 31 Jan 2023 at 22:34, kazuyuki tanimura >>> > wrote:
 Thank you Mich. I followed the instruction at 
 https://spark.apache.org/improvement-proposals.html 
  and used its 
 template.
 While we are open to revise our design doc, it seems more like you are 
 proposing the community to change the instruction per se?
 
 Kazu
 
> On Jan 31, 2023, at 11:24 AM, Mich Talebzadeh  > wrote:
> 
> Hi,
> 
> Thanks for these proposals. good suggestions. Is this style of breaking 
> down your approach standard?
> 
> My view would be that perhaps it makes more sense to follow the industry 
> established approach of breaking down your technical proposal  into:
> 
> Background
> Objective
> Scope
> Constraints
> Assumptions
> Reporting
> Deliverables
> Timelines
> Appendix
> Your current approach using below 
> 
> Q1. What are you trying to do? Articulate your objectives using 
> absolutely no jargon. What are you trying to achieve?
> Q2. What problem is this proposal NOT designed to solve? What issues the 
> suggested proposal is not going to address
> Q3. How is it done today, and what are the limits of current practice?
> Q4. What is new in your approach approach and why do you think it will be 
> successful succeed?
> Q5. Who cares? If you are successful, what difference will it make? If 
> your proposal succeeds, what tangible benefits will it add?
> Q6. What are the risks?
> Q7. How long will it take?
> Q8. What are the midterm and final “exams” to check for success?
>  
> May not do  justice to your proposal.
> 
> HTH
> 
> Mich
> 
> 
>view my Linkedin profile 
> 
> 
>  https://en.everybodywiki.com/Mich_Talebzadeh 
> 
>  
> Disclaimer: Use it at your own risk. Any and all responsibility for any 
> loss, damage or destruction of data or any other property which may arise 
> from relying on this email's technical content is explicitly disclaimed. 
> The author 

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Holden Karau
That’s legit; if the patch author isn’t comfortable with a backport, then
let’s leave it be.

On Mon, Feb 13, 2023 at 9:59 AM Dongjoon Hyun 
wrote:

> Hi, All.
>
> As the author of that `Improvement` patch, I strongly disagree with giving
> the wrong idea which Python 3.11 is officially supported in Spark 3.3.
>
> I only developed and delivered it for Apache Spark 3.4.0 specifically as
> `Improvement`.
>
> We may want to backport it branch-3.3 but it's also another discussion
> topic because it's `Improvement` instead of a blocker of any existing
> release branch.
>
> Please raise the backporting discussion thread after 3.3.2 releasing if
> you want it in branch-3.3.
>
> We need to talk. :)
>
> Bests,
> Dongjoon.
>
>
> On Mon, Feb 13, 2023 at 9:31 AM Chao Sun  wrote:
>
>> +1
>>
>> On Mon, Feb 13, 2023 at 9:20 AM L. C. Hsieh  wrote:
>> >
>> > If it is not supported in Spark 3.3.x, it looks like an improvement at
>> > Spark 3.4.
>> > For such cases we usually do not back port. I think this is also why
>> > the PR did not back port when it was merged.
>> >
>> > I'm okay if there is consensus to back port it.
>> >
>> > On Mon, Feb 13, 2023 at 9:08 AM Sean Owen  wrote:
>> > >
>> > > Does that change change the result for Spark 3.3.x?
>> > > It looks like we do not support Python 3.11 in Spark 3.3.x, which is
>> one answer to whether this should be changed now.
>> > > But if that's the only change that matters for Python 3.11 and makes
>> it work, sure I think we should back-port. It doesn't necessarily block a
>> release but if that's the case, it seems OK to include to me in a next RC.
>> > >
>> > > On Mon, Feb 13, 2023 at 10:53 AM Bjørn Jørgensen <
>> bjornjorgen...@gmail.com> wrote:
>> > >>
>> > >> There is a fix for python 3.11
>> https://github.com/apache/spark/pull/38987
>> > >> We should have this in more branches.
>> > >>
>> > >> man. 13. feb. 2023 kl. 09:39 skrev Bjørn Jørgensen <
>> bjornjorgen...@gmail.com>:
>> > >>>
>> > >>> On manjaro it is Python 3.10.9
>> > >>>
>> > >>> On ubuntu it is Python 3.11.1
>> > >>>
>> > >>> man. 13. feb. 2023 kl. 03:24 skrev yangjie01 :
>> > 
>> >  Which Python version do you use for testing? When I use the latest
>> Python 3.11, I can reproduce similar test failures (43 tests of sql module
>> fail), but when I use python 3.10, they will succeed
>> > 
>> > 
>> > 
>> >  YangJie
>> > 
>> > 
>> > 
>> >  From: Bjørn Jørgensen 
>> >  Date: Monday, February 13, 2023 05:09
>> >  To: Sean Owen 
>> >  Cc: "L. C. Hsieh" , Spark dev list <
>> dev@spark.apache.org>
>> >  Subject: Re: [VOTE] Release Spark 3.3.2 (RC1)
>> > 
>> > 
>> > 
>> >  Tried it one more time and the same result.
>> > 
>> > 
>> > 
>> >  On another box with Manjaro
>> > 
>> > 
>> 
>> >  [INFO] Reactor Summary for Spark Project Parent POM 3.3.2:
>> >  [INFO]
>> >  [INFO] Spark Project Parent POM ...
>> SUCCESS [01:50 min]
>> >  [INFO] Spark Project Tags .
>> SUCCESS [ 17.359 s]
>> >  [INFO] Spark Project Sketch ...
>> SUCCESS [ 12.517 s]
>> >  [INFO] Spark Project Local DB .
>> SUCCESS [ 14.463 s]
>> >  [INFO] Spark Project Networking ...
>> SUCCESS [01:07 min]
>> >  [INFO] Spark Project Shuffle Streaming Service 
>> SUCCESS [  9.013 s]
>> >  [INFO] Spark Project Unsafe ...
>> SUCCESS [  8.184 s]
>> >  [INFO] Spark Project Launcher .
>> SUCCESS [ 10.454 s]
>> >  [INFO] Spark Project Core .
>> SUCCESS [23:58 min]
>> >  [INFO] Spark Project ML Local Library .
>> SUCCESS [ 21.218 s]
>> >  [INFO] Spark Project GraphX ...
>> SUCCESS [01:24 min]
>> >  [INFO] Spark Project Streaming 
>> SUCCESS [04:57 min]
>> >  [INFO] Spark Project Catalyst .
>> SUCCESS [08:00 min]
>> >  [INFO] Spark Project SQL ..
>> SUCCESS [  01:02 h]
>> >  [INFO] Spark Project ML Library ...
>> SUCCESS [14:38 min]
>> >  [INFO] Spark Project Tools 
>> SUCCESS [  4.394 s]
>> >  [INFO] Spark Project Hive .
>> SUCCESS [53:43 min]
>> >  [INFO] Spark Project REPL .
>> SUCCESS [01:16 min]
>> >  [INFO] Spark Project Assembly .
>> SUCCESS [  2.186 s]
>> >  [INFO] Kafka 0.10+ Token Provider for Streaming ...
>> SUCCESS [ 16.150 s]
>> >  [INFO] Spark Integration for Kafka 0.10 ...
>> SUCCESS [01:34 min]
>> >  [INFO] Kafka 0.10+ Source for Structured Streaming 
>> SUCCESS 

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Dongjoon Hyun
Hi, All.

As the author of that `Improvement` patch, I strongly disagree with giving
the wrong idea that Python 3.11 is officially supported in Spark 3.3.

I only developed and delivered it for Apache Spark 3.4.0, specifically as
an `Improvement`.

We may want to backport it to branch-3.3, but that is another discussion
topic, because it is an `Improvement` rather than a blocker for any
existing release branch.

Please raise the backporting discussion thread after the 3.3.2 release if
you want it in branch-3.3.

We need to talk. :)

Bests,
Dongjoon.


On Mon, Feb 13, 2023 at 9:31 AM Chao Sun  wrote:

> +1
>
> On Mon, Feb 13, 2023 at 9:20 AM L. C. Hsieh  wrote:
> >
> > If it is not supported in Spark 3.3.x, it looks like an improvement at
> > Spark 3.4.
> > For such cases we usually do not back port. I think this is also why
> > the PR did not back port when it was merged.
> >
> > I'm okay if there is consensus to back port it.
> >
> > On Mon, Feb 13, 2023 at 9:08 AM Sean Owen  wrote:
> > >
> > > Does that change change the result for Spark 3.3.x?
> > > It looks like we do not support Python 3.11 in Spark 3.3.x, which is
> one answer to whether this should be changed now.
> > > But if that's the only change that matters for Python 3.11 and makes
> it work, sure I think we should back-port. It doesn't necessarily block a
> release but if that's the case, it seems OK to include to me in a next RC.
> > >
> > > On Mon, Feb 13, 2023 at 10:53 AM Bjørn Jørgensen <
> bjornjorgen...@gmail.com> wrote:
> > >>
> > >> There is a fix for python 3.11
> https://github.com/apache/spark/pull/38987
> > >> We should have this in more branches.
> > >>
> > >> man. 13. feb. 2023 kl. 09:39 skrev Bjørn Jørgensen <
> bjornjorgen...@gmail.com>:
> > >>>
> > >>> On manjaro it is Python 3.10.9
> > >>>
> > >>> On ubuntu it is Python 3.11.1
> > >>>
> > >>> man. 13. feb. 2023 kl. 03:24 skrev yangjie01 :
> > 
> >  Which Python version do you use for testing? When I use the latest
> Python 3.11, I can reproduce similar test failures (43 tests of sql module
> fail), but when I use python 3.10, they will succeed
> > 
> > 
> > 
> >  YangJie
> > 
> > 
> > 
> >  From: Bjørn Jørgensen 
> >  Date: Monday, February 13, 2023 05:09
> >  To: Sean Owen 
> >  Cc: "L. C. Hsieh" , Spark dev list <
> dev@spark.apache.org>
> >  Subject: Re: [VOTE] Release Spark 3.3.2 (RC1)
> > 
> > 
> > 
> >  Tried it one more time and the same result.
> > 
> > 
> > 
> >  On another box with Manjaro
> > 
> > 
> 
> >  [INFO] Reactor Summary for Spark Project Parent POM 3.3.2:
> >  [INFO]
> >  [INFO] Spark Project Parent POM ... SUCCESS
> [01:50 min]
> >  [INFO] Spark Project Tags . SUCCESS
> [ 17.359 s]
> >  [INFO] Spark Project Sketch ... SUCCESS
> [ 12.517 s]
> >  [INFO] Spark Project Local DB . SUCCESS
> [ 14.463 s]
> >  [INFO] Spark Project Networking ... SUCCESS
> [01:07 min]
> >  [INFO] Spark Project Shuffle Streaming Service  SUCCESS
> [  9.013 s]
> >  [INFO] Spark Project Unsafe ... SUCCESS
> [  8.184 s]
> >  [INFO] Spark Project Launcher . SUCCESS
> [ 10.454 s]
> >  [INFO] Spark Project Core . SUCCESS
> [23:58 min]
> >  [INFO] Spark Project ML Local Library . SUCCESS
> [ 21.218 s]
> >  [INFO] Spark Project GraphX ... SUCCESS
> [01:24 min]
> >  [INFO] Spark Project Streaming  SUCCESS
> [04:57 min]
> >  [INFO] Spark Project Catalyst . SUCCESS
> [08:00 min]
> >  [INFO] Spark Project SQL .. SUCCESS
> [  01:02 h]
> >  [INFO] Spark Project ML Library ... SUCCESS
> [14:38 min]
> >  [INFO] Spark Project Tools  SUCCESS
> [  4.394 s]
> >  [INFO] Spark Project Hive . SUCCESS
> [53:43 min]
> >  [INFO] Spark Project REPL . SUCCESS
> [01:16 min]
> >  [INFO] Spark Project Assembly . SUCCESS
> [  2.186 s]
> >  [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS
> [ 16.150 s]
> >  [INFO] Spark Integration for Kafka 0.10 ... SUCCESS
> [01:34 min]
> >  [INFO] Kafka 0.10+ Source for Structured Streaming  SUCCESS
> [32:55 min]
> >  [INFO] Spark Project Examples . SUCCESS
> [ 23.800 s]
> >  [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS
> [  7.301 s]
> >  [INFO] Spark Avro . SUCCESS
> [01:19 min]
> >  [INFO]

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Chao Sun
+1

On Mon, Feb 13, 2023 at 9:20 AM L. C. Hsieh  wrote:
>
> If it is not supported in Spark 3.3.x, it looks like an improvement at
> Spark 3.4.
> For such cases we usually do not back port. I think this is also why
> the PR did not back port when it was merged.
>
> I'm okay if there is consensus to back port it.
>
> On Mon, Feb 13, 2023 at 9:08 AM Sean Owen  wrote:
> >
> > Does that change change the result for Spark 3.3.x?
> > It looks like we do not support Python 3.11 in Spark 3.3.x, which is one 
> > answer to whether this should be changed now.
> > But if that's the only change that matters for Python 3.11 and makes it 
> > work, sure I think we should back-port. It doesn't necessarily block a 
> > release but if that's the case, it seems OK to include to me in a next RC.
> >
> > On Mon, Feb 13, 2023 at 10:53 AM Bjørn Jørgensen  
> > wrote:
> >>
> >> There is a fix for python 3.11 https://github.com/apache/spark/pull/38987
> >> We should have this in more branches.
> >>
> >> man. 13. feb. 2023 kl. 09:39 skrev Bjørn Jørgensen 
> >> :
> >>>
> >>> On manjaro it is Python 3.10.9
> >>>
> >>> On ubuntu it is Python 3.11.1
> >>>
> >>> man. 13. feb. 2023 kl. 03:24 skrev yangjie01 :
> 
>  Which Python version do you use for testing? When I use the latest 
>  Python 3.11, I can reproduce similar test failures (43 tests of sql 
>  module fail), but when I use python 3.10, they will succeed
> 
> 
> 
>  YangJie
> 
> 
> 
>  From: Bjørn Jørgensen 
>  Date: Monday, February 13, 2023 05:09
>  To: Sean Owen 
>  Cc: "L. C. Hsieh" , Spark dev list 
>  
>  Subject: Re: [VOTE] Release Spark 3.3.2 (RC1)
> 
> 
> 
>  Tried it one more time and the same result.
> 
> 
> 
>  On another box with Manjaro
> 
>  
>  [INFO] Reactor Summary for Spark Project Parent POM 3.3.2:
>  [INFO]
>  [INFO] Spark Project Parent POM ... SUCCESS 
>  [01:50 min]
>  [INFO] Spark Project Tags . SUCCESS [ 
>  17.359 s]
>  [INFO] Spark Project Sketch ... SUCCESS [ 
>  12.517 s]
>  [INFO] Spark Project Local DB . SUCCESS [ 
>  14.463 s]
>  [INFO] Spark Project Networking ... SUCCESS 
>  [01:07 min]
>  [INFO] Spark Project Shuffle Streaming Service  SUCCESS [  
>  9.013 s]
>  [INFO] Spark Project Unsafe ... SUCCESS [  
>  8.184 s]
>  [INFO] Spark Project Launcher . SUCCESS [ 
>  10.454 s]
>  [INFO] Spark Project Core . SUCCESS 
>  [23:58 min]
>  [INFO] Spark Project ML Local Library . SUCCESS [ 
>  21.218 s]
>  [INFO] Spark Project GraphX ... SUCCESS 
>  [01:24 min]
>  [INFO] Spark Project Streaming  SUCCESS 
>  [04:57 min]
>  [INFO] Spark Project Catalyst . SUCCESS 
>  [08:00 min]
>  [INFO] Spark Project SQL .. SUCCESS [  
>  01:02 h]
>  [INFO] Spark Project ML Library ... SUCCESS 
>  [14:38 min]
>  [INFO] Spark Project Tools  SUCCESS [  
>  4.394 s]
>  [INFO] Spark Project Hive . SUCCESS 
>  [53:43 min]
>  [INFO] Spark Project REPL . SUCCESS 
>  [01:16 min]
>  [INFO] Spark Project Assembly . SUCCESS [  
>  2.186 s]
>  [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [ 
>  16.150 s]
>  [INFO] Spark Integration for Kafka 0.10 ... SUCCESS 
>  [01:34 min]
>  [INFO] Kafka 0.10+ Source for Structured Streaming  SUCCESS 
>  [32:55 min]
>  [INFO] Spark Project Examples . SUCCESS [ 
>  23.800 s]
>  [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [  
>  7.301 s]
>  [INFO] Spark Avro . SUCCESS 
>  [01:19 min]
>  [INFO] 
>  
>  [INFO] BUILD SUCCESS
>  [INFO] 
>  
>  [INFO] Total time:  03:31 h
>  [INFO] Finished at: 2023-02-12T21:54:20+01:00
>  [INFO] 
>  
>  [bjorn@amd7g spark-3.3.2]$  java -version
>  openjdk version "17.0.6" 2023-01-17
>  OpenJDK Runtime Environment (build 17.0.6+10)
>  OpenJDK 64-Bit Server VM (build 17.0.6+10, mixed mode)
> 
> 
> 
> 
> 
>  :)
> 
> 

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Sean Owen
Agreed; just, if it's such a tiny change and it actually fixes the issue, it
may be worth getting it into 3.3.x. I don't feel strongly.

On Mon, Feb 13, 2023 at 11:19 AM L. C. Hsieh  wrote:

> If it is not supported in Spark 3.3.x, it looks like an improvement at
> Spark 3.4.
> For such cases we usually do not back port. I think this is also why
> the PR did not back port when it was merged.
>
> I'm okay if there is consensus to back port it.
>
>


Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Holden Karau
I’d be in favor of backporting, with the idea that it’s a bug fix for a
language version (admittedly not a version we’ve supported before).

On Mon, Feb 13, 2023 at 9:19 AM L. C. Hsieh  wrote:

> If it is not supported in Spark 3.3.x, it looks like an improvement at
> Spark 3.4.
> For such cases we usually do not back port. I think this is also why
> the PR did not back port when it was merged.
>
> I'm okay if there is consensus to back port it.
>
> On Mon, Feb 13, 2023 at 9:08 AM Sean Owen  wrote:
> >
> > Does that change change the result for Spark 3.3.x?
> > It looks like we do not support Python 3.11 in Spark 3.3.x, which is one
> answer to whether this should be changed now.
> > But if that's the only change that matters for Python 3.11 and makes it
> work, sure I think we should back-port. It doesn't necessarily block a
> release but if that's the case, it seems OK to include to me in a next RC.
> >
> > On Mon, Feb 13, 2023 at 10:53 AM Bjørn Jørgensen <
> bjornjorgen...@gmail.com> wrote:
> >>
> >> There is a fix for python 3.11
> https://github.com/apache/spark/pull/38987
> >> We should have this in more branches.
> >>
> >> man. 13. feb. 2023 kl. 09:39 skrev Bjørn Jørgensen <
> bjornjorgen...@gmail.com>:
> >>>
> >>> On manjaro it is Python 3.10.9
> >>>
> >>> On ubuntu it is Python 3.11.1
> >>>
> >>> man. 13. feb. 2023 kl. 03:24 skrev yangjie01 :
> 
>  Which Python version do you use for testing? When I use the latest
> Python 3.11, I can reproduce similar test failures (43 tests of sql module
> fail), but when I use python 3.10, they will succeed
> 
> 
> 
>  YangJie
> 
> 
> 
>  From: Bjørn Jørgensen 
>  Date: Monday, February 13, 2023 05:09
>  To: Sean Owen 
>  Cc: "L. C. Hsieh" , Spark dev list <
> dev@spark.apache.org>
>  Subject: Re: [VOTE] Release Spark 3.3.2 (RC1)
> 
> 
> 
>  Tried it one more time and the same result.
> 
> 
> 
>  On another box with Manjaro
> 
> 
> 
>  [INFO] Reactor Summary for Spark Project Parent POM 3.3.2:
>  [INFO]
>  [INFO] Spark Project Parent POM ... SUCCESS
> [01:50 min]
>  [INFO] Spark Project Tags . SUCCESS [
> 17.359 s]
>  [INFO] Spark Project Sketch ... SUCCESS [
> 12.517 s]
>  [INFO] Spark Project Local DB . SUCCESS [
> 14.463 s]
>  [INFO] Spark Project Networking ... SUCCESS
> [01:07 min]
>  [INFO] Spark Project Shuffle Streaming Service  SUCCESS
> [  9.013 s]
>  [INFO] Spark Project Unsafe ... SUCCESS
> [  8.184 s]
>  [INFO] Spark Project Launcher . SUCCESS [
> 10.454 s]
>  [INFO] Spark Project Core . SUCCESS
> [23:58 min]
>  [INFO] Spark Project ML Local Library . SUCCESS [
> 21.218 s]
>  [INFO] Spark Project GraphX ... SUCCESS
> [01:24 min]
>  [INFO] Spark Project Streaming  SUCCESS
> [04:57 min]
>  [INFO] Spark Project Catalyst . SUCCESS
> [08:00 min]
>  [INFO] Spark Project SQL .. SUCCESS
> [  01:02 h]
>  [INFO] Spark Project ML Library ... SUCCESS
> [14:38 min]
>  [INFO] Spark Project Tools  SUCCESS
> [  4.394 s]
>  [INFO] Spark Project Hive . SUCCESS
> [53:43 min]
>  [INFO] Spark Project REPL . SUCCESS
> [01:16 min]
>  [INFO] Spark Project Assembly . SUCCESS
> [  2.186 s]
>  [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [
> 16.150 s]
>  [INFO] Spark Integration for Kafka 0.10 ... SUCCESS
> [01:34 min]
>  [INFO] Kafka 0.10+ Source for Structured Streaming  SUCCESS
> [32:55 min]
>  [INFO] Spark Project Examples . SUCCESS [
> 23.800 s]
>  [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS
> [  7.301 s]
>  [INFO] Spark Avro . SUCCESS
> [01:19 min]
>  [INFO]
> 
>  [INFO] BUILD SUCCESS
>  [INFO]
> 
>  [INFO] Total time:  03:31 h
>  [INFO] Finished at: 2023-02-12T21:54:20+01:00
>  [INFO]
> 
>  [bjorn@amd7g spark-3.3.2]$  java -version
>  openjdk version "17.0.6" 2023-01-17
>  OpenJDK Runtime Environment (build 17.0.6+10)
>  OpenJDK 64-Bit Server VM (build 17.0.6+10, mixed mode)
> 
> 
> 
> 
> 
>  :)
> 
> 

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread L. C. Hsieh
If it is not supported in Spark 3.3.x, it looks like an improvement for
Spark 3.4.
For such cases we usually do not backport. I think this is also why
the PR was not backported when it was merged.

I'm okay if there is consensus to backport it.

On Mon, Feb 13, 2023 at 9:08 AM Sean Owen  wrote:
>
> Does that change change the result for Spark 3.3.x?
> It looks like we do not support Python 3.11 in Spark 3.3.x, which is one 
> answer to whether this should be changed now.
> But if that's the only change that matters for Python 3.11 and makes it work, 
> sure I think we should back-port. It doesn't necessarily block a release but 
> if that's the case, it seems OK to include to me in a next RC.
>
> On Mon, Feb 13, 2023 at 10:53 AM Bjørn Jørgensen  
> wrote:
>>
>> There is a fix for python 3.11 https://github.com/apache/spark/pull/38987
>> We should have this in more branches.
>>
>> man. 13. feb. 2023 kl. 09:39 skrev Bjørn Jørgensen 
>> :
>>>
>>> On manjaro it is Python 3.10.9
>>>
>>> On ubuntu it is Python 3.11.1
>>>
>>> man. 13. feb. 2023 kl. 03:24 skrev yangjie01 :

 Which Python version do you use for testing? When I use the latest Python 
 3.11, I can reproduce similar test failures (43 tests of sql module fail), 
 but when I use python 3.10, they will succeed



 YangJie



 From: Bjørn Jørgensen 
 Date: Monday, February 13, 2023 05:09
 To: Sean Owen 
 Cc: "L. C. Hsieh" , Spark dev list 
 Subject: Re: [VOTE] Release Spark 3.3.2 (RC1)



 Tried it one more time and the same result.



 On another box with Manjaro

 
 [INFO] Reactor Summary for Spark Project Parent POM 3.3.2:
 [INFO]
 [INFO] Spark Project Parent POM ... SUCCESS [01:50 
 min]
 [INFO] Spark Project Tags . SUCCESS [ 
 17.359 s]
 [INFO] Spark Project Sketch ... SUCCESS [ 
 12.517 s]
 [INFO] Spark Project Local DB . SUCCESS [ 
 14.463 s]
 [INFO] Spark Project Networking ... SUCCESS [01:07 
 min]
 [INFO] Spark Project Shuffle Streaming Service  SUCCESS [  
 9.013 s]
 [INFO] Spark Project Unsafe ... SUCCESS [  
 8.184 s]
 [INFO] Spark Project Launcher . SUCCESS [ 
 10.454 s]
 [INFO] Spark Project Core . SUCCESS [23:58 
 min]
 [INFO] Spark Project ML Local Library . SUCCESS [ 
 21.218 s]
 [INFO] Spark Project GraphX ... SUCCESS [01:24 
 min]
 [INFO] Spark Project Streaming  SUCCESS [04:57 
 min]
 [INFO] Spark Project Catalyst . SUCCESS [08:00 
 min]
 [INFO] Spark Project SQL .. SUCCESS [  
 01:02 h]
 [INFO] Spark Project ML Library ... SUCCESS [14:38 
 min]
 [INFO] Spark Project Tools  SUCCESS [  
 4.394 s]
 [INFO] Spark Project Hive . SUCCESS [53:43 
 min]
 [INFO] Spark Project REPL . SUCCESS [01:16 
 min]
 [INFO] Spark Project Assembly . SUCCESS [  
 2.186 s]
 [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [ 
 16.150 s]
 [INFO] Spark Integration for Kafka 0.10 ... SUCCESS [01:34 
 min]
 [INFO] Kafka 0.10+ Source for Structured Streaming  SUCCESS [32:55 
 min]
 [INFO] Spark Project Examples . SUCCESS [ 
 23.800 s]
 [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [  
 7.301 s]
 [INFO] Spark Avro . SUCCESS [01:19 
 min]
 [INFO] 
 
 [INFO] BUILD SUCCESS
 [INFO] 
 
 [INFO] Total time:  03:31 h
 [INFO] Finished at: 2023-02-12T21:54:20+01:00
 [INFO] 
 
 [bjorn@amd7g spark-3.3.2]$  java -version
 openjdk version "17.0.6" 2023-01-17
 OpenJDK Runtime Environment (build 17.0.6+10)
 OpenJDK 64-Bit Server VM (build 17.0.6+10, mixed mode)





 :)



 So I'm +1





 søn. 12. feb. 2023 kl. 12:53 skrev Bjørn Jørgensen 
 :

 I use ubuntu rolling

 $ java -version
 openjdk version "17.0.6" 2023-01-17
 OpenJDK Runtime Environment (build 17.0.6+10-Ubuntu-0ubuntu1)
 OpenJDK 64-Bit Server VM (build 

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Sean Owen
Does that change change the result for Spark 3.3.x?
It looks like we do not support Python 3.11 in Spark 3.3.x, which is one
answer to whether this should be changed now.
But if that's the only change that matters for Python 3.11 and makes it
work, sure, I think we should backport it. It doesn't necessarily block a
release, but if that's the case, it seems OK to me to include it in the next RC.

On Mon, Feb 13, 2023 at 10:53 AM Bjørn Jørgensen 
wrote:

> There is a fix for python 3.11 https://github.com/apache/spark/pull/38987
> We should have this in more branches.
>
> man. 13. feb. 2023 kl. 09:39 skrev Bjørn Jørgensen <
> bjornjorgen...@gmail.com>:
>
>> On manjaro it is Python 3.10.9
>>
>> On ubuntu it is Python 3.11.1
>>
>> man. 13. feb. 2023 kl. 03:24 skrev yangjie01 :
>>
>>> Which Python version do you use for testing? When I use the latest
>>> Python 3.11, I can reproduce similar test failures (43 tests of sql module
>>> fail), but when I use python 3.10, they will succeed
>>>
>>>
>>>
>>> YangJie
>>>
>>>
>>>
>>> *From:* Bjørn Jørgensen 
>>> *Date:* Monday, February 13, 2023 05:09
>>> *To:* Sean Owen 
>>> *Cc:* "L. C. Hsieh" , Spark dev list <
>>> dev@spark.apache.org>
>>> *Subject:* Re: [VOTE] Release Spark 3.3.2 (RC1)
>>>
>>>
>>>
>>> Tried it one more time and the same result.
>>>
>>>
>>>
>>> On another box with Manjaro
>>>
>>> 
>>> [INFO] Reactor Summary for Spark Project Parent POM 3.3.2:
>>> [INFO]
>>> [INFO] Spark Project Parent POM ... SUCCESS
>>> [01:50 min]
>>> [INFO] Spark Project Tags . SUCCESS [
>>> 17.359 s]
>>> [INFO] Spark Project Sketch ... SUCCESS [
>>> 12.517 s]
>>> [INFO] Spark Project Local DB . SUCCESS [
>>> 14.463 s]
>>> [INFO] Spark Project Networking ... SUCCESS
>>> [01:07 min]
>>> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [
>>>  9.013 s]
>>> [INFO] Spark Project Unsafe ... SUCCESS [
>>>  8.184 s]
>>> [INFO] Spark Project Launcher . SUCCESS [
>>> 10.454 s]
>>> [INFO] Spark Project Core . SUCCESS
>>> [23:58 min]
>>> [INFO] Spark Project ML Local Library . SUCCESS [
>>> 21.218 s]
>>> [INFO] Spark Project GraphX ... SUCCESS
>>> [01:24 min]
>>> [INFO] Spark Project Streaming  SUCCESS
>>> [04:57 min]
>>> [INFO] Spark Project Catalyst . SUCCESS
>>> [08:00 min]
>>> [INFO] Spark Project SQL .. SUCCESS [
>>>  01:02 h]
>>> [INFO] Spark Project ML Library ... SUCCESS
>>> [14:38 min]
>>> [INFO] Spark Project Tools  SUCCESS [
>>>  4.394 s]
>>> [INFO] Spark Project Hive . SUCCESS
>>> [53:43 min]
>>> [INFO] Spark Project REPL . SUCCESS
>>> [01:16 min]
>>> [INFO] Spark Project Assembly . SUCCESS [
>>>  2.186 s]
>>> [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [
>>> 16.150 s]
>>> [INFO] Spark Integration for Kafka 0.10 ... SUCCESS
>>> [01:34 min]
>>> [INFO] Kafka 0.10+ Source for Structured Streaming  SUCCESS
>>> [32:55 min]
>>> [INFO] Spark Project Examples . SUCCESS [
>>> 23.800 s]
>>> [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [
>>>  7.301 s]
>>> [INFO] Spark Avro . SUCCESS
>>> [01:19 min]
>>> [INFO]
>>> 
>>> [INFO] BUILD SUCCESS
>>> [INFO]
>>> 
>>> [INFO] Total time:  03:31 h
>>> [INFO] Finished at: 2023-02-12T21:54:20+01:00
>>> [INFO]
>>> 
>>> [bjorn@amd7g spark-3.3.2]$  java -version
>>> openjdk version "17.0.6" 2023-01-17
>>> OpenJDK Runtime Environment (build 17.0.6+10)
>>> OpenJDK 64-Bit Server VM (build 17.0.6+10, mixed mode)
>>>
>>>
>>>
>>>
>>>
>>> :)
>>>
>>>
>>>
>>> So I'm +1
>>>
>>>
>>>
>>>
>>>
>>> søn. 12. feb. 2023 kl. 12:53 skrev Bjørn Jørgensen <
>>> bjornjorgen...@gmail.com>:
>>>
>>> I use ubuntu rolling
>>>
>>> $ java -version
>>> openjdk version "17.0.6" 2023-01-17
>>> OpenJDK Runtime Environment (build 17.0.6+10-Ubuntu-0ubuntu1)
>>> OpenJDK 64-Bit Server VM (build 17.0.6+10-Ubuntu-0ubuntu1, mixed mode,
>>> sharing)
>>>
>>>
>>>
>>> I have reboot now and restart ./build/mvn clean package
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> søn. 12. feb. 2023 kl. 04:47 skrev Sean Owen :
>>>
>>> +1 The tests and all results were the same as ever for me (Java 11,
>>> Scala 2.13, Ubuntu 22.04)
>>>
>>> I also didn't see that issue ... maybe somehow locale related? which
>>> 

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Bjørn Jørgensen
There is a fix for Python 3.11: https://github.com/apache/spark/pull/38987
We should have this in more branches.
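
For anyone trying to reproduce the failures locally, it may help to pin the
interpreter that PySpark picks up before running the build; a minimal
sketch (the interpreter paths are placeholders, and this assumes the test
harness honors these variables):

  export PYSPARK_PYTHON=/usr/bin/python3.10
  export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.10
  ./build/mvn clean package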

man. 13. feb. 2023 kl. 09:39 skrev Bjørn Jørgensen :

> On manjaro it is Python 3.10.9
>
> On ubuntu it is Python 3.11.1
>
> man. 13. feb. 2023 kl. 03:24 skrev yangjie01 :
>
>> Which Python version do you use for testing? When I use the latest Python
>> 3.11, I can reproduce similar test failures (43 tests of sql module fail),
>> but when I use python 3.10, they will succeed
>>
>>
>>
>> YangJie
>>
>>
>>
>> *From:* Bjørn Jørgensen 
>> *Date:* Monday, February 13, 2023 05:09
>> *To:* Sean Owen 
>> *Cc:* "L. C. Hsieh" , Spark dev list <
>> dev@spark.apache.org>
>> *Subject:* Re: [VOTE] Release Spark 3.3.2 (RC1)
>>
>>
>>
>> Tried it one more time and the same result.
>>
>>
>>
>> On another box with Manjaro
>>
>>
>>
>>
>>
>>
>> :)
>>
>>
>>
>> So I'm +1
>>
>>
>>
>>
>>
>> søn. 12. feb. 2023 kl. 12:53 skrev Bjørn Jørgensen <
>> bjornjorgen...@gmail.com>:
>>
>> I use ubuntu rolling
>>
>> $ java -version
>> openjdk version "17.0.6" 2023-01-17
>> OpenJDK Runtime Environment (build 17.0.6+10-Ubuntu-0ubuntu1)
>> OpenJDK 64-Bit Server VM (build 17.0.6+10-Ubuntu-0ubuntu1, mixed mode,
>> sharing)
>>
>>
>>
>> I have reboot now and restart ./build/mvn clean package
>>
>>
>>
>>
>>
>>
>>
>> søn. 12. feb. 2023 kl. 04:47 skrev Sean Owen :
>>
>> +1 The tests and all results were the same as ever for me (Java 11, Scala
>> 2.13, Ubuntu 22.04)
>>
>> I also didn't see that issue ... maybe somehow locale related? which
>> could still be a bug.
>>
>>
>>
>> On Sat, Feb 11, 2023 at 8:49 PM L. C. Hsieh  wrote:
>>
>> Thank you for testing it.
>>
>> I was going to run it again but still didn't see any errors.
>>
>> I also checked CI (and looked again now) on branch-3.3 before cutting RC.
>>
>> BTW, I didn't find an actual test failure (i.e. "- test_name ***
>> FAILED ***") in the log file.
>>
>> Maybe it is due to the dev env? What dev env you're using to run the test?
>>
>>
>> On Sat, Feb 11, 2023 at 8:58 AM Bjørn Jørgensen
>>  wrote:
>> >
>> >
>> > ./build/mvn clean package
>> >
>> > Run completed in 1 hour, 18 minutes, 29 seconds.
>> > 

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Bjørn Jørgensen
On Manjaro it is Python 3.10.9

On Ubuntu it is Python 3.11.1

man. 13. feb. 2023 kl. 03:24 skrev yangjie01 :

> Which Python version do you use for testing? When I use the latest Python
> 3.11, I can reproduce similar test failures (43 tests of sql module fail),
> but when I use python 3.10, they will succeed
>
>
>
> YangJie
>
>
>
> *发件人**: *Bjørn Jørgensen 
> *日期**: *2023年2月13日 星期一 05:09
> *收件人**: *Sean Owen 
> *抄送**: *"L. C. Hsieh" , Spark dev list <
> dev@spark.apache.org>
> *主题**: *Re: [VOTE] Release Spark 3.3.2 (RC1)
>
>
>
> Tried it one more time and the same result.
>
>
>
> On another box with Manjaro
>
>
>
>
>
>
> :)
>
>
>
> So I'm +1
>
>
>
>
>
> søn. 12. feb. 2023 kl. 12:53 skrev Bjørn Jørgensen <
> bjornjorgen...@gmail.com>:
>
> I use ubuntu rolling
>
> $ java -version
> openjdk version "17.0.6" 2023-01-17
> OpenJDK Runtime Environment (build 17.0.6+10-Ubuntu-0ubuntu1)
> OpenJDK 64-Bit Server VM (build 17.0.6+10-Ubuntu-0ubuntu1, mixed mode,
> sharing)
>
>
>
> I have reboot now and restart ./build/mvn clean package
>
>
>
>
>
>
>
> søn. 12. feb. 2023 kl. 04:47 skrev Sean Owen :
>
> +1 The tests and all results were the same as ever for me (Java 11, Scala
> 2.13, Ubuntu 22.04)
>
> I also didn't see that issue ... maybe somehow locale related? which could
> still be a bug.
>
>
>
> On Sat, Feb 11, 2023 at 8:49 PM L. C. Hsieh  wrote:
>
> Thank you for testing it.
>
> I was going to run it again but still didn't see any errors.
>
> I also checked CI (and looked again now) on branch-3.3 before cutting RC.
>
> BTW, I didn't find an actual test failure (i.e. "- test_name ***
> FAILED ***") in the log file.
>
> Maybe it is due to the dev env? What dev env you're using to run the test?
>
>
> On Sat, Feb 11, 2023 at 8:58 AM Bjørn Jørgensen
>  wrote:
> >
> >
> > ./build/mvn clean package
> >
> > Run completed in 1 hour, 18 minutes, 29 seconds.
> > Total number of tests run: 11652
> > Suites: completed 516, aborted 0
> > Tests: succeeded 11609, failed 43, canceled 8, ignored 57, pending 0
> > *** 43 TESTS FAILED ***
> > [INFO]
> 
> > [INFO] Reactor Summary for Spark Project Parent POM 3.3.2:
> > [INFO]
> > 

Re: Spark on Kube (virtual) coffee/tea/pop times

2023-02-13 Thread Holden Karau
Some general issues we found common ground around:

- Inter-pod security, Istio + mTLS
- Sidecar management (a spark-submit sketch follows below)
- Docker images
- Add links to more related images
  - Helm links
- Data locality concerns
- Upgrading Spark versions
- Performance issues
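
For the sidecar and pod-customization items, here is a minimal spark-submit
sketch (the API server address, namespace, service account, image name and
template file paths are placeholders, not a recommendation):

  ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver>:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.namespace=spark \
    --conf spark.kubernetes.container.image=<registry>/spark:3.3.2 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.driver.podTemplateFile=driver-template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=executor-template.yaml \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar 1000

  # driver-template.yaml / executor-template.yaml are where sidecar containers
  # and mesh annotations (e.g. Istio injection) can be attached to the Spark pods.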

Thanks to everyone who was able to make the informal coffee chat.

I'll try to schedule another one at a more European-friendly time so that
we can all get to chat as well.

On Fri, Feb 10, 2023 at 1:08 PM Mich Talebzadeh 
wrote:

> Great looking forward to it
>
> Mich
>
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 10 Feb 2023 at 18:58, Holden Karau  wrote:
>
>> Ok so the first iteration of this is booked:
>>
>>
>> Spark on Kube Coffee Chats
>> Sunday, Feb 12 · 6–7 PM pacific time
>> Google Meet joining info
>> Video call link: https://meet.google.com/wge-tzzd-uyj
>>
>> Assuming that all goes well, I’ll send out another Doodle poll after this
>> one for the folks who could not make this one.
>>
>> Looking forward to catching up with y’all :) No prep work necessary, but
>> if anyone wants to write down a brief, two-sentence blurb about their
>> goals for Spark on Kube, I was thinking we might go around the virtual room
>> sharing that as our kick-off point for this coffee meeting :)
>>
>>
>> On Wed, Feb 8, 2023 at 12:27 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> That sounds like a good plan Holden!
>>>
>>>
>>> Let us go for it
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Wed, 8 Feb 2023 at 20:12, Holden Karau  wrote:
>>>
 My thought here was that it's more focused on getting to understand
 each other's goals / priorities and less solving any specific problem.

 For example, I know that some folks running on EKS have different
 priorities than folks running on-prem.

 We might (later on) make a roadmap doc if that seems necessary, but I'm
 hoping that just an understanding of folks priorities and challenges will
 make it easier for us to all collaborate.

 On Wed, Feb 8, 2023 at 11:47 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> Hi all,
>
> Is this going to be a brainstorming meeting or there will be a prior
> agenda to work around it?
>
> thanks
>
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any loss, damage or destruction of data or any other property which may
> arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary damages
> arising from such loss, damage or destruction.
>
>
>
>
> On Wed, 8 Feb 2023 at 18:33, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Ok Colin thanks for clarification
>>
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>> for any loss, damage or destruction of data or any other property which 
>> may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 8 Feb 2023 at 18:08, Colin Williams <
>> colin.williams.seat...@gmail.com> wrote:
>>
>>> I'm sorry you misunderstood.  The context is migrating jobs to Spark
>>> on k8s.
>>>
>>> On Wed, Feb 8, 2023, 8:31 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Hi Colin,

 Thanks for your reply.


 I think both Yarn and Kubernetes are cluster managers plus
 

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread William Hyun
+1

Thank you,
William

On 2023/02/13 07:32:49 John Zhuge wrote:
> +1 (non-binding)
> 
> Rebased internal branch. Passed build with Java 8 and Scala 2.12. Passed
> integration tests with Python 3.10.
> 
> On Sun, Feb 12, 2023 at 8:49 PM Yuming Wang  wrote:
> 
> > +1.
> >
> > On Mon, Feb 13, 2023 at 11:52 AM yangjie01  wrote:
> >
> >> +1, Test 3.3.2-rc1 with Java 17 + Scala 2.13 + Python 3.10, all test
> >> passed.
> >>
> >>
> >>
> >> Yang Jie
> >>
> >>
> >>
> >> *发件人**: *Yikun Jiang 
> >> *日期**: *2023年2月13日 星期一 11:47
> >> *收件人**: *Spark dev list 
> >> *抄送**: *"L. C. Hsieh" 
> >> *主题**: *Re: [VOTE] Release Spark 3.3.2 (RC1)
> >>
> >>
> >>
> >> +1, Test 3.3.2-rc1 with spark-docker:
> >>
> >> - Downloading rc4 tgz, validate the key.
> >>
> >> - Extract bin and build image
> >>
> >> - Run K8s IT, standalone test of R/Python/Scala/All image [1]
> >>
> >>
> >>
> >> [1] https://github.com/apache/spark-docker/pull/29
> >> 
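
A rough sketch of the download-and-verify step in that flow; the staging
directory and file names are illustrative of the usual RC layout rather than
taken from this thread:

  # fetch the RC tarball, its signature, and the project KEYS file
  wget https://dist.apache.org/repos/dist/dev/spark/<rc-directory>/spark-3.3.2-bin-hadoop3.tgz
  wget https://dist.apache.org/repos/dist/dev/spark/<rc-directory>/spark-3.3.2-bin-hadoop3.tgz.asc
  wget https://downloads.apache.org/spark/KEYS

  # verify the signature against the release manager's key, then unpack
  gpg --import KEYS
  gpg --verify spark-3.3.2-bin-hadoop3.tgz.asc spark-3.3.2-bin-hadoop3.tgz
  tar -xzf spark-3.3.2-bin-hadoop3.tgz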
> >>
> >>
> >>
> >> Regards,
> >>
> >> Yikun
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Feb 13, 2023 at 10:25 AM yangjie01  wrote:
> >>
> >> Which Python version do you use for testing? When I use the latest Python
> >> 3.11, I can reproduce similar test failures (43 tests of sql module fail),
> >> but when I use python 3.10, they will succeed
> >>
> >>
> >>
> >> YangJie
> >>
> >>
> >>
> >> *发件人**: *Bjørn Jørgensen 
> >> *日期**: *2023年2月13日 星期一 05:09
> >> *收件人**: *Sean Owen 
> >> *抄送**: *"L. C. Hsieh" , Spark dev list <
> >> dev@spark.apache.org>
> >> *主题**: *Re: [VOTE] Release Spark 3.3.2 (RC1)
> >>
> >>
> >>
> >> Tried it one more time and the same result.
> >>
> >>
> >>
> >> On another box with Manjaro
> >>
> >>
> >>
> >>
> >>
> >>
> >> :)
> >>
> >>
> >>
> >> So I'm +1
> >>
> >>
> >>
> >>
> >>
> >> søn. 12. feb. 2023 kl. 12:53 skrev Bjørn Jørgensen <
> >> bjornjorgen...@gmail.com>:
> >>
> >> I use ubuntu rolling
> >>
> >> $ java -version
> >> openjdk