Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-15 Thread Mich Talebzadeh
+1 for me.

The SPIP document is well written as well.

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom



On Wed, 14 Jun 2023 at 00:10, Amanda Liu wrote:

> Hi all,
>
> I'd like to start a discussion about implementing an official PySpark test
> framework. Currently there is no official test framework, only various
> open-source repos and blog posts.
>
> Many of these open-source resources are very popular, which demonstrates
> user demand for PySpark testing capabilities. spark-testing-base has 1.4k
> stars, and chispa has 532k downloads/month. However, it can be confusing
> for users to piece together disparate resources to write their own PySpark
> tests (see The Elephant in the Room: How to Write PySpark Tests).
>
> We can streamline and simplify the testing process by incorporating test
> features, such as a PySpark Test Base class (which allows tests to share
> Spark sessions) and test util functions (for example, asserting dataframe
> and schema equality).
>
> Please see the SPIP document attached:
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>
> I would appreciate it if you could share your thoughts on this proposal.
>
> Thank you!
> Amanda Liu
>


Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-14 Thread Ruifeng Zheng
+1 from my side

Sounds good. It will help both users and contributors improve test
coverage.

On Wed, Jun 14, 2023 at 8:27 AM Hyukjin Kwon wrote:

> Yeah, I have been thinking about this too, and Holden did some work here
> that this SPIP will reuse. I support this.
>
> On Wed, 14 Jun 2023 at 08:10, Amanda Liu wrote:
>
>> Hi all,
>>
>> I'd like to start a discussion about implementing an official PySpark
>> test framework. Currently there is no official test framework, only
>> various open-source repos and blog posts.
>>
>> Many of these open-source resources are very popular, which demonstrates
>> user demand for PySpark testing capabilities. spark-testing-base has 1.4k
>> stars, and chispa has 532k downloads/month. However, it can be confusing
>> for users to piece together disparate resources to write their own
>> PySpark tests (see The Elephant in the Room: How to Write PySpark Tests).
>>
>> We can streamline and simplify the testing process by incorporating test
>> features, such as a PySpark Test Base class (which allows tests to share
>> Spark sessions) and test util functions (for example, asserting dataframe
>> and schema equality).
>>
>> Please see the SPIP document attached:
>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>>
>> I would appreciate it if you could share your thoughts on this proposal.
>>
>> Thank you!
>> Amanda Liu
>>
>


Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-13 Thread Hyukjin Kwon
Yeah, I have been thinking about this too, and Holden did some work here
that this SPIP will reuse. I support this.

On Wed, 14 Jun 2023 at 08:10, Amanda Liu wrote:

> Hi all,
>
> I'd like to start a discussion about implementing an official PySpark test
> framework. Currently there is no official test framework, only various
> open-source repos and blog posts.
>
> Many of these open-source resources are very popular, which demonstrates
> user demand for PySpark testing capabilities. spark-testing-base has 1.4k
> stars, and chispa has 532k downloads/month. However, it can be confusing
> for users to piece together disparate resources to write their own PySpark
> tests (see The Elephant in the Room: How to Write PySpark Tests).
>
> We can streamline and simplify the testing process by incorporating test
> features, such as a PySpark Test Base class (which allows tests to share
> Spark sessions) and test util functions (for example, asserting dataframe
> and schema equality).
>
> Please see the SPIP document attached:
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>
> I would appreciate it if you could share your thoughts on this proposal.
>
> Thank you!
> Amanda Liu
>


[DISCUSS] SPIP: Add PySpark Test Framework

2023-06-13 Thread Amanda Liu
Hi all,

I'd like to start a discussion about implementing an official PySpark test
framework. Currently there is no official test framework, only various
open-source repos and blog posts.

Many of these open-source resources are very popular, which demonstrates
user demand for PySpark testing capabilities. spark-testing-base has 1.4k
stars, and chispa has 532k downloads/month. However, it can be confusing
for users to piece together disparate resources to write their own PySpark
tests (see The Elephant in the Room: How to Write PySpark Tests).

We can streamline and simplify the testing process by incorporating test
features, such as a PySpark Test Base class (which allows tests to share
Spark sessions) and test util functions (for example, asserting dataframe
and schema equality).
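
To make the two ideas above concrete, here is a rough sketch of what a
shared-session base class and equality helpers could look like. It is not
the API proposed in the SPIP: every name in it (FakeSession,
SharedSessionTestCase, assert_schema_equal, assert_rows_equal, run_example)
is hypothetical, and a stand-in session object replaces a real SparkSession
so that the example runs with only the Python standard library. Schemas are
modelled as lists of (column, type) pairs and rows as tuples, which is a
simplifying assumption.

```python
import unittest


class FakeSession:
    """Stand-in for a SparkSession so the sketch runs without Spark."""

    instances = 0  # counts how many sessions were ever created

    def __init__(self):
        FakeSession.instances += 1
        self.stopped = False

    def stop(self):
        self.stopped = True


class SharedSessionTestCase(unittest.TestCase):
    """Hypothetical base class: one session per test class, not per test."""

    @classmethod
    def create_session(cls):
        # Assumption: with PySpark installed, this could instead return
        # SparkSession.builder.master("local[1]").getOrCreate().
        return FakeSession()

    @classmethod
    def setUpClass(cls):
        cls.spark = cls.create_session()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()


def assert_schema_equal(actual, expected):
    """Compare schemas given as lists of (column_name, type_name) pairs."""
    if list(actual) != list(expected):
        raise AssertionError(f"schema mismatch: {actual!r} != {expected!r}")


def assert_rows_equal(actual, expected, ordered=False):
    """Compare rows given as tuples; ignore row order unless ordered=True."""
    left, right = list(actual), list(expected)
    if not ordered:
        left, right = sorted(left, key=repr), sorted(right, key=repr)
    if left != right:
        raise AssertionError(f"row mismatch: {left!r} != {right!r}")


class ExampleTest(SharedSessionTestCase):
    def test_schema(self):
        assert_schema_equal([("id", "int")], [("id", "int")])

    def test_rows(self):
        assert_rows_equal([(2, "b"), (1, "a")], [(1, "a"), (2, "b")])


def run_example():
    """Run ExampleTest; return (tests_run, all_passed, sessions_created)."""
    FakeSession.instances = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun, result.wasSuccessful(), FakeSession.instances
```

Calling run_example() runs both tests against a single session, which is
the point of putting session setup in setUpClass rather than setUp. Row
comparison ignores order by default because a distributed DataFrame
generally has no guaranteed row order unless one is imposed explicitly.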

Please see the SPIP document attached:
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042

I would appreciate it if you could share your thoughts on this proposal.

Thank you!
Amanda Liu