Re: PySpark 3.5.0 on PyPI

2023-09-20 Thread Kezhi Xiong
Oh, I saw it now. Thanks! On Wed, Sep 20, 2023 at 1:04 PM Sean Owen wrote: > I think the announcement mentioned there were some issues with pypi and > the upload size this time. I am sure it's intended to be there when > possible. > > On Wed, Sep 20,

Re: PySpark 3.5.0 on PyPI

2023-09-20 Thread Sean Owen
I think the announcement mentioned there were some issues with pypi and the upload size this time. I am sure it's intended to be there when possible. On Wed, Sep 20, 2023, 3:00 PM Kezhi Xiong wrote: > Hi, > > Are there any plans to upload PySpark 3.5.0 to PyPI ( >

PySpark 3.5.0 on PyPI

2023-09-20 Thread Kezhi Xiong
Hi, Are there any plans to upload PySpark 3.5.0 to PyPI (https://pypi.org/project/pyspark/)? It's still 3.4.1. Thanks, Kezhi

[Spark 3.5.0] Is the protobuf-java JAR no longer shipped with Spark?

2023-09-20 Thread Gijs Hendriksen
Hi all, This week, I tried upgrading to Spark 3.5.0, as it contained some fixes for spark-protobuf that I need for my project. However, my code is no longer running under Spark 3.5.0. My build.sbt file is configured as follows:
val sparkV  = "3.5.0"
val hadoopV = "3.3.6"

Re: Discrepancy sample standard deviation pyspark and Excel

2023-09-20 Thread Sean Owen
This has turned into a big thread for a simple thing and has been answered 3 times over now. Neither is better; they just calculate different things. That the 'default' is sample stddev is just convention. stddev_pop is the simple standard deviation of a set of numbers. stddev_samp is used when
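To make the "they just calculate different things" point concrete, here is a minimal sketch in plain Python (no Spark needed). The data values are made up for illustration and are not from the thread. The population form divides the sum of squared deviations by n, the sample form by n - 1; as far as I know these correspond to stddev_pop / stddev_samp in PySpark and to STDEV.P / STDEV.S in Excel.

```python
import math
import statistics

# Illustrative data only (an assumption, not taken from the thread).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(values)
mean = sum(values) / n
sum_sq = sum((x - mean) ** 2 for x in values)  # sum of squared deviations

pop_sd = math.sqrt(sum_sq / n)         # population std dev: divide by n
samp_sd = math.sqrt(sum_sq / (n - 1))  # sample std dev: divide by n - 1

# Cross-check against the standard library's implementations.
assert math.isclose(pop_sd, statistics.pstdev(values))
assert math.isclose(samp_sd, statistics.stdev(values))

print("population:", pop_sd)
print("sample:    ", samp_sd)
```

On the same data the sample value is always the larger of the two, and the gap shrinks as n grows, so a mismatch between two tools on a small dataset is often just a matter of which of the two functions each one applied.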

Re: Urgent: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem

2023-09-20 Thread Gowtham S
Hi Spark Community, Thank you for bringing up this issue. We've also encountered the same challenge and are actively working on finding a solution. It's reassuring to know that we're not alone in this. If you have any insights or suggestions regarding how to address this problem, please feel

Re: Discrepancy sample standard deviation pyspark and Excel

2023-09-20 Thread Mich Talebzadeh
Spark uses the sample standard deviation stddev_samp by default, whereas *Hive* uses the population standard deviation stddev_pop by default. My understanding is that Spark uses sample standard deviation by default because - It is more commonly used. - It is more efficient to calculate. -