Re: Python for the kids and now PySpark

2024-04-27 Thread Farshid Ashouri
Mich, this is absolutely amazing.

Thanks for sharing.

On Sat, 27 Apr 2024, 22:26 Mich Talebzadeh wrote:

> Python for the kids. Slightly off-topic but worth sharing.
>
> One of the things that may benefit kids is starting to learn something
> new - basically anything that can hold their attention away from games for
> a few hours. Around 2020, my son Christian (now nearly 15) decided to
> learn a programming language, and he picked Python to start with.
> Kids are good when they focus; however, they live in a virtual-reality
> world and cannot focus for long hours. I let him explore Python on his
> Windows 10 laptop and download it himself. In the following video, Christian
> explains to his mother what he started to do just before going to bed. BTW,
> when he says 32M he means 32-bit. I leave it to you to judge :) Now the
> idea is to start learning PySpark, so I will let him do it himself and
> learn from his mistakes. For those who have kids, I would be interested to
> know their opinions.
>
> Cheers
>
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer | Generative AI | FinCrime
> London
> United Kingdom
>
>
>    view my LinkedIn profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice: "one test result is worth one-thousand
> expert opinions" (Wernher von Braun).
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org


Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-20 Thread Farshid Ashouri
+1

On Mon, 18 Mar 2024, 11:00 Mich Talebzadeh wrote:

> Some of you may be aware that the Databricks community has just
> launched a knowledge sharing hub. I thought it would be a
> good idea for the Apache Spark user group to have the same, especially
> for repeat questions on Spark core, Spark SQL, Spark Structured
> Streaming, Spark MLlib and so forth.
>
> The Apache Spark user and dev groups have been around for a good while
> and are serving their purpose. We went through creating a Slack
> community that managed to create more heat than light. This is
> what the Databricks community came up with, and I quote:
>
> "Knowledge Sharing Hub
> Dive into a collaborative space where members like YOU can exchange
> knowledge, tips, and best practices. Join the conversation today and
> unlock a wealth of collective wisdom to enhance your experience and
> drive success."
>
> I don't know the logistics of setting it up, but I am sure that should
> not be too difficult. If anyone is supportive of this proposal, let
> the usual +1, 0, -1 decide.
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>    view my LinkedIn profile
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> Disclaimer: The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice: "one test result is worth one-thousand
> expert opinions" (Wernher von Braun).
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Introducing English SDK for Apache Spark - Seeking Your Feedback and Contributions

2023-07-03 Thread Farshid Ashouri
This is wonderful news!

On Tue, 4 Jul 2023 at 01:14, Gengliang Wang wrote:

> Dear Apache Spark community,
>
> We are delighted to announce the launch of a groundbreaking tool that aims
> to make Apache Spark more user-friendly and accessible - the English SDK
> <https://github.com/databrickslabs/pyspark-ai/>. Powered by
> Generative AI, the English SDK allows you to execute
> complex tasks with simple English instructions. This exciting news was
> announced recently at the Data+AI Summit
> <https://www.youtube.com/watch?v=yj7XlTB1Jvc&t=511s> and also introduced
> through a detailed blog post
> <https://www.databricks.com/blog/introducing-english-new-programming-language-apache-spark>.
>
> Now, we need your invaluable feedback and contributions. The aim of the
> English SDK is not only to simplify and enrich your Apache Spark experience
> but also to grow with the community. We're calling upon Spark developers
> and users to explore this innovative tool, offer your insights, provide
> feedback, and contribute to its evolution.
>
> You can find more details about the SDK and usage examples on the GitHub
> repository https://github.com/databrickslabs/pyspark-ai/. If you have any
> feedback or suggestions, please feel free to open an issue directly on the
> repository. We are actively monitoring the issues and value your insights.
>
> We also welcome pull requests and are eager to see how you might extend or
> refine this tool. Let's come together to continue making Apache Spark more
> approachable and user-friendly.
>
> Thank you in advance for your attention and involvement. We look forward
> to hearing your thoughts and seeing your contributions!
>
> Best,
> Gengliang Wang
>
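
For anyone who wants a quick feel for it, here is a minimal sketch based on
the usage examples in the pyspark-ai repository README at the time of the
announcement; the exact API may have evolved since, and the English prompt
and sample data below are purely illustrative:

from pyspark.sql import SparkSession
from pyspark_ai import SparkAI

spark = SparkSession.builder.getOrCreate()

# By default SparkAI uses an OpenAI model via LangChain (reading the API
# key from the environment); you can also pass in your own llm= object.
spark_ai = SparkAI()
spark_ai.activate()  # adds the .ai accessor to Spark DataFrames

df = spark.createDataFrame(
    [("Toyota", 1850), ("Ford", 1750), ("Honda", 1200)],
    ["brand", "us_sales"],
)

# Describe the transformation in plain English instead of writing SQL:
top = df.ai.transform("which brand has the highest us_sales?")
top.show()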
-- 


*Farshid Ashouri*,
Senior Vice President,
J.P. Morgan Chase & Co.
+44 7932 650 788


Re: Rename columns without manually setting them all

2023-06-21 Thread Farshid Ashouri
You can use selectExpr and stack to achieve the same effect in PySpark:


df = spark.read.csv("your_file.csv", header=True, inferSchema=True)

# The attendance columns are the ones with '/' in the header (e.g. 01/01/2022).
date_columns = [c for c in df.columns if "/" in c]

# stack(n, k1, v1, k2, v2, ...) emits one (k, v) row per pair, so for each
# date column we pass its name as a string literal followed by its value.
stack_expr = "stack({}, {}) as (Date, Status)".format(
    len(date_columns),
    ", ".join(f"'{c}', `{c}`" for c in date_columns),
)

long_df = df.selectExpr(
    "`Employee ID`", "`Name`", "`Client`", "`Project`", "`Team`", stack_expr
)

result = long_df.groupBy("Date", "Status").count()
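
Since you mention melt(): on Spark 3.4+ there is also DataFrame.melt (an
alias of unpivot), which does the same reshaping without building the stack
expression by hand. A sketch under the same assumed schema, starting from
the df as read from the CSV:

long_df = df.melt(
    ids=["Employee ID", "Name", "Client", "Project", "Team"],
    values=date_columns,
    variableColumnName="Date",
    valueColumnName="Status",
)
result = long_df.groupBy("Date", "Status").count()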




> On 21 Jun 2023, at 11:45, John Paul Jayme wrote:
> 
> Hi,
> 
> This is currently my column definition:
> 
> Employee ID  Name     Client   Project  Team    01/01/2022  02/01/2022  03/01/2022  04/01/2022  05/01/2022
> 12345        Dummy x  Dummy a  abc      team a  OFF         WO          WH          WH          WH
> 
> As you can see, the outer columns are just daily attendance dates. My goal is
> to count the employees who were OFF / WO / WH on each of those dates. I need
> to transpose them so it would look like this:
> 
> [inline image showing the desired transposed layout]
> 
> I am still new to pandas. Can you guide me on how to produce this? I am 
> reading about melt() and set_index() but I am not sure if they are the 
> correct functions to use.