Re: Topics for Spark online classes & webinars

2023-03-28 Thread asma zgolli
Hello everyone,

I suggest using the recently created Slack workspace for the Spark community
to collaborate on these topics, and the LinkedIn page to publish the events
and webinars.

Cheers,
Asma

On Thu, 16 Mar 2023 at 01:39, Denny Lee wrote:

> What we can do is get into the habit of compiling the list on LinkedIn but
> making sure this list is shared and broadcast here, eh?!
>
> As well, when we broadcast the videos, we can do this using zoom/jitsi/
> riverside.fm as well as simulcasting this on LinkedIn. This way you can
> view directly on the former without ever logging in with a user ID.
>
> HTH!!
>
> On Wed, Mar 15, 2023 at 4:30 PM Mich Talebzadeh 
> wrote:
>
>> Understood, Nitin. It would be wrong to act against one's convictions. I am
>> sure we can find another way to provide the contents.
>>
>> Regards
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead
>> Palantir Technologies Limited
>>
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 15 Mar 2023 at 22:34, Nitin Bhansali 
>> wrote:
>>
>>> Hi Mich,
>>>
>>> Thanks for your prompt response, much appreciated. I know how to create
>>> login IDs on such sites, but some 20 years ago I took a conscious decision
>>> not to be on them, and joining now would go against my principles. That is
>>> why I asked whether there is any other way to join the webinar or view
>>> its recording.
>>>
>>> Anyways not to worry.
>>>
>>> Thanks & Regards
>>>
>>> Nitin.
>>>
>>>
>>> On Wednesday, 15 March 2023 at 20:37:55 GMT, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>
>>> Hi Nitin,
>>>
>>> LinkedIn is more of a professional network. FYI, I am only a member of
>>> LinkedIn, not Facebook, etc. There is no reason for you NOT to create a
>>> profile for yourself on LinkedIn :)
>>>
>>>
>>> https://www.linkedin.com/help/linkedin/answer/a1338223/sign-up-to-join-linkedin?lang=en
>>>
>>> see you there as well.
>>>
>>> Best of luck.
>>>
>>>
>>> Mich Talebzadeh,
>>> Lead Solutions Architect/Engineering Lead,
>>> Palantir Technologies Limited
>>>
>>> On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali 
>>> wrote:
>>>
>>> Hello Mich,
>>>
>>> My apologies, but I am not on any such social/professional sites.
>>> Is there any other way to access such webinars/classes?
>>>
>>> Thanks & Regards
>>> Nitin.
>>>
>>> On Wednesday, 15 March 2023 at 18:26:51 GMT, Denny Lee <
>>> denny.g@gmail.com> wrote:
>>>
>>>
>>> Thanks Mich for tackling this!  I encourage everyone to add to the list
>>> so we can have a comprehensive list of topics, eh?!
>>>
>>> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> Thanks to @Denny Lee for giving access to
>>>
>>> https://www.linkedin.com/company/apachespark/
>>>
>>> and to @asma zgolli for contributing.
>>>
>>> You will see my post at the bottom. Please add anything else on topics
>>> to the list as a comment.
>>>
>>> We will then p

Re: Slack for PySpark users

2023-03-28 Thread asma zgolli
Hello @Mich Talebzadeh,

I suggest we use this Slack workspace to plan and organize "Online classes
for spark topics".

Best,
Asma


On Tue, 28 Mar 2023 at 14:37, Shani Alisar wrote:

> Hi all,
>
> We recently opened an unofficial Spark community Slack workspace.
> Please join so we can grow the community and its knowledge: Link
> <https://join.slack.com/t/sparkcommunitytalk/shared_invite/zt-1rk11diac-hzGbOEdBHgjXf02IZ1mvUA>
>
> Cheers,
> Shani
>
>
> *From:* שוהם יהודה 
> *Date:* 28 March 2023 at 8:27:36 GMT+3
> *To:* shani.alis...@gmail.com
> *Subject:* *Fwd: Slack for PySpark users*
>
> 
>
>
> -- Forwarded message -
> From: asma zgolli 
> Date: Tue, Mar 28, 2023, 05:51
> Subject: Re: Slack for PySpark users
> To: Winston Lai 
> Cc: Denny Lee , Hyukjin Kwon ,
> keen , user@spark.apache.org 
>
>
> +1, good idea. I'd like to join as well.
>
> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>
>> Please let us know when the channel is created. I'd like to join :)
>>
>> Thank You & Best Regards
>> Winston Lai
>> --
>> *From:* Denny Lee 
>> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
>> *To:* Hyukjin Kwon 
>> *Cc:* keen ; user@spark.apache.org
>> *Subject:* Re: Slack for PySpark users
>>
>> +1 I think this is a great idea!
>>
>> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon  wrote:
>>
>> Yeah, actually I think we had better have a Slack channel so we can
>> easily discuss with users and developers.
>>
>> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>>
>> Hi all,
>> I really like *Slack* as a communication channel for a tech community.
>> There is a Slack workspace for *Delta Lake users* (
>> https://go.delta.io/slack) that I enjoy a lot.
>> I was wondering if there is something similar for PySpark users.
>>
>> If not, would there be anything wrong with creating a new Slack workspace
>> for PySpark users (while explicitly mentioning that it is *not*
>> officially part of Apache Spark)?
>>
>> Cheers
>> Martin
>>
>>
>
> --
> Asma ZGOLLI
>
> Ph.D. in Big Data - Applied Machine Learning
>
>
>


Re: Slack for PySpark users

2023-03-27 Thread asma zgolli
+1, good idea. I'd like to join as well.

On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:

> Please let us know when the channel is created. I'd like to join :)
>
> Thank You & Best Regards
> Winston Lai
> --
> *From:* Denny Lee 
> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
> *To:* Hyukjin Kwon 
> *Cc:* keen ; user@spark.apache.org 
> *Subject:* Re: Slack for PySpark users
>
> +1 I think this is a great idea!
>
> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon  wrote:
>
> Yeah, actually I think we had better have a Slack channel so we can
> easily discuss with users and developers.
>
> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>
> Hi all,
> I really like *Slack* as a communication channel for a tech community.
> There is a Slack workspace for *delta lake users* (
> https://go.delta.io/slack) that I enjoy a lot.
> I was wondering if there is something similar for PySpark users.
>
> If not, would there be anything wrong with creating a new Slack workspace
> for PySpark users (while explicitly mentioning that it is *not*
> officially part of Apache Spark)?
>
> Cheers
> Martin
>
>

-- 
Asma ZGOLLI

Ph.D. in Big Data - Applied Machine Learning


Re: Topics for Spark online classes & webinars

2023-03-13 Thread asma zgolli
Hello Mich,

Can you please provide the link for the confluence page?

Many thanks
Asma
Ph.D. in Big Data - Applied Machine Learning

On Mon, 13 Mar 2023 at 17:21, Mich Talebzadeh wrote:

> Apologies, I missed the list.
>
> To move forward, I selected these topics from the thread "Online classes
> for spark topics".
>
> To take this further, I propose that a Confluence page be set up.
>
>
>    1. Spark UI
>    2. Dynamic allocation
>    3. Tuning of jobs
>    4. Collecting Spark metrics for monitoring and alerting
>    5. Pandas API on Spark (available since Spark 3.2): what should its
>       users keep in mind? For example, what additional factors affect
>       Spark performance when using the Pandas API on Spark, and how can
>       they be tuned in addition to the conventional Spark tuning methods
>       applied by Spark SQL users?
>    6. Spark internals and/or comparing Spark 3 and 2
>    7. Spark Streaming & Spark Structured Streaming
>    8. Spark on notebooks
>    9. Spark on serverless (for example Spark on Google Cloud)
>    10. Spark on k8s
>
> Opinions and how-tos are welcome.
>
> On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh 
> wrote:
>
>> Hi guys
>>
>> To move forward I selected these topics from the thread "Online classes
>> for spark topics".
>>
>> To take this further, I propose that a Confluence page be set up.
>>
>> Opinions and how-tos are welcome.
>>
>> Cheers
>>
>


Re: [EXTERNAL] Re: Online classes for spark topics

2023-03-09 Thread asma zgolli
Hello spark community,


Adding a new topic.

   - Spark UI
   - Dynamic allocation
   - Tuning of jobs
   - Collecting spark metrics for monitoring and alerting
   - Pandas API on Spark (available since Spark 3.2): what should its users
   keep in mind? For example, what additional factors affect Spark
   performance when using the Pandas API on Spark, and how can they be tuned
   in addition to the conventional Spark tuning methods applied by Spark SQL
   users?
   - Spark internals and/or comparing spark 3 and 2 (I can take care of
   this if the community finds the topic interesting)


On Thu, 9 Mar 2023 at 10:14, Winston Lai wrote:

> Hi everyone,
>
> I would like to add one topic to Saurabh's list as well.
>
>    - Spark UI
>    - Dynamic allocation
>    - Tuning of jobs
>    - Collecting Spark metrics for monitoring and alerting
>    - Pandas API on Spark (available since Spark 3.2): what should its
>    users keep in mind? For example, what additional factors affect Spark
>    performance when using the Pandas API on Spark, and how can they be
>    tuned in addition to the conventional Spark tuning methods applied by
>    Spark SQL users?
>
>
> Thank You & Best Regards
> Winston Lai
> --
> *From:* Saurabh Gulati 
> *Sent:* Thursday, March 9, 2023 5:04:35 PM
> *To:* Mich Talebzadeh ; Deepak Sharma <
> deepakmc...@gmail.com>
> *Cc:* Denny Lee ; Sofia’s World <
> mmistr...@gmail.com>; User ; Winston Lai <
> weiruanl...@gmail.com>; ashok34...@yahoo.com ; asma
> zgolli ; karan alang 
> *Subject:* Re: [EXTERNAL] Re: Online classes for spark topics
>
> Hey guys,
> It's a nice idea, and I appreciate the effort you are all putting in.
> I can add to the list of topics which might be of interest:
>
>    - Spark UI
>    - Dynamic allocation
>    - Tuning of jobs
>    - Collecting Spark metrics for monitoring and alerting
>
>
> HTH
> --
> *From:* Mich Talebzadeh 
> *Sent:* 09 March 2023 09:00
> *To:* Deepak Sharma 
> *Cc:* Denny Lee ; Sofia’s World <
> mmistr...@gmail.com>; User ; Winston Lai <
> weiruanl...@gmail.com>; ashok34...@yahoo.com ; asma
> zgolli ; karan alang 
> *Subject:* [EXTERNAL] Re: Online classes for spark topics
>
> Hi Deepak,
>
> The priority list of topics is a very good point. The thread owner
> mentioned Spark on k8s, Data Science and Spark Structured Streaming. Which
> other topics need to be included depends, I guess, on demand. I suggest
> we wait a couple of days to gauge it.
>
> We just need to create a draft list of topics of interest and share it
> in the forum to get the priority order.
>
> Well, those are my thoughts.
>
> Cheers
>
>
> On Thu, 9 Mar 2023 at 06:13, Deepak Sharma  wrote:
>
> I can prepare some topics and present as well, if we already have a
> prioritised list of topics.
>
> On Thu, 9 Mar 2023 at 11:42 AM, Denny Lee  wrote:
>
> We used to run Spark webinars on the Apache Spark LinkedIn group
> <https://www.linkedin.com/company/apachespark/?viewAsMember=true>, but
> honestly the turnout was pretty low.  We had dived into various features.
> If there are particular topics that you would like to discuss during a
> live session, please let me know and we can try to restart them.  HTH!
>
> On Wed, Mar 8, 2023 at 9:45 PM Sofia’s World  wrote:
>
> +1
>
> On Wed, Mar 8, 2023 at 10:40 PM Winston Lai  wrote:
>
> +1, any webinar on Spark related topic is appreciated 
>
> T

Re: Online classes for spark topics

2023-03-08 Thread asma zgolli
+1

On Wed, 8 Mar 2023 at 21:32, karan alang wrote:

> +1 .. I'm happy to be part of these discussions as well !
>
>
>
>
> On Wed, Mar 8, 2023 at 12:27 PM Mich Talebzadeh 
> wrote:
>
>> Hi,
>>
>> I guess I can schedule this work over a period of time. For my part, I can
>> contribute as well as learn from others.
>>
>> So +1 for me.
>>
>> Let us see if anyone else is interested.
>>
>> HTH
>>
>>
>> On Wed, 8 Mar 2023 at 17:48, ashok34...@yahoo.com 
>> wrote:
>>
>>>
>>> Hello Mich.
>>>
>>> Greetings. Would you be able to arrange a Spark Structured Streaming
>>> learning webinar?
>>>
>>> This is something I have been struggling with recently. It would be very
>>> helpful.
>>>
>>> Thanks and regards
>>>
>>> AK
>>> On Tuesday, 7 March 2023 at 20:24:36 GMT, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>
>>> Hi,
>>>
>>> This might be a worthwhile exercise, on the assumption that the
>>> contributors will find the time and bandwidth to chip in, so to speak.
>>>
>>> I am sure there are many, but off the top of my head I can think of Holden
>>> Karau for k8s and Sean Owen for data science stuff. They are both very
>>> experienced.
>>>
>>> Anyone else?
>>>
>>> HTH
>>>
>>>
>>> On Tue, 7 Mar 2023 at 19:17, ashok34...@yahoo.com.INVALID
>>>  wrote:
>>>
>>> Hello gurus,
>>>
>>> Does Spark arrange online webinars for special topics like Spark on
>>> K8s, data science and Spark Structured Streaming?
>>>
>>> I would be most grateful if experts could share their experience with
>>> learners with intermediate knowledge like myself. Hopefully we will find
>>> the practical experience they share valuable.
>>>
>>> Respectfully,
>>>
>>> AK
>>>
>>>


shuffle mathematic formulat

2020-02-04 Thread asma zgolli
Dear Spark contributors,

I'm searching for a way to model Spark shuffle cost, and I wonder whether
there are mathematical formulas to compute the "Shuffle Read" and "Shuffle
Write" sizes shown in the Stages view of the Spark UI.
If there aren't, are there any references that would give me a head start?
Columns of the Stages table (http://localhost:4040/stages/): Stage Id,
Description, Submitted, Duration, Tasks: Succeeded/Total, Input, Output,
Shuffle Read, Shuffle Write

Thank you for the help and the directions.
Yours sincerely,
Asma ZGOLLI

Ph.D. student in data engineering - computer science


Re: error , saving dataframe , LEGACY_PASS_PARTITION_BY_AS_OPTIONS

2019-11-13 Thread asma zgolli
The line of code that resulted in that error was:

ch.cern.sparkmeasure.StageMetrics.saveData


The same error also occurred while trying to save a dataframe to HBase using
the HBase Spark connector:

df1.write.options(
  Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "4"))
  .format("org.apache.hadoop.hbase.spark")
  .save()


the full stack trace is as follows:



Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.sql.internal.SQLConf$.LEGACY_PASS_PARTITION_BY_AS_OPTIONS()Lorg/apache/spark/internal/config/ConfigEntry;
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:277)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at ch.cern.sparkmeasure.StageMetrics.saveData(stagemetrics.scala:297)
...
...
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)

On Wed, 13 Nov 2019 at 16:05, Femi Anthony wrote:

> Can you post the line of code that’s resulting in that error, along with
> the stack trace?
>
> Sent from my iPhone
>
> On Nov 13, 2019, at 9:53 AM, asma zgolli  wrote:
>
> 
>
> Hello ,
>
>
> I'm using Spark 2.4.4 and I keep receiving this error message. Can you
> please help me identify the problem?
>
>
> thank you ,
>
> yours sincerely
> Asma ZGOLLI
>
> PhD student in data engineering - computer science
>
> Attachment:
>
>
>
> "main" java.lang.NoSuchMethodError:
> org.apache.spark.sql.internal.SQLConf$.LEGACY_PASS_PARTITION_BY_AS_OPTIONS()Lorg/apache/spark/internal/config/ConfigEntry;
>
> at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:277)
>
>
>
>
>
>

-- 
Asma ZGOLLI

PhD student in data engineering - computer science


error , saving dataframe , LEGACY_PASS_PARTITION_BY_AS_OPTIONS

2019-11-13 Thread asma zgolli
Hello ,


I'm using Spark 2.4.4 and I keep receiving this error message. Can you
please help me identify the problem?


thank you ,

yours sincerely
Asma ZGOLLI

PhD student in data engineering - computer science

Attachment:



"main" java.lang.NoSuchMethodError:
org.apache.spark.sql.internal.SQLConf$.LEGACY_PASS_PARTITION_BY_AS_OPTIONS()Lorg/apache/spark/internal/config/ConfigEntry;

at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:277)


Re: Parallelize Join Problem

2019-04-17 Thread asma zgolli
How can I figure out whether the data is skewed? Are there some statistics I
can check?
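
One statistic that is commonly looked at is the number of rows per join key:
if a few keys hold most of the rows, the tasks handling those keys do most
of the work. Below is a minimal plain-Python sketch of that idea, not Spark
code. The function name `skew_report`, the sample keys and the max-to-mean
ratio are all assumptions chosen for illustration; in Spark the same counts
could be obtained by grouping the dataframe on the join key and counting.

```python
from collections import Counter
from statistics import mean


def skew_report(keys, top_n=3):
    """Summarise how unevenly rows are distributed over join keys.

    `keys` is any iterable of join-key values, one entry per row.
    (Hypothetical helper for illustration; not part of Spark.)
    """
    counts = Counter(keys)                 # rows per distinct key
    avg = mean(counts.values())            # average rows per key
    largest = max(counts.values())         # rows held by the heaviest key
    return {
        "distinct_keys": len(counts),
        "mean_rows_per_key": avg,
        "max_rows_per_key": largest,
        # A ratio far above 1 suggests a few keys dominate the data.
        "max_to_mean_ratio": largest / avg,
        "top_keys": counts.most_common(top_n),
    }


# Made-up sample: one "hot" key holds 90 of 100 rows.
join_keys = ["hot"] * 90 + ["k1"] * 4 + ["k2"] * 3 + ["k3"] * 3
report = skew_report(join_keys)
print(report)
```

What counts as "too skewed" depends on the job, so the ratio is only a rough
indicator, not a threshold.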

On Wed, 17 Apr 2019 at 20:12, Yeikel wrote:

> It is hard to tell, but your data may be skewed.
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>

-- 
Asma ZGOLLI

PhD student in data engineering - computer science


Fwd: Cross Join

2019-03-21 Thread asma zgolli
-- Forwarded message -
From: asma zgolli 
Date: Thu, 21 Mar 2019 at 18:15
Subject: Cross Join
To: 


Hello ,

I need to cross my data, so I'm executing a cross join on two dataframes.

C = A.crossJoin(B)
A has 50 records
B has 5 records

The result I'm getting with Spark 2.0 is a dataframe C with only 50 records:

only the first row from B was added to C.

Is that a bug in Spark?
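
For reference, a cross join is expected to pair every row of A with every
row of B, so C should have 50 x 5 = 250 records. A minimal plain-Python
sketch of that expected cardinality follows. It is not Spark code, and the
lists `a` and `b` are made-up stand-ins for the two dataframes.

```python
from itertools import product

# Stand-ins for the two dataframes (hypothetical data, not from this thread).
a = [("a", i) for i in range(50)]  # 50 records
b = [("b", j) for j in range(5)]   # 5 records

# A cross join (Cartesian product) pairs every row of A with every row of B.
c = [row_a + row_b for row_a, row_b in product(a, b)]

# The size of the result is the product of the input sizes.
print(len(c), len(a) * len(b))
```

Comparing C.count() against A.count() * B.count() in the same way is a quick
check of whether the join returned all pairings.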

Asma ZGOLLI

PhD student in data engineering - computer science



-- 
Asma ZGOLLI

PhD student in data engineering - computer science