Re: What is the range of the PageRank value of graphx

2023-03-28 Thread Sean Owen
From the docs: Note that this is not the "normalized" PageRank and as a consequence pages that have no inlinks will have a PageRank of alpha. In particular, the pageranks may have some values greater than 1. On Tue, Mar 28, 2023 at 9:11 AM lee wrote: > When I calculate pagerank using
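
A minimal sketch of the update rule that the quoted note describes, in plain Python rather than GraphX. It assumes alpha is a reset probability of 0.15 and that every rank starts at 1.0; the toy graph and the function name are hypothetical, and this is not GraphX's actual implementation.

```python
def unnormalized_pagerank(edges, num_iter=20, alpha=0.15):
    """Iterate rank(v) = alpha + (1 - alpha) * sum(rank(u) / out_degree(u)) over inlinks u -> v."""
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1
    # Assumption: every vertex starts with rank 1.0 and ranks are never rescaled.
    ranks = {v: 1.0 for v in vertices}
    for _ in range(num_iter):
        incoming = {v: 0.0 for v in vertices}
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = {v: alpha + (1.0 - alpha) * incoming[v] for v in vertices}
    return ranks


# Hypothetical toy graph: "a", "b" and "c" all link to "hub"; "hub" links back to "a".
EDGES = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
RANKS = unnormalized_pagerank(EDGES)
```

In this toy graph "b" and "c" have no inlinks, so their rank settles at alpha, while "hub" collects enough rank to exceed 1, which matches the range described in the docs.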

Re: Slack for PySpark users

2023-03-28 Thread Mich Talebzadeh
Thank You & Best Regards >> Winston Lai >> -- >> *From:* Denny Lee >> *Sent:* Tuesday, March 28, 2023 9:43:08 AM >> *To:* Hyukjin Kwon >> *Cc:* keen ; user@spark.apache.org > > >> *Subject:* Re: Slack for PySpark users >> >>

Re: Topics for Spark online classes & webinars

2023-03-28 Thread asma zgolli
Hello everyone, I suggest using the Slack workspace recently created for the Spark community to collaborate and work together on these topics, and using the LinkedIn page to publish the events and the webinars. Cheers, Asma On Thu, 16 Mar 2023 at 01:39, Denny Lee wrote: > What we can do is get into the

Re: Slack for PySpark users

2023-03-28 Thread asma zgolli
..@gmail.com > *Subject:* *Fwd: Slack for PySpark users* > >  > > > -- Forwarded message ----- > From: asma zgolli > Date: Tue, Mar 28, 2023, 05:51 > Subject: Re: Slack for PySpark users > To: Winston Lai > Cc: Denny Lee , Hyukjin Kwon , > keen , user@spark

Re: Slack for PySpark users

2023-03-28 Thread Shani Alisar
ate: 28 March 2023 at 8:27:36 GMT+3 >> To: shani.alis...@gmail.com >> Subject: Fwd: Slack for PySpark users >> >>  >> >> >> -- Forwarded message - >> From: asma zgolli mailto:zgollia...@gmail.com>> >> Date: Tue,

Re: Slack for PySpark users

2023-03-27 Thread asma zgolli
Tuesday, March 28, 2023 9:43:08 AM > *To:* Hyukjin Kwon > *Cc:* keen ; user@spark.apache.org > *Subject:* Re: Slack for PySpark users > > +1 I think this is a great idea! > > On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon wrote: > > Yeah, actually I think we should be

Re: Slack for PySpark users

2023-03-27 Thread Winston Lai
Please let us know when the channel is created. I'd like to join :) Thank You & Best Regards Winston Lai From: Denny Lee Sent: Tuesday, March 28, 2023 9:43:08 AM To: Hyukjin Kwon Cc: keen ; user@spark.apache.org Subject: Re: Slack for PySpark users +1 I t

Re: Slack for PySpark users

2023-03-27 Thread Denny Lee
+1 I think this is a great idea! On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon wrote: > Yeah, actually I think we should better have a slack channel so we can > easily discuss with users and developers. > > On Tue, 28 Mar 2023 at 03:08, keen wrote: > >> Hi all, >> I really like *Slack *as

Re: Slack for PySpark users

2023-03-27 Thread Hyukjin Kwon
Yeah, actually I think we should better have a slack channel so we can easily discuss with users and developers. On Tue, 28 Mar 2023 at 03:08, keen wrote: > Hi all, > I really like *Slack *as communication channel for a tech community. > There is a Slack workspace for *delta lake users* ( >

Re: Question related to asynchronously map transformation using java spark structured streaming

2023-03-26 Thread Mich Talebzadeh
Agreed. How does asynchronous communication relate to Spark Structured streaming? In the previous post of yours, you made your Spark run on the driver in a single JVM. You attempted to increase the number of executors to 3 after submission of the job that (as Sean alluded to) would not

Re: Question related to asynchronously map transformation using java spark structured streaming

2023-03-26 Thread Sean Owen
What do you mean by asynchronously here? On Sun, Mar 26, 2023, 10:22 AM Emmanouil Kritharakis < kritharakismano...@gmail.com> wrote: > Hello again, > > Do we have any news for the above question? > I would really appreciate it. > > Thank you, > >

Re: Question related to asynchronously map transformation using java spark structured streaming

2023-03-26 Thread Emmanouil Kritharakis
Hello again, Do we have any news for the above question? I would really appreciate it. Thank you, -- Emmanouil (Manos) Kritharakis Ph.D. candidate in the Department of Computer Science

Re: Kind help request

2023-03-25 Thread Sean Owen
It is telling you that the UI can't bind to any port. I presume that's because of container restrictions? If you don't want the UI at all, just set spark.ui.enabled to false On Sat, Mar 25, 2023 at 8:28 AM Lorenzo Ferrando < lorenzo.ferra...@edu.unige.it> wrote: > Dear Spark team, > > I am
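
A minimal sketch of one way to apply that setting, assuming it is supplied through a properties file in the whitespace-separated key/value layout of spark-defaults.conf. The helper name is hypothetical; only the `spark.ui.enabled` key comes from the message above.

```python
def render_spark_properties(settings):
    """Render settings as one 'key value' line each, the layout used by spark-defaults.conf."""
    lines = []
    for key, value in sorted(settings.items()):
        if isinstance(value, bool):
            # Spark expects lowercase true/false rather than Python's True/False.
            value = "true" if value else "false"
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


# Disable the web UI, as suggested in the reply.
PROPERTIES = render_spark_properties({"spark.ui.enabled": False})
```

The rendered text can be saved to a file and passed to spark-submit with `--properties-file`; the same key can also be given directly as `--conf spark.ui.enabled=false`.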

Re: Adding OpenSearch as a secondary index provider to SparkSQL

2023-03-24 Thread Mich Talebzadeh
Hi, Are you talking about intelligent index scan here? Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

Re: Question related to parallelism using structed streaming parallelism

2023-03-21 Thread Mich Talebzadeh
or download it from here https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile

Re: Question related to parallelism using structed streaming parallelism

2023-03-21 Thread Sean Owen
Yes more specifically, you can't ask for executors once the app starts, in SparkConf like that. You set this when you launch it against a Spark cluster in spark-submit or otherwise. On Tue, Mar 21, 2023 at 4:23 AM Mich Talebzadeh wrote: > Hi Emmanouil, > > This means that your job is running on
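
A minimal sketch of setting the executor count at launch time rather than inside the running application. The application file name and the `yarn` master are assumptions made for illustration; the helper only assembles a spark-submit command line and does not run it.

```python
import shlex


def build_spark_submit(app_path, master, conf):
    """Assemble a spark-submit argument list with one --conf flag per setting."""
    cmd = ["spark-submit", "--master", master]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)
    return cmd


# Hypothetical application name and cluster manager.
COMMAND = build_spark_submit(
    "streaming_app.py", "yarn", {"spark.executor.instances": 3}
)
COMMAND_LINE = shlex.join(COMMAND)
```

Here `COMMAND_LINE` is the string to run in a shell; the point is that the executor count is fixed before the application starts.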

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-17 Thread karan alang
Hi Mich, I'm currently testing this on my mac .. are you able to reproduce this issue ? Note - the code is similar .. except outputMode is set to update. wrt outputMode - when using aggregation + watermark, the outputMode should be either append Or update, in your code - you have used 'complete'

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-17 Thread Mich Talebzadeh
Hi Karan, The version tested was 3.1.1. Are you running on Dataproc serverless 3.1.3? Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-16 Thread karan alang
Fyi .. apache spark version is 3.1.3 On Wed, Mar 15, 2023 at 4:34 PM karan alang wrote: > Hi Mich, this doesn't seem to be working for me .. the watermark seems to > be getting ignored ! > > Here is the data put into Kafka : > > ``` > > >

Re: Understanding executor memory behavior

2023-03-16 Thread Sean Owen
All else equal it is better to have the same resources in fewer executors. More tasks are local to other tasks which helps perf. There is more possibility of 'borrowing' extra mem and CPU in a task. On Thu, Mar 16, 2023, 2:14 PM Nikhil Goyal wrote: > Hi folks, > I am trying to understand what

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
What we can do is get into the habit of compiling the list on LinkedIn but making sure this list is shared and broadcast here, eh?! As well, when we broadcast the videos, we can do this using zoom/jitsi/ riverside.fm as well as simulcasting this on LinkedIn. This way you can view directly on the

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-15 Thread karan alang
Hi Mich, this doesn't seem to be working for me .. the watermark seems to be getting ignored ! Here is the data put into Kafka : ``` +---++ |value |key |

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
Understood Nitin It would be wrong to act against one's conviction. I am sure we can find a way around providing the contents Regards Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
Hi Nitin, Linkedin is more of a professional medium. FYI, I am only a member of Linkedin, no facebook, etc. There is no reason for you NOT to create a profile for yourself in linkedin :) https://www.linkedin.com/help/linkedin/answer/a1338223/sign-up-to-join-linkedin?lang=en see you there as

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Bjørn Jørgensen
Great. A case that I hope can be better documented, especially now that we have Pandas API on Spark and many potential new users coming from Pandas. Is how to start Spark with full available memory and CPU. I use this function to do this in a notebook. import multiprocessing import os import sys
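
The function in the message above is cut off, so the following is only a sketch of the general idea, not the author's code. It assumes a Linux-style host where `os.sysconf` reports physical memory, and the 80% memory fraction and the function name are hypothetical choices.

```python
import multiprocessing
import os


def local_resource_settings(memory_fraction=0.8):
    """Derive local-mode Spark settings from the host's CPU count and physical memory."""
    cores = multiprocessing.cpu_count()
    # Assumption: SC_PAGE_SIZE and SC_PHYS_PAGES are available (Linux-style hosts).
    total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    memory_gb = max(1, int(total_bytes * memory_fraction / 1024 ** 3))
    return {
        "spark.master": f"local[{cores}]",
        "spark.driver.memory": f"{memory_gb}g",
    }


SETTINGS = local_resource_settings()
```

The resulting dictionary could then be fed into the session configuration before the session is created.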

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
Thanks Mich for tackling this! I encourage everyone to add to the list so we can have a comprehensive list of topics, eh?! On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh wrote: > Hi all, > > Thanks to @Denny Lee for giving access to > > https://www.linkedin.com/company/apachespark/ > > and

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
Hi all, Thanks to @Denny Lee for giving access to https://www.linkedin.com/company/apachespark/ and for the contribution from @asma zgolli. You will see my post at the bottom. Please add anything else on topics to the list as a comment. We will then put them together in an article perhaps. Comments

Re: logging pickle files on local run of spark.ml Pipeline model

2023-03-15 Thread Sean Owen
Pickle won't work. But the others should. I think you are specifying an invalid path in both cases but hard to say without more detail On Wed, Mar 15, 2023, 9:13 AM Mnisi, Caleb wrote: > Good Day > > > > I am having trouble saving a spark.ml Pipeline model to a pickle file, > when running

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
In spark structured streaming we cannot perform repartition() without stopping the streaming process unless otherwise. Admittedly, It is not a parameter that I have played around with. I still think Spark GUI should provide some insight. view my Linkedin profile

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Sean Owen
That's incorrect, it's spark.default.parallelism, but as the name suggests, that is merely a default. You control partitioning directly with .repartition() On Tue, Mar 14, 2023 at 11:37 AM Mich Talebzadeh wrote: > Check this link > > >

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
Check this link https://sparkbyexamples.com/spark/difference-between-spark-sql-shuffle-partitions-and-spark-default-parallelism/ You can set it with spark.conf.set("sparkDefaultParallelism", value). Have a look at Streaming statistics in Spark GUI, especially *Processing Time*, defined by

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Sean Owen
Are you just looking for DataFrame.repartition()? On Tue, Mar 14, 2023 at 10:57 AM Emmanouil Kritharakis < kritharakismano...@gmail.com> wrote: > Hello, > > I hope this email finds you well! > > I have a simple dataflow in which I read from a kafka topic, perform a map > transformation and then

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
What benefits are you going for with increasing parallelism? Better throughput view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss,

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Mich Talebzadeh
Hi Denny, That Apache Spark Linkedin page https://www.linkedin.com/company/apachespark/ looks fine. It also allows a wider audience to benefit from it. +1 for me view my Linkedin profile

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-14 Thread Gary Liu
Hi Mich, The y-axis is the number of executors. The code ran on dataproc serverless spark on 3.3.2. I tried closing autoscaling by setting the following: spark.dynamicAllocation.enabled=false spark.executor.instances=60 And still got the FetchFailedException error. I Wonder why it can run

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Denny Lee
In the past, we've been using the Apache Spark LinkedIn page and group to broadcast these types of events - if you're cool with this? Or we could go through the process of submitting and updating the current https://spark.apache.org or request to

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Joris Billen
This is a very good idea-would love to read such a confluence page. Adding a section “common mistakes/misconceptions” might be useful for many of these sections. It would describe undesired behaviour/errors one would get in case of not following some best practices. On 13 Mar 2023, at 17:20,

Re: Spark 3.3.2 not running with Antlr4 runtime latest version

2023-03-14 Thread yangjie01
, because Spark still needs to support Java 8. Yang Jie From: Sean Owen Date: Tuesday, March 14, 2023 21:33 To: "karuna.s...@accenture.com" Cc: "user@spark.apache.org" , "Misra Parashar, Jyoti" , "Ratra, Neelima" , "Jain, Neha T." , "Geo

Re: Spark 3.3.2 not running with Antlr4 runtime latest version

2023-03-14 Thread Sean Owen
You want Antlr 3 and Spark is on 4? no I don't think Spark would downgrade. You can shade your app's dependencies maybe. On Tue, Mar 14, 2023 at 8:21 AM Sahu, Karuna wrote: > Hi Team > > > > We are upgrading a legacy application using Spring boot , Spark and > Hibernate. While upgrading

Re: spark on k8s daemonset collect log

2023-03-14 Thread Cheng Pan
The filebeat supports multiline matching, here is an example[1] BTW, I’m working on External Log Service integration[2], it may be useful in your case, feel free to review/leave comments [1] https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html#multiline [2]

Re: Topics for Spark online classes & webinars

2023-03-13 Thread Mich Talebzadeh
Well that needs to be created first for this purpose. The appropriate name etc. to be decided. Maybe @Denny Lee can facilitate this as he offered his help. cheers view my Linkedin profile

Re: Topics for Spark online classes & webinars

2023-03-13 Thread asma zgolli
Hello Mich, Can you please provide the link for the confluence page? Many thanks Asma Ph.D. in Big Data - Applied Machine Learning On Mon, 13 Mar 2023 at 17:21, Mich Talebzadeh wrote: > Apologies I missed the list. > > To move forward I selected these topics from the thread "Online classes

Re: Topics for Spark online classes & webinars

2023-03-13 Thread Mich Talebzadeh
Apologies I missed the list. To move forward I selected these topics from the thread "Online classes for spark topics". To take this further I propose a confluence page to be set up. 1. Spark UI 2. Dynamic allocation 3. Tuning of jobs 4. Collecting spark metrics for monitoring and

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-13 Thread Mich Talebzadeh
Hi Gary Thanks for the update. So this serverless dataproc. on 3.3.1. Maybe an autoscaling policy could be an option. What is y-axis? Is that the capacity? Can you break down the join into multiple parts and save the intermediate result set? HTH view my Linkedin profile

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-13 Thread Gary Liu
Hi Mich, I used the serverless spark session, not the local mode in the notebook. So machine type does not matter in this case. Below is the chart for serverless spark session execution. I also tried to increase executor memory and core, but the issue did not get resolved. I will try shutting down

Re: Online classes for spark topics

2023-03-12 Thread vaquar khan
any webinar on Spark related topic is appreciated  >>>> >>>> Thank You & Best Regards >>>> Winston Lai >>>> -- >>>> *From:* asma zgolli >>>> *Sent:* Thursday, March 9, 2023 5:43:06 AM >>>

Re: Online classes for spark topics

2023-03-12 Thread Mich Talebzadeh
- >>> *From:* asma zgolli >>> *Sent:* Thursday, March 9, 2023 5:43:06 AM >>> *To:* karan alang >>> *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com >>> ; User >>> *Subject:* Re: Online classes for spark topics >>> >>> +1

Re: Online classes for spark topics

2023-03-12 Thread Denny Lee
>> *To:* karan alang >> *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com < >> ashok34...@yahoo.com>; User >> *Subject:* Re: Online classes for spark topics >> >> +1 >> >> On Wed, 8 Mar 2023 at 21:32, karan alang >> wrote: >> >> +1 .. I'm ha

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-12 Thread Mich Talebzadeh
OK ts is the timestamp right? This is a similar code that works out the average temperature with time frame of 5 minutes. Note the comments and catch error with try: try: # construct a streaming dataframe streamingDataFrame that subscribes to topic temperature

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread sam smith
" In this case your program may work because effectively you are not using the spark in yarn on the hadoop cluster " I am actually using Yarn as mentioned (client mode) I already know that, but it is not just about collectAsList, the execution freezes also for example when using save() on the

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread Mich Talebzadeh
collectAsList brings all the data into the driver which is a single JVM on a single node. In this case your program may work because effectively you are not using the spark in yarn on the hadoop cluster. The benefit of Spark is that you can process a large amount of data using the memory and

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread sam smith
not sure what you mean by your question, but it is not helping in any case On Sat, 11 Mar 2023 at 19:54, Mich Talebzadeh wrote: > > > ... To note that if I execute collectAsList on the dataset at the > beginning of the program > > What do you think collectAsList does? > > > >view

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread Mich Talebzadeh
... To note that if I execute collectAsList on the dataset at the beginning of the program What do you think collectAsList does? view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-10 Thread karan alang
Hi Mich - Here is the output of the ldf.printSchema() & ldf.show() commands. ldf.printSchema() root |-- applianceName: string (nullable = true) |-- timeslot: long (nullable = true) |-- customer: string (nullable = true) |-- window: struct (nullable = false) ||-- start: timestamp

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-10 Thread Mich Talebzadeh
Just looking at the code in here ldf = ldf.groupBy("applianceName", "timeslot", "customer", window(col("ts"), "15 minutes")) \ .agg({'sentOctets':"sum", 'recvdOctets':"sum"}) \ .withColumnRenamed('sum(sentOctets)', 'sentOctets') \
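
The grouping in that snippet can be illustrated without Spark. This is a minimal pure-Python sketch of 15-minute tumbling windows, assuming `ts` is an epoch timestamp in seconds and that windows are aligned to the epoch and half-open, [start, end). The sample rows are hypothetical, and no watermark handling is modelled.

```python
from collections import defaultdict

WINDOW_SECONDS = 15 * 60


def window_start(ts_seconds, size=WINDOW_SECONDS):
    """Return the start of the epoch-aligned tumbling window that contains ts_seconds."""
    return ts_seconds - ts_seconds % size


def sum_octets_per_window(rows):
    """Sum sentOctets and recvdOctets per (applianceName, timeslot, customer, window start)."""
    totals = defaultdict(lambda: {"sentOctets": 0, "recvdOctets": 0})
    for row in rows:
        key = (
            row["applianceName"],
            row["timeslot"],
            row["customer"],
            window_start(row["ts"]),
        )
        totals[key]["sentOctets"] += row["sentOctets"]
        totals[key]["recvdOctets"] += row["recvdOctets"]
    return dict(totals)


# Hypothetical rows: the first two fall in the window starting at 900, the third at 1800.
ROWS = [
    {"applianceName": "app1", "timeslot": 1, "customer": "c1", "ts": 900, "sentOctets": 10, "recvdOctets": 1},
    {"applianceName": "app1", "timeslot": 1, "customer": "c1", "ts": 1799, "sentOctets": 5, "recvdOctets": 2},
    {"applianceName": "app1", "timeslot": 1, "customer": "c1", "ts": 1800, "sentOctets": 7, "recvdOctets": 3},
]
TOTALS = sum_octets_per_window(ROWS)
```

A row whose timestamp sits exactly on a boundary (1800 here) opens the next window instead of closing the previous one.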

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Mich Talebzadeh
for your dataproc what type of machines are you using for example n2-standard-4 with 4vCPU and 16GB or something else? how many nodes and if autoscaling turned on. most likely executor memory limit? HTH view my Linkedin profile

Re: [Spark Structured Streaming] Could we apply new options of readStream/writeStream without stopping spark application (zero downtime)?

2023-03-09 Thread hueiyuan su
Dear Mich, Sure, that is a good idea. If we have a pause() function, we can temporarily stop streaming and adjust configuration, maybe from environment variables. Once these parameters are adjusted, we can restart the streaming to apply the newest parameters without stopping the Spark streaming application.

Re: How to share a dataset file across nodes

2023-03-09 Thread Mich Talebzadeh
Try something like below 1) Put your csv say cities.csv in HDFS as below hdfs dfs -put cities.csv /data/stg/test 2) Read it into dataframe in PySpark as below csv_file="hdfs://:PORT/data/stg/test/cities.csv" # read it in spark listing_df =

Re: How to share a dataset file across nodes

2023-03-09 Thread Sean Owen
Put the file on HDFS, if you have a Hadoop cluster? On Thu, Mar 9, 2023 at 3:02 PM sam smith wrote: > Hello, > > I use Yarn client mode to submit my driver program to Hadoop, the dataset > I load is from the local file system, when i invoke load("file://path") > Spark complains about the csv

Re: read a binary file and save in another location

2023-03-09 Thread Russell Jurney
Yeah, that's the right answer! Thanks, Russell Jurney @rjurney russell.jur...@gmail.com LI FB datasyndrome.com Book a time on Calendly On Thu, Mar 9,

Re: read a binary file and save in another location

2023-03-09 Thread Mich Talebzadeh
Does this need any action in PySpark? How about importing using the shutil package? https://sparkbyexamples.com/python/how-to-copy-files-in-python/ view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh
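
A minimal sketch of the shutil route mentioned above, assuming both paths are on a filesystem the driver can reach directly (not HDFS or object storage). The file names and the helper name are hypothetical.

```python
import os
import shutil
import tempfile


def copy_binary_file(src_path, dst_path):
    """Copy a file byte-for-byte, creating the destination directory if needed."""
    os.makedirs(os.path.dirname(dst_path), exist_ok=True)
    return shutil.copyfile(src_path, dst_path)


# Hypothetical demo: write a small binary file, then copy it to another location.
WORK_DIR = tempfile.mkdtemp()
SRC = os.path.join(WORK_DIR, "input.bin")
DST = os.path.join(WORK_DIR, "copy", "output.bin")
with open(SRC, "wb") as handle:
    handle.write(bytes(range(256)))
copy_binary_file(SRC, DST)
```

No Spark job is involved here; the copy runs entirely in the Python process that calls it.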

Re: read a binary file and save in another location

2023-03-09 Thread Russell Jurney
https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html This says "Binary file data source does not support writing a DataFrame back to the original files." which I take to mean this isn't possible... I haven't done this, but going from the docs, it would be:

Re: [Spark Structured Streaming] Could we apply new options of readStream/writeStream without stopping spark application (zero downtime)?

2023-03-09 Thread Mich Talebzadeh
most probably we will require an additional method pause() https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.streaming.StreamingQuery.html to allow us to pause (as opposed to stop()) the streaming process and resume after changing the parameters. The state of streaming

Re: Online classes for spark topics

2023-03-09 Thread neeraj bhadani
li > *Sent:* Thursday, March 9, 2023 5:43:06 AM > *To:* karan alang > *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com < > ashok34...@yahoo.com>; User > *Subject:* Re: Online classes for spark topics > > +1 > > On Wed, 8 Mar 2023 at 21:32, karan alang wrote: >

Re: [EXTERNAL] Spark Thrift Server - Autoscaling on K8

2023-03-09 Thread Saurabh Gulati
Hey Jayabindu, We use thriftserver on K8S. May I ask why you are not going for Trino instead? I know it didn't support autoscaling when we tested it in the past but not sure if it does now. Autoscaling also means that users might have to wait for the cluster to autoscale but that usually

Re: [EXTERNAL] Re: Online classes for spark topics

2023-03-09 Thread asma zgolli
Sharma < > deepakmc...@gmail.com> > *Cc:* Denny Lee ; Sofia’s World < > mmistr...@gmail.com>; User ; Winston Lai < > weiruanl...@gmail.com>; ashok34...@yahoo.com ; asma > zgolli ; karan alang > *Subject:* Re: [EXTERNAL] Re: Online classes for spark topics > >

Re: [EXTERNAL] Re: Online classes for spark topics

2023-03-09 Thread Winston Lai
Lai From: Saurabh Gulati Sent: Thursday, March 9, 2023 5:04:35 PM To: Mich Talebzadeh ; Deepak Sharma Cc: Denny Lee ; Sofia’s World ; User ; Winston Lai ; ashok34...@yahoo.com ; asma zgolli ; karan alang Subject: Re: [EXTERNAL] Re: Online classes for spark topics Hey guys, It's a nice

Re: [EXTERNAL] Re: Online classes for spark topics

2023-03-09 Thread Saurabh Gulati
: Mich Talebzadeh Sent: 09 March 2023 09:00 To: Deepak Sharma Cc: Denny Lee ; Sofia’s World ; User ; Winston Lai ; ashok34...@yahoo.com ; asma zgolli ; karan alang Subject: [EXTERNAL] Re: Online classes for spark topics Caution! This email originated outside of FedEx. Please do not open

Re: Online classes for spark topics

2023-03-09 Thread Mich Talebzadeh
iated  >>>> >>>> Thank You & Best Regards >>>> Winston Lai >>>> -- >>>> *From:* asma zgolli >>>> *Sent:* Thursday, March 9, 2023 5:43:06 AM >>>> *To:* karan alang >>>

Re: Online classes for spark topics

2023-03-08 Thread Deepak Sharma
>>> *From:* asma zgolli >>> *Sent:* Thursday, March 9, 2023 5:43:06 AM >>> *To:* karan alang >>> *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com >>> ; User >>> *Subject:* Re: Online classes for spark topics >>> >>> +1 >&

Re: Online classes for spark topics

2023-03-08 Thread Denny Lee
nk You & Best Regards >> Winston Lai >> -- >> *From:* asma zgolli >> *Sent:* Thursday, March 9, 2023 5:43:06 AM >> *To:* karan alang >> *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com < >> ashok34...@yahoo.com>; User >> *Subject

Re: Online classes for spark topics

2023-03-08 Thread Sofia’s World
:* karan alang > *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com < > ashok34...@yahoo.com>; User > *Subject:* Re: Online classes for spark topics > > +1 > > On Wed, 8 Mar 2023 at 21:32, karan alang wrote: > > +1 .. I'm happy to be part of these discussions as wel

Re: Online classes for spark topics

2023-03-08 Thread Winston Lai
+1, any webinar on Spark related topic is appreciated  Thank You & Best Regards Winston Lai From: asma zgolli Sent: Thursday, March 9, 2023 5:43:06 AM To: karan alang Cc: Mich Talebzadeh ; ashok34...@yahoo.com ; User Subject: Re: Online classes for s

Re: Online classes for spark topics

2023-03-08 Thread asma zgolli
+1 On Wed, 8 Mar 2023 at 21:32, karan alang wrote: > +1 .. I'm happy to be part of these discussions as well ! > > > > > On Wed, Mar 8, 2023 at 12:27 PM Mich Talebzadeh > wrote: > >> Hi, >> >> I guess I can schedule this work over a course of time. I for myself can >> contribute plus learn

Re: Online classes for spark topics

2023-03-08 Thread karan alang
+1 .. I'm happy to be part of these discussions as well ! On Wed, Mar 8, 2023 at 12:27 PM Mich Talebzadeh wrote: > Hi, > > I guess I can schedule this work over a course of time. I for myself can > contribute plus learn from others. > > So +1 for me. > > Let us see if anyone else is

Re: Online classes for spark topics

2023-03-08 Thread Mich Talebzadeh
Hi, I guess I can schedule this work over a course of time. I for myself can contribute plus learn from others. So +1 for me. Let us see if anyone else is interested. HTH view my Linkedin profile

Re: Online classes for spark topics

2023-03-08 Thread ashok34...@yahoo.com.INVALID
Hello Mich. Greetings. Would you be able to arrange a Spark Structured Streaming learning webinar? This is something I have been struggling with recently. It would be very helpful. Thanks and Regards, AK On Tuesday, 7 March 2023 at 20:24:36 GMT, Mich Talebzadeh wrote: Hi, This might

Re: Online classes for spark topics

2023-03-07 Thread Mich Talebzadeh
Hi, This might be a worthwhile exercise on the assumption that the contributors will find the time and bandwidth to chip in so to speak. I am sure there are many but on top of my head I can think of Holden Karau for k8s, and Sean Owen for data science stuff. They are both very experienced.

Re: [Spark Structured Streaming] Could we apply new options of readStream/writeStream without stopping spark application (zero downtime)?

2023-03-07 Thread Mich Talebzadeh
hm interesting proposition. I guess you mean altering one of the following parameters in flight streamingDataFrame = self.spark \ .readStream \ .format("kafka") \ .option("kafka.bootstrap.servers", config['MDVariables']['bootstrapServers'],)

Re: Reply: Re: Build SPARK from source with SBT failed

2023-03-07 Thread Tufan Rakshit
re-optimized JVM libraries. >> >> On 3/7/23 8:21 AM, ckgppl_...@sina.cn wrote: >> >> No. I haven't installed Apple Developer Tools. I have installed Zulu >> OpenJDK 11.0.17 manually. >> So I need to install Apple Developer Tools? >> - Original message - >>

Re: Reply: Re: Build SPARK from source with SBT failed

2023-03-07 Thread Sean Owen
I need to install Apple Developer Tools? > - Original message - > From: Sean Owen > To: ckgppl_...@sina.cn > Cc: user > Subject: Re: Build SPARK from source with SBT failed > Date: 2023-03-07 20:58 > > This says you don't have the java compiler installed. Did you install the > Apple

Re: Reply: Re: Build SPARK from source with SBT failed

2023-03-07 Thread Artemis User
Apple Developer Tools. I have installed Zulu OpenJDK 11.0.17 manually. So I need to install Apple Developer Tools? - Original message - From: Sean Owen To: ckgppl_...@sina.cn Cc: user Subject: Re: Build SPARK from source with SBT failed Date: 2023-03-07 20:58 This says you don't have the java compiler installed

Reply: Re: Build SPARK from source with SBT failed

2023-03-07 Thread ckgppl_yan
No. I haven't installed Apple Developer Tools. I have installed Zulu OpenJDK 11.0.17 manually. So I need to install Apple Developer Tools? - Original message - From: Sean Owen To: ckgppl_...@sina.cn Cc: user Subject: Re: Build SPARK from source with SBT failed Date: 2023-03-07 20:58 This says you don't have

Re: Pandas UDFs vs Inbuilt pyspark functions

2023-03-07 Thread Sean Owen
It's hard to evaluate without knowing what you're doing. Generally, using a built-in function will be fastest. pandas UDFs can be faster than normal UDFs if you can take advantage of processing multiple rows at once. On Tue, Mar 7, 2023 at 6:47 AM neha garde wrote: > Hello All, > > I need help

Re: Build SPARK from source with SBT failed

2023-03-07 Thread Sean Owen
This says you don't have the java compiler installed. Did you install the Apple Developer Tools package? On Tue, Mar 7, 2023 at 1:42 AM wrote: > Hello, > > I have tried to build SPARK source codes with SBT in my local dev > environment (MacOS 13.2.1). But it reported following error: > [error]

Re: [Spark Structured Streaming] Do spark structured streaming is support sink to AWS Kinesis currently and how to handle if achieve quotas of kinesis?

2023-03-06 Thread Mich Talebzadeh
Spark Structured Streaming can write to anything as long as an appropriate API or JDBC connection exists. I have not tried Kinesis but have you thought about how you want to write to it as a sink? Those quota limitations, much like quotas set by the vendors (say Google on BigQuery writes etc) are

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-05 Thread Mich Talebzadeh
OK I found a workaround. Basically each stream state is not kept and I have two streams. One is a business topic and the other one created to shut down spark structured streaming gracefully. I was interested to print the value for the most recent batch Id for the business topic called "md" here

Re: Unable to handle bignumeric datatype in spark/pyspark

2023-03-04 Thread Atheeth SH
Hi Rajnil, Sorry for the multiple emails. It seems you are getting the ModuleNotFoundError error. I was curious: have you tried the solution mentioned in the readme file? Below is the link: https://github.com/GoogleCloudDataproc/spark-bigquery-connector#bignumeric-support

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This might help https://docs.databricks.com/structured-streaming/foreach.html streamingDF.writeStream.foreachBatch(...) allows you to specify a function that is executed on the output data of every micro-batch of the streaming query. It takes two parameters: a DataFrame or Dataset that has the

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
I am aware of your point that globals don't work in a distributed environment. With regard to your other point, these are two different topics with their own streams. The point of the second stream is to set the status to false, so it can gracefully shut down the main stream (the one called "md") here

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
I don't quite get it - aren't you applying to the same stream, and batches? worst case why not apply these as one function? Otherwise, how do you mean to associate one call to another? globals don't help here. They aren't global beyond the driver, and, which one would be which batch? On Sat, Mar

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
Thanks. they are different batchIds From sendToControl, newtopic batchId is 76 From sendToSink, md, batchId is 563 As a matter of interest, why does a global variable not work? view my Linkedin profile

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
It's the same batch ID already, no? Or why not simply put the logic of both in one function? or write one function that calls both? On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh wrote: > > This is probably pretty straight forward but somehow it does not look > that way > > > > On Spark
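
A minimal sketch of the "one function that calls both" suggestion, assuming both steps are applied to the same stream (other messages in this thread indicate the two functions actually ran on separate streams, in which case their batch IDs differ). The function bodies are placeholders, and the names are snake_case stand-ins for sendToSink and sendToControl.

```python
CALLS = []


def send_to_sink(df, batch_id):
    """Placeholder for the sink logic; here it only records that it ran."""
    CALLS.append(("sink", batch_id))


def send_to_control(df, batch_id):
    """Placeholder for the control logic; here it only records that it ran."""
    CALLS.append(("control", batch_id))


def process_batch(df, batch_id):
    """Run both steps on the same micro-batch so they share one batch_id."""
    send_to_sink(df, batch_id)
    send_to_control(df, batch_id)


# With PySpark this would be registered through writeStream.foreachBatch(process_batch).
# Here the function is called directly with a dummy DataFrame to show the flow.
process_batch(df=None, batch_id=563)
```

Passing `batch_id` as an argument avoids relying on a global variable to carry state between the two steps.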

Re: SPIP architecture diagrams

2023-03-04 Thread Mich Talebzadeh
ok I decided to bite the bullet and use a Visio diagram for my SPIP "Shutting down spark structured streaming when the streaming process completed the current process". Details from here https://issues.apache.org/jira/browse/SPARK-42485 This is not meant to be complete. In this an indication. I

Re: Unable to handle bignumeric datatype in spark/pyspark

2023-03-03 Thread Atheeth SH
Hi Rajnil, Just curious, what version of spark-bigquery-connector are you using? Thanks, Atheeth On Sat, 25 Feb 2023 at 23:48, Mich Talebzadeh wrote: > sounds like it is cosmetic. The important point is whether the data > stored in GBQ is valid. > > > THT > > >view my Linkedin profile >

Re: Unsubscribe

2023-03-03 Thread Atheeth SH
please send an empty email to: user-unsubscr...@spark.apache.org to unsubscribe yourself from the list. Thanks On Thu, 23 Feb 2023 at 07:07, Tang Jinxin wrote: > Unsubscribe >

Re: unsubscribe

2023-03-03 Thread Atheeth SH
please send an empty email to: user-unsubscr...@spark.apache.org to unsubscribe yourself from the list. Thanks, Atheeth On Fri, 24 Feb 2023 at 03:58, Roberto Jr wrote: > please unsubscribe from that email list. > thank you in advance. > roberto. >

Re: [New Project] sparksql-ml : Distributed Machine Learning using SparkSQL.

2023-02-27 Thread Russell Jurney
I think it is awesome. Brilliant interface that is missing from Spark. Would you integrate with something like MLFlow? Thanks, Russell Jurney @rjurney russell.jur...@gmail.com LI FB datasyndrome.com

Re: Spike on number of tasks - dynamic allocation

2023-02-27 Thread Mich Talebzadeh
Hi Murat, I have dealt with EMR but have used Spark cluster on Google Dataproc with 3.1.1 with autoscaling policy. My understanding is that autoscaling policy will decide on how to scale if needed without manual intervention. Is this the case with yours? HTH view my Linkedin profile

Re: Spike on number of tasks - dynamic allocation

2023-02-27 Thread murat migdisoglu
Hey Mich, This cluster is running spark 2.4.6 on EMR On Mon, Feb 27, 2023 at 12:20 PM Mich Talebzadeh wrote: > Hi, > > What is the spark version and what type of cluster is it, spark on > dataproc or other? > > HTH > > > >view my Linkedin profile >
