Re: Spark Doubts

2022-06-25 Thread russell . spitzer
Code is always distributed for any operations on a DataFrame or RDD. The size of your code is irrelevant, apart from JVM memory limits. For most jobs the entire application jar and all dependencies are put on the classpath of every executor. There are some exceptions but generally you should

Re: Spark Doubts

2022-06-25 Thread Sid
Hi Tufan, Thanks for the answers. However, by the second point, I mean to ask where my code would reside. Will it be copied to all the executors since the code size would be small, or will it be maintained on the driver's side? I know that the driver converts the code to a DAG and when an action is

Re: Spark Doubts

2022-06-25 Thread Tufan Rakshit
Please find the answers inline. 1) Can I apply predicate pushdown filters if I have data stored in S3 or it should be used only while reading from DBs? It can be applied in S3 if you store the data in Parquet, CSV, JSON, or Avro format. It does not depend on the DB; it's supported in object store
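
To make the idea of predicate pushdown on columnar files concrete, here is a minimal toy model in plain Python. It is not Spark's or Parquet's implementation: `RowGroup` and `scan_greater_than` are hypothetical names invented for this sketch. It only assumes what the Parquet format provides, namely per-row-group min/max statistics that let a reader skip whole chunks of a file when a filter cannot match them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group: values plus min/max statistics."""
    rows: List[int]

    @property
    def min_value(self) -> int:
        return min(self.rows)

    @property
    def max_value(self) -> int:
        return max(self.rows)


def scan_greater_than(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return the values > threshold and how many row groups had to be read."""
    matches: List[int] = []
    groups_read = 0
    for group in row_groups:
        # The statistics prove that no value here can match, so skip the group
        # without reading its rows. This is the "pushdown" part.
        if group.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if v > threshold)
    return matches, groups_read


groups = [RowGroup([1, 2, 3]), RowGroup([10, 20, 30]), RowGroup([4, 25, 6])]
result, groups_read = scan_greater_than(groups, 15)
print(result, groups_read)
```

In this sketch the first row group is never read because its maximum is below the threshold. How much a real job benefits depends on the file format and on how the data is laid out; row-oriented text formats such as CSV and JSON carry no such statistics.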

Spark Doubts

2022-06-25 Thread Sid
Hi Team, I have various doubts as below: 1) Can I apply predicate pushdown filters if I have data stored in S3 or should it be used only while reading from DBs? 2) While running the data in distributed form, is my code copied to each and every executor? As per me, it should be the case since

Re: Spark Doubts

2022-06-22 Thread Sid
Hi, Thanks for your answers. Much appreciated. I know that we can cache the data frame in memory or on disk, but I want to understand: when the data frame is loaded initially, where does it reside by default? Thanks, Sid On Wed, Jun 22, 2022 at 6:10 AM Yong Walt wrote: > These are the basic

Re: Spark Doubts

2022-06-21 Thread Yong Walt
These are the basic concepts in Spark :) You may take a bit of time to read this small book: https://cloudcache.net/resume/PDDWS2-V2.pdf regards On Wed, Jun 22, 2022 at 3:17 AM Sid wrote: > Hi Team, > > I have a few doubts about the below questions: > > 1) data frame will reside where? memory?

Re: Spark Doubts

2022-06-21 Thread Apostolos N. Papadopoulos
Dear Sid, You are asking questions for which answers exist on the Apache Spark website, in books, in MOOCs, or at other URLs. For example, take a look at this one: https://sparkbyexamples.com/spark/spark-dataframe-cache-and-persist-explained/

Spark Doubts

2022-06-21 Thread Sid
Hi Team, I have a few doubts about the below questions: 1) Where will a data frame reside? Memory? Disk? How is memory allocated for a data frame? 2) How do you configure each partition? 3) Is there any way to calculate the exact partitions needed to load a specific file? Thanks, Sid
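
For question 3, a rough estimate is possible, though not an exact figure. The sketch below is an assumption-laden approximation, not something stated in this thread: it follows the split-size rule that Spark 3.x file-based sources use as far as I understand it, driven by `spark.sql.files.maxPartitionBytes` (128 MiB by default) and `spark.sql.files.openCostInBytes` (4 MiB by default). The function name `estimate_partitions` is hypothetical, and the result only applies to a single splittable file.

```python
import math

MIB = 1024 * 1024


def estimate_partitions(file_size_bytes: int,
                        default_parallelism: int,
                        max_partition_bytes: int = 128 * MIB,
                        open_cost_bytes: int = 4 * MIB) -> int:
    """Estimate the number of input partitions for one splittable file."""
    # Each file is padded with the "open cost" before the bytes are spread
    # across the available parallelism.
    total_bytes = file_size_bytes + open_cost_bytes
    bytes_per_core = total_bytes // default_parallelism
    # The split size is capped by maxPartitionBytes and floored by the open cost.
    max_split_bytes = min(max_partition_bytes, max(open_cost_bytes, bytes_per_core))
    return math.ceil(file_size_bytes / max_split_bytes)


# A 1 GiB file on a cluster whose default parallelism is 8 (assumed values).
estimate = estimate_partitions(1024 * MIB, default_parallelism=8)
print(estimate)
```

The actual count can differ for non-splittable inputs (for example gzip-compressed text), for many small files that get packed together, or after a `repartition` or `coalesce`, so `df.rdd.getNumPartitions()` remains the way to check what a job really produced.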