subject:"Spark Interview questions"

Re: Spark Interview questions

2016-09-14 Thread Jacek Laskowski

Hi,

Doh, Mich, it's way too much to ask for "typical Spark interview
questions for Spark/Scala junior roles". There are plenty of such
questions and I don't think there's a way to have them all noted down.

Spark supports 5 languages, offers 4 modules + Core, and presents
itself differently to developers, admins and performance g33ks. Throw the 3
supported cluster managers in and you see why I'm staying far from such
questions. Too much to handle.

Pass.

p.s. The more I work with Spark, the more I'm overwhelmed by how complex it
is. So many sections with FIXMEs/TODOs in my Spark notes...

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski



-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Spark Interview questions

2016-09-14 Thread Mich Talebzadeh

Hi Ashok,

I am sure we all have some war stories, some of which I recall:

   1. What is meant by RDD, DataFrame and Dataset?
   2. What is meant by "All transformations in Spark are lazy"?
   3. What are the two types of operations supported by RDD?
   4. What is meant by Spark running under a certain mode?
   5. Explain the difference between Spark running in Standalone mode and
   YARN cluster mode.
   6. What is the difference between Spark running in YARN client mode and
   YARN cluster mode?
   7. What is the difference between persist and cache?
   8. If you cache a DataFrame, what does it do and where does the memory
   consumed come from? Can you name a place where you can see its measurements?
   9. What is meant by DAG? A broad outline.
   10. What is shuffling in Spark? How can you minimise its impact?
   11. How would you specify your Spark hardware in a medium-size set-up,
   say an 8-node cluster?
   12. How could one minimise the network latency between Spark and the
   underlying storage (assuming HDFS here)?
   13. How can you parallelize your JDBC connection to a database, say any
   RDBMS? How does it work?
   14. What is the use case for Spark Thrift Server?
   15. How would you typically read and process a tab-separated file in
   Spark?
   16. If you have an OOM message in Spark, how would you go about
   diagnosing the problem?
   17. What is meant by spark-submit? How would you use it?
   18. What is a Spark driver? If you run Spark in local mode, how many
   executors can you start?
   19. What is meant by Spark Streaming? What is a use case example?
   20. In Spark Streaming, which parameters are important?
   21. What are the typical analytic functions in Spark SQL?
   22. What is the difference between RANK and DENSE_RANK?
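
For the last question in the list, a small plain-Python sketch may help. It
is not Spark SQL; it only mimics the usual SQL semantics, where RANK leaves
gaps after ties and DENSE_RANK does not. The function names `rank` and
`dense_rank` and the sample scores are made up for illustration.

```python
from typing import List


def rank(values: List[int]) -> List[int]:
    """SQL-style RANK over values ordered descending: ties share a rank, gaps follow."""
    ordered = sorted(values, reverse=True)
    # A row's rank is 1 + the number of rows strictly ahead of it,
    # which is the position of the first occurrence of its value.
    return [ordered.index(v) + 1 for v in ordered]


def dense_rank(values: List[int]) -> List[int]:
    """SQL-style DENSE_RANK over values ordered descending: ties share a rank, no gaps."""
    ordered = sorted(values, reverse=True)
    # Rank against the distinct values only, so no rank number is skipped.
    distinct = sorted(set(values), reverse=True)
    return [distinct.index(v) + 1 for v in ordered]


# Hypothetical sample data: two rows tie for first place.
scores = [90, 85, 90, 70]
print(rank(scores))        # [1, 1, 3, 4]
print(dense_rank(scores))  # [1, 1, 2, 3]
```

With the two tied 90s, RANK jumps from 1 to 3 while DENSE_RANK continues
with 2.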

I am sure there are many other questions that one can think of. For example,
someone like Jacek Laskowski can provide more programming questions as he
is a professional Spark trainer :)

HTH

Dr Mich Talebzadeh

LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Spark Interview questions

2016-09-14 Thread Ashok Kumar

Hi,
As a learner, I would appreciate it if you could forward me any typical Spark
interview questions for junior Spark/Scala roles.
I would be very obliged.

Re: Spark Interview Questions

2015-08-19 Thread Sandeep Giri

Thank you all. I have updated it to a slightly better version.


Regards,
Sandeep Giri,
+1 347 781 4573 (US)
+91-953-899-8962 (IN)

http://KnowBigData.com
Phone: +1-253-397-1945 (Office)

https://linkedin.com/company/knowbigdata
https://facebook.com/knowbigdata
https://twitter.com/IKnowBigData



Re: Spark Interview Questions

2015-08-17 Thread Sandeep Giri

This statement is from Spark's website itself.


Regards,
Sandeep Giri


On Wed, Aug 12, 2015 at 10:42 PM, Peyman Mohajerian mohaj...@gmail.com
wrote:

> I think this statement is inaccurate:
> Q7: What are Actions? A: An action brings back the data from the RDD to
> the local machine -
>
> Also, I wouldn't say that Spark is 100x faster than Hadoop or that it is
> memory based. This is the kind of statement that will not get you the job.
> When it comes to shuffle, Spark has to write to disk. It is faster in many
> cases, but 100x is just a marketing statement that holds in very narrow
> use cases.
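
One plausible reading of the objection to the Q7 answer: an action is what
triggers the lazily recorded transformations to run, and it need not bring
the data back to the local machine. In Spark, `count` returns only a number
and `foreach` returns nothing to the driver. The sketch below is a toy in
plain Python, not Spark's API; the class `LazyDataset` and the helper
`noisy_double` are hypothetical names used only to show the idea.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (an assumption for illustration, NOT Spark's API)."""

    def __init__(self, source: Iterable[int],
                 steps: Optional[List[Callable[[Iterator], Iterator]]] = None):
        self._source = list(source)
        self._steps = steps or []

    # "Transformations": only record what to do and return a new dataset.
    def map(self, fn: Callable) -> "LazyDataset":
        step = lambda it: (fn(x) for x in it)
        return LazyDataset(self._source, self._steps + [step])

    def filter(self, pred: Callable) -> "LazyDataset":
        step = lambda it: (x for x in it if pred(x))
        return LazyDataset(self._source, self._steps + [step])

    def _run(self) -> Iterator:
        # Chain the recorded steps over the source; evaluated only when consumed.
        it: Iterator = iter(self._source)
        for step in self._steps:
            it = step(it)
        return it

    # "Actions": trigger the recorded steps.
    def count(self) -> int:
        return sum(1 for _ in self._run())   # returns a number, not the data

    def collect(self) -> List[int]:
        return list(self._run())             # the one that does bring data back

    def foreach(self, fn: Callable) -> None:
        for x in self._run():                # side effects only, returns nothing
            fn(x)


calls = []

def noisy_double(x: int) -> int:
    calls.append(x)  # record every invocation so laziness is observable
    return x * 2

ds = LazyDataset(range(5)).map(noisy_double).filter(lambda x: x > 2)
print(len(calls))    # 0: no transformation has run yet
print(ds.count())    # 3
print(ds.collect())  # [4, 6, 8]
```

Note that `collect` recomputes the whole chain after `count` has already run
it once, which is the toy equivalent of why persist and cache exist.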







Re: Spark Interview Questions

2015-07-30 Thread Sandeep Giri

I have prepared some interview questions:
http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-1
http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2

Please provide your feedback.


Re: Spark Interview Questions

2015-07-29 Thread Pedro Rodriguez

You might look at the edX courses on Apache Spark or ML with Spark. There
are probably some homework problems or quiz questions that might be
relevant. I haven't looked at the courses myself, but that's where I would go
first.

https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x
https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x

--
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 208-340-1703
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience

Spark Interview Questions

2015-07-29 Thread Mishra, Abhishek

Hello,

Please help me with links or documents covering Apache Spark interview
questions and answers, and also the related tools about which questions could
be asked.

Thanking you all.

Sincerely,
Abhishek


Re: Spark Interview Questions

2015-07-29 Thread vaquar khan

Hi Abhishek,

Please learn Spark; there are no shortcuts to success.

Regards,
Vaquar khan

RE: Spark Interview Questions

2015-07-29 Thread Mishra, Abhishek

Hello Vaquar,

I have working knowledge of and experience with Spark. I just wanted to test
myself or do a mock round to evaluate myself. Thank you for the reply.

Please share anything you have along those lines.

Sincerely,
Abhishek

