Re: Can JVisualVM monitoring tool be used to monitor Spark Executor Memory and CPU

2021-03-21 Thread Attila Zsolt Piros
Hi Ranju! I am quite sure that for your requirement, "monitor every component and isolate the resources consuming individually by every component", Spark metrics is the right direction to go. > Why only UsedstorageMemory should be checked? Right, for you storage memory alone won't be enough; you need
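As a concrete starting point, the per-executor memory and CPU figures that the metrics system tracks are also exposed over the Spark REST API, so they can be polled without attaching JVisualVM to each executor JVM. A minimal sketch in Scala; the driver host/port (localhost:4040) and the application id are assumptions to adapt:

    import scala.io.Source

    // List the running applications known to this driver's UI.
    val apps = Source.fromURL("http://localhost:4040/api/v1/applications").mkString
    println(apps)

    // For a known application id (placeholder "app-123"), fetch per-executor
    // summaries: memoryUsed, maxMemory, totalCores, task counters, etc.
    val executors = Source.fromURL(
      "http://localhost:4040/api/v1/applications/app-123/executors").mkString
    println(executors)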

[Spark SQL]: Can complex oracle views be created using Spark SQL

2021-03-21 Thread Gaurav Singh
Hi Team, We have lots of complex Oracle views (containing multiple tables, joins, analytical and aggregate functions, subqueries, etc.) and we are wondering if Spark can help us execute those views faster. We would also like to know whether those complex views can be implemented using Spark SQL. Thanks
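In case it helps frame the discussion: the common pattern is to load the underlying Oracle tables over JDBC, register them as temporary views, and re-express the view logic in Spark SQL, which supports joins, aggregates, subqueries, and analytic (window) functions. A minimal sketch; the connection details, table names, and view logic below are made-up placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("oracle-view-poc").getOrCreate()

    // Load a base table over JDBC (url, credentials and table names are placeholders).
    def oracleTable(name: String) =
      spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
        .option("dbtable", name)
        .option("user", "app_user")
        .option("password", "secret")
        .load()

    oracleTable("ORDERS").createOrReplaceTempView("orders")
    oracleTable("CUSTOMERS").createOrReplaceTempView("customers")

    // Re-create the view body in Spark SQL: join + aggregate + window function.
    spark.sql("""
      SELECT c.region,
             SUM(o.amount) AS total_amount,
             RANK() OVER (ORDER BY SUM(o.amount) DESC) AS region_rank
      FROM orders o JOIN customers c ON o.customer_id = c.id
      GROUP BY c.region
    """).createOrReplaceTempView("complex_view")  // queryable like the original view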

Re: Spark version verification

2021-03-21 Thread Kent Yao
Hi Mich, > What is the correlation between these links and the ability to establish a Spark build version? Check the documentation list here: http://spark.apache.org/documentation.html. The `latest` link always points to the list head, for example

Re: Spark version verification

2021-03-21 Thread Attila Zsolt Piros
Hi! Thanks Sean and Kent! By reading your answers I have also learnt something new. @Mich Talebzadeh: you can see the commit content by prefixing the hash with *https://github.com/apache/spark/commit/*. So in your case
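To make that concrete, a small sketch: version() returns "<release> <commit-hash>", so the hash can be split off and appended to the commit URL above (assumes an existing SparkSession named spark):

    // version() output looks like "3.1.1 1d550c4e90275ab418b9161925049239227f3dc9".
    val Array(release, commit) =
      spark.sql("SELECT version()").first().getString(0).split(" ")
    println(s"release: $release")
    println(s"https://github.com/apache/spark/commit/$commit")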

Re: Spark version verification

2021-03-21 Thread Mich Talebzadeh
Hi Kent, Thanks for the links. You will have to excuse my ignorance: what is the correlation between these links and the ability to establish a Spark build version?

Re: Spark version verification

2021-03-21 Thread Kent Yao
Please refer to http://spark.apache.org/docs/latest/api/sql/index.html#version

Re: In built Optimizer on Spark

2021-03-21 Thread Mich Talebzadeh
Hi Felix, As you may be aware, Spark SQL does have the Catalyst optimizer (see "What is the Catalyst Optimizer?" from Databricks). You mentioned Spark Structured Streaming. What specifics are you looking for? Have you considered the Spark GUI streaming
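For reference, Catalyst's work can be inspected directly by asking any DataFrame for its plans; a minimal sketch with a throwaway dataset:

    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "tag")
      .filter($"id" > 0)
      .select($"tag")

    // Prints the parsed, analyzed, optimized (Catalyst) and physical plans.
    df.explain(true)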

Re: Spark version verification

2021-03-21 Thread Mich Talebzadeh
Many thanks. spark-sql> SELECT version(); returns 3.1.1 1d550c4e90275ab418b9161925049239227f3dc9. What does 1d550c4e90275ab418b9161925049239227f3dc9 signify, please?

Re: Spark version verification

2021-03-21 Thread Sean Owen
I believe you can "SELECT version()" in Spark SQL to see the build version. On Sun, Mar 21, 2021 at 4:41 AM Mich Talebzadeh wrote: > Thanks for the detailed info. > > I was hoping that one can find a simpler answer to the Spark version than > doing forensic examination on base code so to speak.
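For completeness, the same check from a Scala SparkSession rather than the spark-sql shell; the version() SQL function should be available in Spark 3.0 and later:

    // Shows "<release> <commit-hash>" in one row.
    spark.sql("SELECT version()").show(false)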

Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread ranju goel
Hi Attila, I will check why INVALID is getting appended to the mailing address. > What is your use case here? The client driver application is not using collect; instead it internally calls a Python script which reads the part-file records [comma-separated strings] of each cluster separately and copies

RE: Can JVisualVM monitoring tool be used to monitor Spark Executor Memory and CPU

2021-03-21 Thread Ranju Jain
Hi Mich/Attila, @Mich Talebzadeh: I considered the Spark GUI, but I first have a confusion at the memory level. App configuration: spark.executor.memory=4g for the running Spark job. In the Spark GUI I see the running Spark job has a Peak Execution Memory of 1 KB, as highlighted

In built Optimizer on Spark

2021-03-21 Thread Felix Kizhakkel Jose
Hello, Is there any in-built optimizer in Spark, as in Flink, that avoids manual configuration tuning to achieve better performance of a Structured Streaming pipeline? Or is there any work happening to achieve this? Regards, Felix K Jose

Re: Spark version verification

2021-03-21 Thread Mich Talebzadeh
Thanks for the detailed info. I was hoping one could find a simpler answer to the Spark version than doing a forensic examination of the base code, so to speak. The primer for this verification is that on GCP Dataproc clusters originally built on 3.1.1-rc2, there was an issue with running Spark Structured

RE: Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread Ranju Jain
Hi Attila, > What is your use case here? The client driver application is not using collect; instead it internally calls a Python script which reads the part-file records [comma-separated strings] of each cluster separately and copies the records into another, final CSV file, thus merging all part-file data into a single
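As an aside, if the end goal is a single CSV, the merge can also be done inside Spark instead of an external Python script. A minimal sketch; the paths are placeholders, and note that coalesce(1) funnels the final write through one task, so it only suits modest output sizes:

    // Read the comma-separated part files back and rewrite them as one file.
    spark.read
      .option("header", "false")
      .csv("hdfs:///job-output/part-*")   // placeholder input path
      .coalesce(1)                        // single output file; serializes the write
      .write
      .option("header", "false")
      .csv("hdfs:///job-output-merged")   // placeholder output path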

Re: Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread Attila Zsolt Piros
Hi! I would like to respond only to the first part of your mail: > I have a large RDD dataset of around 60-70 GB which I cannot send to the driver > using *collect*, so I first write it to disk using *saveAsTextFile*, and > then this data gets saved in the form of multiple part files on each node >
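For context, a minimal sketch of the pattern being quoted: writing an RDD that is too large to collect straight to distributed storage (the paths are placeholders):

    // Each partition is written in parallel as its own part-NNNNN file, so
    // neither the driver nor any single node has to hold the full 60-70 GB.
    val rdd = spark.sparkContext.textFile("hdfs:///input/data")
    rdd.saveAsTextFile("hdfs:///output/data")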