OK, thanks. Issue of load balancing / clustering:
I believe if I set up clustering like so:

sbin/start-master.sh
sbin/start-slave.sh spark://master:port
*on another machine*
sbin/start-slave.sh spark://master:port

Do YARN and Mesos do anything different from that? Are the Spark
standalone cluster setup and YARN and Mesos competing technologies for
the same space / functionality?

Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>

On Fri, 27 Mar 2020 at 22:38, Sean Owen <sro...@gmail.com> wrote:

> - dev@, which is more for project devs to communicate. Cross-posting
> is discouraged too.
>
> The book isn't from the Spark OSS project, so not really the place to
> give feedback here.
>
> I don't quite understand the context of your other questions, but
> would elaborate them in individual, clear emails instead to increase
> the chance that someone will answer.
>
> On Fri, Mar 27, 2020 at 4:49 PM Zahid Rahman <zahidr1...@gmail.com> wrote:
> >
> > I was very impressed with the amount of material available from
> > https://github.com/databricks/Spark-The-Definitive-Guide/
> > Over 450 megabytes.
> >
> > I have corrected the Scala code by adding
> > .sort(desc("sum(total_cost)")) to the code provided on page 34 (see below).
> >
> > I have noticed numerous uses of exclamation marks, almost overuse.
> > For example:
> > page 23: Let's specify some more transformations!
> > page 24: you've read your first explain plan!
> > page 26: Notice that these plans compile to the exact same underlying plan!
> > page 29: The last step is our action!
> > page 34: The best thing about structured streaming ....rapidly... with
> > virtually no code
> >
> > 1. I have never read a science book with such an emotion of frustration.
> > Is Spark difficult to understand, made more complicated by the
> > proliferation of languages: Scala, Java, Python, SQL, R?
> >
> > 2. Secondly, is the Spark architecture made more complex due to
> > competing technologies?
> >
> > I have a Spark cluster set up with a master and a slave to load
> > balance heavy activity, like so:
> > sbin/start-master.sh
> > sbin/start-slave.sh spark://192.168.0.38:7077
> >
> > For load balancing, I imagine (conceptually speaking, although I
> > haven't tried it) I can have as many slaves (workers) on other
> > physical machines as I like, simply by downloading the Spark zip
> > file and running workers from those other physical machine(s) with
> > sbin/start-slave.sh spark://192.168.0.38:7077.
> > My question is: under the circumstances, do I need to bother with
> > Mesos or YARN?
> >
> > Collins dictionary:
> > The exclamation mark is used after exclamations and emphatic expressions.
> >
> > I can’t believe it!
> > Oh, no! Look at this mess!
> >
> > The exclamation mark loses its effect if it is overused. It is
> > better to use a full stop after a sentence expressing mild
> > excitement or humour.
> >
> > It was such a beautiful day.
> > I felt like a perfect banana.
> >
> >
> > import org.apache.spark.sql.SparkSession
> > import org.apache.spark.sql.functions.{window, column, desc, col}
> >
> > object RetailData {
> >
> >   def main(args: Array[String]): Unit = {
> >
> >     val spark = SparkSession.builder()
> >       .master("spark://192.168.0.38:7077")
> >       .appName("Retail Data")
> >       .getOrCreate()
> >
> >     // create a static frame
> >     val staticDataFrame = spark.read.format("csv")
> >       .option("header", "true")
> >       .option("inferSchema", "true")
> >       .load("/data/retail-data/by-day/*.csv")
> >
> >     staticDataFrame.createOrReplaceTempView("retail_data")
> >     val staticSchema = staticDataFrame.schema // the schema, not a DataFrame
> >
> >     staticDataFrame
> >       .selectExpr(
> >         "CustomerId", "UnitPrice * Quantity as total_cost", "InvoiceDate")
> >       .groupBy(col("CustomerId"), window(col("InvoiceDate"), "1 day"))
> >       .sum("total_cost")
> >       .sort(desc("sum(total_cost)"))
> >       .show(1)
> >
> >   } // main
> >
> > } // object
> >
> >
> > Backbutton.co.uk
> > ¯\_(ツ)_/¯
> > ♡۶Java♡۶RMI ♡۶
> > Make Use Method {MUM}
> > makeuse.org
>
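[Editor's note on the standalone vs. YARN/Mesos question] The comparison can be made concrete with spark-submit. Below is an untested sketch, assuming a stock Spark distribution; `retail.jar` and the class name `RetailData` are placeholders taken from the code in this thread, and `/etc/hadoop/conf` is an assumed Hadoop config path:

```shell
# Standalone cluster manager: you start the Spark daemons yourself,
# then point spark-submit at the master URL.
sbin/start-master.sh
sbin/start-slave.sh spark://192.168.0.38:7077   # run on each worker machine
bin/spark-submit \
  --master spark://192.168.0.38:7077 \
  --class RetailData \
  retail.jar                                    # placeholder jar name

# YARN: no Spark daemons to start. Executors run in containers that
# Spark requests from the Hadoop ResourceManager, which spark-submit
# locates via HADOOP_CONF_DIR.
export HADOOP_CONF_DIR=/etc/hadoop/conf         # assumed config path
bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class RetailData \
  retail.jar
```

So yes, the standalone scripts, YARN, and Mesos occupy the same slot: the cluster manager. For a cluster dedicated to Spark, the standalone scripts are often sufficient; YARN or Mesos matter mainly when Spark has to share machines and resources with other frameworks.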
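[Editor's note on the `.sort(desc("sum(total_cost)"))` fix] A sketch of an alternative, not from the book: naming the aggregate column explicitly with `agg`/`alias` avoids spelling out the generated `sum(total_cost)` name. The alias `daily_total`, the object name, and the master URL and data path (reused from the thread) are illustrative assumptions; this needs a running Spark master and the book's CSV data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{window, col, desc, sum}

object RetailDataAgg {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("spark://192.168.0.38:7077") // assumes the standalone master above
      .appName("Retail Data (agg)")
      .getOrCreate()

    val staticDataFrame = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/data/retail-data/by-day/*.csv")

    staticDataFrame
      .selectExpr("CustomerId", "UnitPrice * Quantity as total_cost", "InvoiceDate")
      .groupBy(col("CustomerId"), window(col("InvoiceDate"), "1 day"))
      .agg(sum("total_cost").alias("daily_total")) // name the column explicitly
      .sort(desc("daily_total"))                   // no "sum(total_cost)" string needed
      .show(1)

    spark.stop()
  }
}
```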