This is Scala you are looking to learn. I would suggest you look at this:
http://www.brunton-spall.co.uk/post/2011/12/02/map-map-and-flatmap-in-scala/
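For a quick feel of the difference, here is a minimal sketch on plain
Scala collections (a toy example of my own, not from that post; the
semantics carry over to Spark RDDs):

  val lines = List("to be or", "not to be")

  // map: one output per input, so the result is a List of Array[String]
  lines.map(line => line.split(" "))

  // flatMap: each element expands to several words, then all results
  // are flattened into one list
  lines.flatMap(line => line.split(" "))
  // => List(to, be, or, not, to, be)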
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi


On Tue, Feb 4, 2014 at 8:27 PM, goi cto <[email protected]> wrote:

> Thanks all for the info provided.
> One of the things I noticed is that both the map and reduce functions
> receive a function which is applied to all elements
> (map: return a new distributed dataset formed by passing each element
> of the source through a function *func*).
>
> Q1: All the examples I have seen so far use a very simple function as
> func, e.g. (line => line.split(" ")). Are there any examples or cases
> where a more complex function is needed? If so, what is the syntax for
> this?

You can create your own functions (complex ones) and call them in map:
(line => customFunc(line)). Since two elements of the dataset cannot
interact inside map, nothing more complex than a per-element function
is possible. (See the sketch at the bottom of this mail.)

> Q2: What is the difference between map and flatMap? When should I use
> which?

Use flatMap when your function converts each element into an array and
you want to flatten all the arrays into a single sequence.

> Q3: reduceByKey((a, b) => a + b) -> Here again, this example was used
> in the word count sample. I understand it takes the value argument of
> the K,V pair and performs the function on them, e.g. +, but what do
> the a and b represent? What if my value is not an integer?

a and b are two values that share the same key. + is overloaded in most
cases, so on String it means append, on a set it means union, and so
on. In your custom class you can overload it your way. (Also in the
sketch below.)

> Thanks
> Eran
>
>
> On Tue, Feb 4, 2014 at 1:19 PM, Akhil Das <[email protected]> wrote:
>
>> From the Spark download page, you may download a prebuilt package. If
>> you download the source package, build it against the Hadoop version
>> that you have.
>>
>> You can open Spark's interactive shell in standalone local mode by
>> issuing the ./spark-shell command inside the Spark directory:
>>
>> akhld@akhldz:/data/spark-0.8.0$ ./spark-shell
>>
>> Now you can run a word count example in the shell, taking input from
>> HDFS and writing output back to HDFS:
>>
>> scala> var file = sc.textFile("hdfs://bigmaster:54310/sampledata/textbook.txt")
>>
>> scala> var count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
>>
>> scala> count.saveAsTextFile("hdfs://bigmaster:54310/sampledata/wcout")
>>
>> You may find similar information over here:
>> http://sprism.blogspot.in/2012/11/lightning-fast-wordcount-using-spark.html
>>
>>
>> On Tue, Feb 4, 2014 at 4:45 PM, goi cto <[email protected]> wrote:
>>
>>> Hi,
>>> I am a newbie with Spark and Scala and am trying to find my way
>>> around. I am looking for resources to learn more (if possible, by
>>> example) about how to program with the map and reduce functions.
>>> Any good recommendations?
>>> (I did the getting-started guides on the site but still don't feel
>>> comfortable with them.)
>>>
>>> --
>>> Eran | CTO
>>
>>
>> --
>> Thanks
>> Best Regards
>
>
> --
> Eran | CTO
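P.S. To make the inline answers above concrete, a minimal sketch that
should run in the spark-shell (the body of customFunc and the set data
are hypothetical, made up just for illustration):

  // Q1: a named function can be as complex as you like, as long as it
  // processes one element at a time.
  def customFunc(line: String): (String, Int) = {
    val trimmed = line.trim.toLowerCase
    (trimmed, trimmed.length)
  }

  val pairs = sc.textFile("hdfs://bigmaster:54310/sampledata/textbook.txt").map(line => customFunc(line))

  // Q3: reduceByKey with non-integer values. Here each value is a Set,
  // so "+" is replaced by set union (++). For String values, _ + _
  // would append.
  val tags = sc.parallelize(Seq(("a", Set(1)), ("b", Set(2)), ("a", Set(3))))
  val merged = tags.reduceByKey((a, b) => a ++ b)
  merged.collect()  // e.g. Array((a,Set(1, 3)), (b,Set(2)))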
