This is Scala you are looking to learn. I would suggest you look at this:
http://www.brunton-spall.co.uk/post/2011/12/02/map-map-and-flatmap-in-scala/
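For a quick feel of the difference, here is a minimal sketch on plain
Scala collections (a toy example of my own, not from that post; the
semantics carry over to Spark RDDs):

  val lines = List("to be or", "not to be")

  // map: one output per input, so the result is a List of Array[String]
  lines.map(line => line.split(" "))

  // flatMap: each element expands to several words, then all results
  // are flattened into one list
  lines.flatMap(line => line.split(" "))
  // => List(to, be, or, not, to, be)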
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi


On Tue, Feb 4, 2014 at 8:27 PM, goi cto <[email protected]> wrote:

> Thanks all for the info provided.
> One of the things I noticed is that both the map and reduce functions
> receive a function which is applied to all elements
> (map: return a new distributed dataset formed by passing each element
> of the source through a function *func*).
>
> Q1: All the examples I have seen so far use a very simple function as
> func, e.g. (line => line.split(" ")). Are there any examples or cases
> where a more complex function is needed? If so, what is the syntax for
> this?

You can create your own functions (complex ones) and call them in map:
(line => customFunc(line)). Since two elements of the dataset cannot
interact inside map, nothing more complex than a per-element function
is possible. (See the sketch at the bottom of this mail.)

> Q2: What is the difference between map and flatMap? When should I use
> which?

Use flatMap when your function converts each element into an array and
you want to flatten all the arrays into a single sequence.

> Q3: reduceByKey((a, b) => a + b) -> Here again, this example was used
> in the word count sample. I understand it takes the value argument of
> the K,V pair and performs the function on them, e.g. +, but what do
> the a and b represent? What if my value is not an integer?

a and b are two values that share the same key. + is overloaded in most
cases, so on String it means append, on a set it means union, and so
on. In your custom class you can overload it your way. (Also in the
sketch below.)

> Thanks
> Eran
>
>
> On Tue, Feb 4, 2014 at 1:19 PM, Akhil Das <[email protected]> wrote:
>
>> From the Spark download page, you may download a prebuilt package. If
>> you download the source package, build it against the Hadoop version
>> that you have.
>>
>> You can open Spark's interactive shell in standalone local mode by
>> issuing the ./spark-shell command inside the Spark directory:
>>
>> akhld@akhldz:/data/spark-0.8.0$ ./spark-shell
>>
>> Now you can run a word count example in the shell, taking input from
>> HDFS and writing output back to HDFS:
>>
>> scala> var file = sc.textFile("hdfs://bigmaster:54310/sampledata/textbook.txt")
>>
>> scala> var count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
>>
>> scala> count.saveAsTextFile("hdfs://bigmaster:54310/sampledata/wcout")
>>
>> You may find similar information over here:
>> http://sprism.blogspot.in/2012/11/lightning-fast-wordcount-using-spark.html
>>
>>
>> On Tue, Feb 4, 2014 at 4:45 PM, goi cto <[email protected]> wrote:
>>
>>> Hi,
>>> I am a newbie with Spark and Scala and am trying to find my way
>>> around. I am looking for resources to learn more (if possible, by
>>> example) about how to program with the map and reduce functions.
>>> Any good recommendations?
>>> (I did the getting-started guides on the site but still don't feel
>>> comfortable with them.)
>>>
>>> --
>>> Eran | CTO
>>
>>
>> --
>> Thanks
>> Best Regards
>
>
> --
> Eran | CTO
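P.S. To make the inline answers above concrete, a minimal sketch that
should run in the spark-shell (the body of customFunc and the set data
are hypothetical, made up just for illustration):

  // Q1: a named function can be as complex as you like, as long as it
  // processes one element at a time.
  def customFunc(line: String): (String, Int) = {
    val trimmed = line.trim.toLowerCase
    (trimmed, trimmed.length)
  }

  val pairs = sc.textFile("hdfs://bigmaster:54310/sampledata/textbook.txt").map(line => customFunc(line))

  // Q3: reduceByKey with non-integer values. Here each value is a Set,
  // so "+" is replaced by set union (++). For String values, _ + _
  // would append.
  val tags = sc.parallelize(Seq(("a", Set(1)), ("b", Set(2)), ("a", Set(3))))
  val merged = tags.reduceByKey((a, b) => a ++ b)
  merged.collect()  // e.g. Array((a,Set(1, 3)), (b,Set(2)))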
