Hi Steve,

The _ notation can be a bit confusing when you are starting out with Scala,
so we can rewrite the code to avoid it. Instead of
    val numUsers = ratings.map(_._2.user)
we can write
    val numUsers = ratings.map(x => x._2.user)

ratings is a key-value RDD (an RDD made up of tuples), so when we map over
it we can access the key of each element with _1 or the value with _2. The
value here is a Rating, so we then call user on it to get back the user.
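
If it helps to experiment outside of Spark, here is a minimal sketch using a
plain Scala Seq in place of the RDD. The Rating case class below is a local
stand-in I am assuming has the same fields as the MLlib one, and the sample
values are made up. It shows the placeholder form, the named-parameter form,
and a pattern-matching form side by side:

```scala
// Local stand-in for MLlib's Rating so this runs without Spark (assumption).
case class Rating(user: Int, product: Int, rating: Double)

object UnderscoreDemo {
  // Same shape as the tutorial data: (timestamp % 10, Rating(userId, movieId, rating)).
  // The values are invented for illustration.
  val ratings: Seq[(Long, Rating)] = Seq(
    (3L, Rating(1, 10, 4.0)),
    (7L, Rating(1, 20, 3.5)),
    (3L, Rating(2, 10, 5.0))
  )

  // Placeholder form: the first _ stands for the tuple passed to map,
  // _2 selects the tuple's second element, and user is a field of Rating.
  val usersShort: Seq[Int] = ratings.map(_._2.user)

  // The same function with the parameter named explicitly.
  val usersNamed: Seq[Int] = ratings.map(x => x._2.user)

  // The same function again, destructuring the tuple with a pattern match.
  val usersMatched: Seq[Int] = ratings.map { case (_, rating) => rating.user }

  // Counting distinct users and movies, mirroring the tutorial lines.
  val numUsers: Int = usersShort.distinct.size
  val numMovies: Int = ratings.map(_._2.product).distinct.size
}
```

All three map calls produce the same result. On a real RDD you would call
distinct.count as in the tutorial, rather than distinct.size.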

Does that make sense?

Cheers,

Holden


On Mon, Aug 4, 2014 at 3:17 PM, Steve Nunez <snu...@hortonworks.com> wrote:

> Can one of the Scala experts please explain this bit of pattern magic from
> the Spark ML tutorial: _._2.user ?
>
> As near as I can tell, this is applying the _2 function to the wildcard,
> and then applying the ‘user’ function to that. In a similar way the
> ‘product’ function is applied in the next line, yet these functions don’t
> seem to exist anywhere in the project, nor are they used anywhere else in
> the code. It almost makes sense, but not quite. Code below:
>
>
>     val ratings = sc.textFile(new File(movieLensHomeDir, "ratings.dat").toString).map { line =>
>       val fields = line.split("::")
>       // format: (timestamp % 10, Rating(userId, movieId, rating))
>       (fields(3).toLong % 10, Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble))
>     }
> …
>     val numRatings = ratings.count
>     val numUsers = ratings.map(_._2.user).distinct.count
>     val numMovies = ratings.map(_._2.product).distinct.count
>
> Cheers,
> - Steve Nunez
>




-- 
Cell : 425-233-8271
