Hi Steve,

The _ notation can be confusing when you are starting out with Scala, so let's rewrite the line without it. Instead of

    val numUsers = ratings.map(_._2.user).distinct.count

we can write

    val numUsers = ratings.map(x => x._2.user).distinct.count
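
If it helps, here is a minimal sketch you can try without Spark. It assumes a local List in place of the RDD and a hypothetical stand-in Rating case class (the real one comes from Spark MLlib); the object name UnderscoreDemo and the sample data are made up for illustration. It shows that the underscore form, the named-parameter form, and a pattern match on the tuple all pull out the same user IDs.

```scala
// Hypothetical stand-in for Spark MLlib's Rating, so the sketch runs without Spark.
case class Rating(user: Int, product: Int, rating: Double)

object UnderscoreDemo {
  // Same element shape as in the tutorial: (timestamp % 10, Rating(...)).
  // A plain List replaces the RDD; the sample values are invented.
  val ratings: List[(Long, Rating)] = List(
    (3L, Rating(1, 10, 4.0)),
    (7L, Rating(1, 20, 5.0)),
    (1L, Rating(2, 10, 3.0))
  )

  // Placeholder syntax: _ stands for the single argument of the function.
  val usersShort: List[Int] = ratings.map(_._2.user)

  // The same function with the argument named explicitly.
  val usersNamed: List[Int] = ratings.map(x => x._2.user)

  // Pattern match on the tuple, naming its second element.
  val usersMatched: List[Int] = ratings.map { case (_, rating) => rating.user }

  // Distinct counts, mirroring numUsers and numMovies in the tutorial.
  val numUsers: Int = usersShort.distinct.size
  val numMovies: Int = ratings.map(_._2.product).distinct.size

  def main(args: Array[String]): Unit = {
    println(usersShort == usersNamed && usersNamed == usersMatched)
    println(s"$numUsers users, $numMovies movies")
  }
}
```

The pattern-match version is often the easiest to read, since it names the parts of the tuple instead of relying on _1 and _2.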
ratings is a key-value RDD (that is, an RDD made up of tuples), so when we map over it each element is a tuple: _1 gives us the key and _2 gives us the value. Here the value is a Rating, so calling user on it gives back the user. Neither _2 nor user is defined in the project itself: _2 is a field of Scala's Tuple2, and user and product are fields of the Rating case class from Spark MLlib.

Does that make sense?

Cheers,
Holden

On Mon, Aug 4, 2014 at 3:17 PM, Steve Nunez <snu...@hortonworks.com> wrote:

> Can one of the Scala experts please explain this bit of pattern magic from
> the Spark ML tutorial: _._2.user ?
>
> As near as I can tell, this is applying the _2 function to the wildcard,
> and then applying the ‘user’ function to that. In a similar way the
> ‘product’ function is applied in the next line, yet these functions don’t
> seem to exist anywhere in the project, nor are they used anywhere else in
> the code. It almost makes sense, but not quite. Code below:
>
> val ratings = sc.textFile(new File(movieLensHomeDir, "ratings.dat").toString).map { line =>
>   val fields = line.split("::")
>   // format: (timestamp % 10, Rating(userId, movieId, rating))
>   (fields(3).toLong % 10, Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble))
> }
> …
> val numRatings = ratings.count
> val numUsers = ratings.map(_._2.user).distinct.count
> val numMovies = ratings.map(_._2.product).distinct.count
>
> Cheers,
> - Steve Nunez