RE: Get statistic result from RDD

2015-10-20 Thread ChengBo
Thanks, but I still don’t get it. I have used groupBy to group data by userID, and for each ID, I need to get the statistic information.

Best
Frank

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, October 20, 2015 3:12 PM
To: ChengBo
Cc: user
Subject: Re: Get statistic result from RDD

Re: Get statistic result from RDD

2015-10-20 Thread Ted Yu
> each ID, I need to get the statistic information.
>
> Best
> Frank
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* Tuesday, October 20, 2015 3:12 PM
> *To:* ChengBo
> *Cc:* user
> *Subject:* Re: Get statistic result from RDD

RE: Get statistic result from RDD

2015-10-20 Thread ChengBo
I tried, but it shows:

“error: value reduceByKey is not a member of iterable[((Int, Int, String, String), String), Int]”

Best
Frank

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, October 20, 2015 3:46 PM
To: ChengBo
Cc: user
Subject: Re: Get statistic result from RDD

Please take
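The error above is what one would expect if reduceByKey is called on the values produced by groupBy: after grouping, each userID maps to an iterable of records, and reduceByKey is an operation on a key/value RDD, not on that iterable. A minimal sketch of the alternative, computing the statistic inside each group, is shown here in plain Python so that it runs without Spark. The sample records, the `summarize` helper, and the single numeric field are all hypothetical, since the thread does not show the real data; the split into 0 to 5 versus 6 or 7 is the one described elsewhere in this thread.

```python
from collections import defaultdict

# Hypothetical (userID, value) records; the real layout is not shown in the thread.
records = [("u1", 2), ("u2", 7), ("u1", 6), ("u1", 5)]

# Rough equivalent of groupBy on userID: each key maps to a list of values.
grouped = defaultdict(list)
for user_id, value in records:
    grouped[user_id].append(value)

def summarize(values):
    """Count values in 0..5 and values equal to 6 or 7 (bounds assumed inclusive)."""
    low = sum(1 for v in values if 0 <= v <= 5)
    high = sum(1 for v in values if v in (6, 7))
    return (low, high)

# Aggregate within each group; no reduce-by-key step is applied to the iterable.
stats = {user_id: summarize(values) for user_id, values in grouped.items()}
print(stats)
```

In Spark terms this corresponds to applying a per-group function to the grouped values rather than calling reduceByKey on them.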

Re: Get statistic result from RDD

2015-10-20 Thread Ted Yu
Your mapValues can emit a tuple. If p(0) is between 0 and 5, the first component of the tuple would be 1 and the second 0. If p(0) is 6 or 7, the first component would be 0 and the second 1. You can then use reduceByKey to sum up the corresponding components.

On Tue, Oct 20, 2015 at 1:33 PM, Shepherd
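The suggestion above can be sketched as follows. This is plain Python rather than Spark, so it runs without a cluster: `map_values` and `reduce_by_key` are hypothetical stand-ins that mimic what the RDD operations mapValues and reduceByKey do on (key, value) pairs. The sample records, the field layout, the inclusive bounds, and the handling of values outside 0 to 7 are assumptions, since the thread does not show the actual data.

```python
def map_values(pairs, f):
    """Stand-in for mapValues: apply f to each value, keep the key."""
    return [(k, f(v)) for k, v in pairs]

def reduce_by_key(pairs, f):
    """Stand-in for reduceByKey: merge all values of a key with f."""
    acc = {}
    for k, v in pairs:
        acc[k] = f(acc[k], v) if k in acc else v
    return acc

def to_flags(p):
    """Emit (1, 0) when p[0] is in 0..5 and (0, 1) when p[0] is 6 or 7."""
    if 0 <= p[0] <= 5:
        return (1, 0)
    if p[0] in (6, 7):
        return (0, 1)
    return (0, 0)  # assumption: other values count toward neither component

# Hypothetical (userID, record) pairs; p[0] is the field being tested.
records = [("u1", (0, "a")), ("u1", (6, "b")), ("u1", (3, "c")), ("u2", (7, "d"))]

flags = map_values(records, to_flags)
# Sum the tuples component by component for each userID.
counts = reduce_by_key(flags, lambda a, b: (a[0] + b[0], a[1] + b[1]))
print(counts)  # {'u1': (2, 1), 'u2': (0, 1)}
```

The point of this shape is that the data stays as flat (userID, tuple) pairs, so the per-key summation never has to operate on a grouped iterable.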

RE: Get statistic result from RDD

2015-10-20 Thread ChengBo
I tried, but it shows:

“error: value reduceByKey is not a member of iterable[((Int, Int, String, String), String), Int]”

Best
Cheng

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, October 20, 2015 3:46 PM
To: ChengBo
Cc: user
Subject: Re: Get statistic result from RDD

Please take