Thanks, Grant. Exactly correct.

Some Pig or Hive work is called for here. Alternatively, write a MapReduce job
where the reducer does the vector generation.
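A minimal sketch of the MapReduce shape described above, simulated in plain Python rather than as a real Hadoop job. The mapper keys each rating by user ID, a grouping step stands in for Hadoop's shuffle, and the reducer builds one sparse vector (movieId -> rating) per user. The function names, the comma separator and the dict-based vector are illustrative assumptions; an actual Mahout job would emit Mahout vectors from the reducer (for example a `RandomAccessSparseVector` wrapped in a `VectorWritable`). The 1M release ships its ratings as `ratings.dat` with `::` separators, so `SEP` may need changing.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: comma-separated "userId,movieId,rating[,timestamp]" lines.
# MovieLens 1M's ratings.dat uses "::" instead; change SEP to match.
SEP = ","

Rating = Tuple[int, float]  # (movieId, rating)


def map_rating(line: str) -> Iterator[Tuple[int, Rating]]:
    """Mapper: emit (userId, (movieId, rating)) for one input line."""
    fields = line.strip().split(SEP)
    if len(fields) < 3:
        return  # blank or malformed line: emit nothing
    try:
        user, movie, rating = int(fields[0]), int(fields[1]), float(fields[2])
    except ValueError:
        return  # header row or non-numeric field: emit nothing
    yield user, (movie, rating)


def shuffle(pairs: Iterable[Tuple[int, Rating]]) -> Dict[int, List[Rating]]:
    """Stand-in for Hadoop's shuffle/sort: collect all values per user key."""
    groups: Dict[int, List[Rating]] = defaultdict(list)
    for user, value in pairs:
        groups[user].append(value)
    return groups


def reduce_user(user: int, ratings: List[Rating]) -> Tuple[int, Dict[int, float]]:
    """Reducer: turn one user's ratings into a sparse movieId -> rating vector."""
    vector: Dict[int, float] = {}
    for movie, rating in ratings:
        vector[movie] = rating  # a repeated movie keeps the last rating seen
    return user, vector


def run(lines: Iterable[str]) -> Dict[int, Dict[int, float]]:
    """Run map -> shuffle -> reduce over the input lines in one process."""
    pairs = (pair for line in lines for pair in map_rating(line))
    return dict(reduce_user(u, rs) for u, rs in shuffle(pairs).items())
```

For example, `run(["1,10,4.0", "1,20,3.5", "2,10,5.0"])` returns `{1: {10: 4.0, 20: 3.5}, 2: {10: 5.0}}`. Keying by user in the mapper is what guarantees each reducer call sees all of one user's ratings, which is the grouping needed before vectorization.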



On Thu, Jun 13, 2013 at 7:13 PM, Grant Ingersoll <[email protected]> wrote:

> I think Ted was suggesting that you just write a script to aggregate the
> MovieLens data by user ID. Should be pretty straightforward.
>
> On Jun 13, 2013, at 10:05 AM, Neetha <[email protected]> wrote:
>
> > Thank you for the reply. How can we group by user?
> >
> >
> > On Thu, Jun 13, 2013 at 3:41 PM, Ted Dunning <[email protected]> wrote:
> >
> >>
> >> You need to group by user before converting to vector to get sensible
> >> clustering.
> >>
> >>
> >> On Wed, Jun 12, 2013 at 1:06 PM, Grant Ingersoll <[email protected]> wrote:
> >>
> >>> The CSVVectorIterator in the Integration package will take in a CSV file
> >>> and produce vectors. It assumes that each row is the equivalent of a
> >>> DenseVector (does MovieLens fit that?). If you need something else, I'd
> >>> suggest starting with that code and modifying it to fit your needs.
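Grant's note implies that feeding raw MovieLens rows (user, movie, rating) to a row-as-DenseVector reader would produce short, meaningless vectors. One way to fit that model, sketched below under stated assumptions, is to pivot ratings that are already grouped per user into one dense row per user with a shared column order. The helper names are hypothetical, unrated movies are filled with 0.0 (an assumption that a distance measure will treat as a real value), and the exact CSV conventions CSVVectorIterator accepts (header rows, delimiters) should be checked against its source.

```python
import csv
import io
from typing import Dict, List, Tuple


def to_dense_rows(
    vectors: Dict[int, Dict[int, float]]
) -> Tuple[List[int], List[List[float]]]:
    """Pivot per-user sparse vectors into dense rows with one shared column order.

    Columns are the sorted movie IDs seen in any user's vector; rows follow
    sorted user ID order. Unrated movies are filled with 0.0.
    """
    columns = sorted({movie for v in vectors.values() for movie in v})
    rows = []
    for user in sorted(vectors):
        v = vectors[user]
        rows.append([v.get(movie, 0.0) for movie in columns])
    return columns, rows


def write_csv(rows: List[List[float]]) -> str:
    """Serialize dense rows as headerless CSV, one user per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `to_dense_rows({1: {10: 4.0, 20: 3.5}, 2: {10: 5.0}})` gives columns `[10, 20]` and rows `[[4.0, 3.5], [5.0, 0.0]]`. Keep `columns` and the sorted user order so cluster assignments can be mapped back to movie and user IDs. On real data most cells are zero, so a sparse representation built directly from the grouped ratings is usually the better fit.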
> >>>
> >>>
> >>> -Grant
> >>>
> >>> On Jun 12, 2013, at 6:11 AM, Neetha <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>>
> >>>> I am using the MovieLens 1M dataset.
> >>>>
> >>>> I need to run k-means clustering using Mahout and Hadoop. As I
> >>>> understand it, the first step is to convert the data into a sequence
> >>>> file, then into vector format, and then apply the clustering algorithm.
> >>>> My question is: is there any need to convert the MovieLens rating.csv
> >>>> file into a sequence file? If so, what are the commands for applying
> >>>> the clustering technique using Mahout and Hadoop?
> >>>>
> >>>> Thanking you,
> >>>> Neetha Suan Thampi
> >>>
> >>> --------------------------------------------
> >>> Grant Ingersoll | @gsingers
> >>> http://www.lucidworks.com
> >>
> >>
>
