Hm. You just said you want to cluster it? That's what I suggested first, but you said you wanted supervised classification. Clustering is unsupervised (at least with the methods in Mahout...).
On Fri, Nov 9, 2012 at 9:58 AM, qiaoresearcher wrote:

> Yes, it is. Does Mahout have examples, or a similar example, that do this:
> read the gzip file, construct the page set, form vectors for each user, then
> run as rabbit
>
> On Fri, Nov 9, 2012 at 11:47 AM, Sean Owen wrote:
>
> > That's a clustering problem, no?
> >
> >
> > On Fri, Nov 9, 2012 at 4:43 PM, qiaoresearcher wrote:
> >
> > > It is a supervised classification problem.
> > >
> > > For example, a very simple case:
> > > say, overall we collect 4 pages from the data set:
> > > { web_page 1, web_page 2, web_page 3, web_page 4 }
> > > then users may have input vectors like:
> > > user1 [1 1 0 0]
> > > user2 [1 1 0 0]
> > > user3 [0 0 1 1]
> > > user4 [0 0 1 1]
> > > user5 [0 0 1 1]
> > > ...
> > >
> > > then whatever classification algorithm Mahout has should return
> > > classification results as
> > > group 1 { user1, user2 }
> > > group 2 { user3, user4, user5 }
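For illustration, here is a minimal plain-Python sketch of the pipeline described in the thread: read a gzip log, build the page set, form a binary vector per user, then group users without labels. It is not Mahout's API. It assumes a hypothetical log format of one `user<TAB>page` pair per line, and it swaps in a small hand-rolled k-means (an unsupervised method) with a deterministic farthest-point start. The function names are made up for this example. In Mahout itself, the analogous step would be its k-means clustering run over the user vectors.

```python
import gzip


def build_user_vectors(lines):
    """Build binary user-by-page vectors from 'user<TAB>page' lines (assumed format)."""
    pages = []        # the page set, in first-seen order
    page_index = {}   # page -> column index
    visits = {}       # user -> set of column indices (dicts keep insertion order)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, page = line.split("\t", 1)
        if page not in page_index:
            page_index[page] = len(pages)
            pages.append(page)
        visits.setdefault(user, set()).add(page_index[page])
    vectors = {
        user: [1 if i in cols else 0 for i in range(len(pages))]
        for user, cols in visits.items()
    }
    return pages, vectors


def read_gzip_log(path):
    """Hypothetical helper: read a gzip-compressed text log and build the vectors."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return build_user_vectors(f)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns k lists of user ids (no labels are used)."""
    users = list(vectors)
    # Deterministic start: first user, then repeatedly the farthest remaining user.
    centroids = [list(vectors[users[0]])]
    while len(centroids) < k:
        far = max(users, key=lambda u: min(sq_dist(vectors[u], c) for c in centroids))
        centroids.append(list(vectors[far]))
    assignment = {}
    for _ in range(iterations):
        # Assign each user to the nearest centroid.
        new_assignment = {
            u: min(range(k), key=lambda j: sq_dist(vectors[u], centroids[j]))
            for u in users
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [vectors[u] for u in users if assignment[u] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return [[u for u in users if assignment[u] == j] for j in range(k)]
```

On the toy data from the thread (user1 and user2 visit pages 1 and 2, users 3 to 5 visit pages 3 and 4), `kmeans(vectors, 2)` should separate `user1, user2` from `user3, user4, user5`. Note that k must be chosen up front here. That choice is one practical difference from a supervised classifier, which would instead learn from users already labelled with a group.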
