My guess is he probably means clustering users, based on behaviour, into virtual behavioural groups.

On Nov 9, 2012 8:29 AM, "Sean Owen" <[email protected]> wrote:
> First: what question are you trying to answer from this data? You are
> trying to classify users into what, for what purpose?
>
> On Fri, Nov 9, 2012 at 4:20 PM, qiaoresearcher <[email protected]> wrote:
>
> > Hi All,
> >
> > Assume the data is stored in a gzip file which includes many text files.
> > Within each text file, each line represents an activity of a user, for
> > example, a click on a web page.
> > The text file will look like:
> >
> > ----------------------------------------------------------------------
> > user 1 time11 visiting_web_page11
> > user 2 time21 visiting_web_page21
> > user 1 time12 visiting_web_page12
> > user 1 time13 visiting_web_page13
> > user 2 time22 visiting_web_page22
> > user 3 time31 visiting_web_page31
> > user 1 time14 visiting_web_page14
> > ... .... ..........
> >
> > I am thinking of first constructing a web page set like
> > { visiting_web_page11, visiting_web_page12, visiting_web_page31, ....... }
> >
> > then, for each user, forming a vector [ 1 0 0 1 0 0 ..... ] where
> > '1' means the user visited that page and '0' means he did not,
> > and then using Mahout to classify the users based on the vectors.
> >
> > Does Mahout have an example like this? If not, what kind of Java code
> > would we need to write for this task?
> >
> > Thanks in advance for any suggestions!
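
For the preprocessing step described in the original question (build the page set, then one 0/1 vector per user), a minimal plain-Java sketch might look like the one below. It uses only the JDK, not Mahout. The class name `UserPageVectors`, its method names, and the sample IDs are all hypothetical, and it assumes each line has three whitespace-separated fields: user ID, timestamp, page. Adjust the parsing if the user ID itself contains a space, as "user 1" in the sample might.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class: turns click-log lines into binary user/page vectors.
public class UserPageVectors {

    // Groups the log by user: user ID -> set of distinct pages that user visited.
    // Assumes "userId timestamp page" per line; shorter lines are skipped.
    public static Map<String, Set<String>> pagesByUser(List<String> lines) {
        Map<String, Set<String>> visits = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                continue;
            }
            visits.computeIfAbsent(fields[0], k -> new LinkedHashSet<>()).add(fields[2]);
        }
        return visits;
    }

    // Builds the page set; a page's position in the list is its vector index.
    public static List<String> pageVocabulary(Map<String, Set<String>> visits) {
        Set<String> pages = new LinkedHashSet<>();
        for (Set<String> visited : visits.values()) {
            pages.addAll(visited);
        }
        return new ArrayList<>(pages);
    }

    // Encodes one user as a 0/1 vector: 1 where the user visited that page.
    public static int[] toVector(Set<String> visited, List<String> vocabulary) {
        int[] vector = new int[vocabulary.size()];
        for (int i = 0; i < vocabulary.size(); i++) {
            if (visited.contains(vocabulary.get(i))) {
                vector[i] = 1;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed three-field format.
        List<String> lines = Arrays.asList(
                "u1 t11 pageA",
                "u2 t21 pageB",
                "u1 t12 pageC",
                "u3 t31 pageA");
        Map<String, Set<String>> visits = pagesByUser(lines);
        List<String> vocabulary = pageVocabulary(visits);
        for (Map.Entry<String, Set<String>> entry : visits.entrySet()) {
            int[] vector = toVector(entry.getValue(), vocabulary);
            System.out.println(entry.getKey() + " " + Arrays.toString(vector));
        }
    }
}
```

The dense `int[]` is only for illustration. With a large page set almost every entry is 0, so a sparse vector (in Mahout, for instance, `RandomAccessSparseVector`) is likely a better fit. Since the data carries no labels, the resulting vectors would feed a clustering algorithm such as k-means rather than a classifier. Reading the gzip input can be done with `java.util.zip.GZIPInputStream`, which is left out of this sketch.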
