First: what question are you trying to answer from this data? What do you want to classify users into, and for what purpose?

On Fri, Nov 9, 2012 at 4:20 PM, qiaoresearcher <[email protected]> wrote:

> Hi All,
>
> Assume the data is stored in a gzip file which includes many text files.
> Within each text file, each line represents an activity of a user, for
> example, a click on a web page.
> the text file will look like:
>
> ----------------------------------------------------------------------------------
> user 1 time11 visiting_web_page11
> user 2 time21 visiting_web_page21
> user 1 time12 visiting_web_page12
> user 1 time13 visiting_web_page13
> user 2 time22 visiting_web_page22
> user 3 time31 visiting_web_page31
> user 1 time14 visiting_web_page14
> ... .... ..........
>
> I am thinking to first construct a web page set like
> { visiting_web_page11, visiting_web_page12, visiting_web_page31, ....... }
>
> then for each user, we form a vector [ 1 0 0 1 0 0 ..... ] where
> '1' means the user visited that page and 0 means he did not
> then use mahout to classify the users based on the vectors
>
> does mahout has example like this? if not, what kind of java code we need
> to write to process this task?
>
> thanks for any suggestions in advance !
>
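
For reference, the vector construction described in the quoted message could be sketched in plain Java roughly as follows. This is only an illustration under assumptions, not a Mahout example: the class name `UserPageVectors` and its methods are hypothetical, each line is assumed to end with a time token followed by a page token (everything before them is taken as the user id), and reading the gzip archive is left out. It does not do any classification; it only builds the 0/1 vectors.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: builds one 0/1 vector per user.
class UserPageVectors {

    // Returns {user, page}, or null if the line has fewer than three tokens.
    // Assumed layout: <user id tokens...> <time> <page>
    static String[] parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 3) {
            return null;
        }
        String user = String.join(" ", Arrays.copyOfRange(t, 0, t.length - 2));
        return new String[] { user, t[t.length - 1] };
    }

    // First pass: give every distinct page a column index, in order of first appearance.
    static Map<String, Integer> pageIndex(List<String> lines) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String line : lines) {
            String[] r = parse(line);
            if (r != null) {
                index.putIfAbsent(r[1], index.size());
            }
        }
        return index;
    }

    // Second pass: set vector[column] = 1 for every page a user visited.
    static Map<String, int[]> userVectors(List<String> lines) {
        Map<String, Integer> index = pageIndex(lines);
        Map<String, int[]> vectors = new LinkedHashMap<>();
        for (String line : lines) {
            String[] r = parse(line);
            if (r == null) {
                continue;
            }
            int[] v = vectors.computeIfAbsent(r[0], u -> new int[index.size()]);
            v[index.get(r[1])] = 1;
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "user 1 time11 visiting_web_page11",
                "user 2 time21 visiting_web_page21",
                "user 1 time12 visiting_web_page12",
                "user 3 time31 visiting_web_page31");
        for (Map.Entry<String, int[]> e : userVectors(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

Dense `int[]` vectors are used here only to keep the sketch short. With a large page set most entries would be zero, so a sparse representation (for example Mahout's `RandomAccessSparseVector`) would probably be a better fit.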
