First: what question are you trying to answer from this data? What are
you trying to classify users into, and for what purpose?


On Fri, Nov 9, 2012 at 4:20 PM, qiaoresearcher <[email protected]> wrote:

> Hi All,
>
> Assume the data is stored in a gzip file which includes many text files.
> Within each text file, each line represents an activity of a user, for
> example, a click on a web page.
> The text file will look like:
>
> ----------------------------------------------------------------------------------
> user 1   time11  visiting_web_page11
> user 2   time21  visiting_web_page21
> user 1   time12  visiting_web_page12
> user 1   time13  visiting_web_page13
> user 2   time22  visiting_web_page22
> user 3   time31  visiting_web_page31
> user 1   time14  visiting_web_page14
>  ...           ....                ..........
>
> I am thinking of first constructing a set of web pages like
> { visiting_web_page11, visiting_web_page12, visiting_web_page31, ....... }
>
> Then, for each user, we form a vector [ 1  0 0  1 0  0  .....    ] where
> '1' means the user visited that page and '0' means he did not,
> and then use Mahout to classify the users based on the vectors.
>
> Does Mahout have an example like this? If not, what kind of Java code do
> we need to write to process this task?
>
> Thanks in advance for any suggestions!
>
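
Whatever the end goal turns out to be, the vectorization step described in the quoted message can be sketched in plain Java, without any Mahout calls. This is only an illustration under assumptions that are mine, not the poster's: fields are whitespace-separated, the last field is the page, the one before it is the time, and everything before that is the user id. The class and method names (VisitVectors, buildPageIndex, buildUserVectors) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VisitVectors {

    /** Assigns each distinct page a column index, in order of first appearance. */
    public static Map<String, Integer> buildPageIndex(List<String> lines) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 3) continue;            // assumption: skip malformed lines
            index.putIfAbsent(f[f.length - 1], index.size());
        }
        return index;
    }

    /** Builds one 0/1 vector per user: 1 where the user visited that page. */
    public static Map<String, int[]> buildUserVectors(List<String> lines,
                                                      Map<String, Integer> pageIndex) {
        Map<String, int[]> vectors = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 3) continue;
            // Assumption: everything before the last two fields is the user id.
            String user = String.join(" ", Arrays.copyOfRange(f, 0, f.length - 2));
            Integer col = pageIndex.get(f[f.length - 1]);
            if (col == null) continue;             // page not in the index
            vectors.computeIfAbsent(user, u -> new int[pageIndex.size()])[col] = 1;
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "user 1   time11  visiting_web_page11",
            "user 2   time21  visiting_web_page21",
            "user 1   time12  visiting_web_page12",
            "user 3   time31  visiting_web_page31");
        Map<String, Integer> pageIndex = buildPageIndex(lines);
        for (Map.Entry<String, int[]> e : buildUserVectors(lines, pageIndex).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

Dense int arrays are only workable for a small page set. With many distinct pages, each user's vector is almost entirely zeros, so a sparse representation (storing only the visited column indices) would be the more realistic choice before handing anything to Mahout.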
