If it is supervised classification, your input should contain the groups.
The idea is that you extend the knowledge hidden in a smaller, perhaps
expert-labeled, dataset to the rest of the universe.
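
To make that concrete, here is a minimal sketch in plain Java. It does not
use any Mahout API, and every class and method name in it is hypothetical.
It assumes each log line is "<userId> <time> <page>" with the user id as a
single token, builds the 0/1 page vectors described in the quoted mail, and
then assigns each unlabeled user the group of the closest labeled user
(1-nearest-neighbour on Hamming distance). That classifier is only a simple
stand-in I picked for illustration, and the labels for user1 and user3 are
made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch; it does not call any Mahout API.
// Class and method names are hypothetical, chosen for illustration only.
class ClickVectors {

    // Assumes each line is "<userId> <time> <page>", whitespace separated.
    static Map<String, Set<String>> pagesByUser(List<String> lines) {
        Map<String, Set<String>> visits = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                continue; // skip lines that do not match the assumed format
            }
            visits.computeIfAbsent(fields[0], k -> new LinkedHashSet<>()).add(fields[2]);
        }
        return visits;
    }

    // Collects every page seen, in first-seen order, to fix the vector layout.
    static List<String> pageIndex(Map<String, Set<String>> visits) {
        Set<String> pages = new LinkedHashSet<>();
        for (Set<String> visited : visits.values()) {
            pages.addAll(visited);
        }
        return new ArrayList<>(pages);
    }

    // 1 where the user visited the page at that position, 0 otherwise.
    static int[] toVector(Set<String> visited, List<String> pages) {
        int[] vector = new int[pages.size()];
        for (int i = 0; i < pages.size(); i++) {
            vector[i] = visited.contains(pages.get(i)) ? 1 : 0;
        }
        return vector;
    }

    // Number of positions where two equal-length vectors differ.
    static int hamming(int[] a, int[] b) {
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                distance++;
            }
        }
        return distance;
    }

    // Returns the label of the closest labeled example; ties go to the first one.
    static String nearestLabel(int[] query, List<int[]> examples, List<String> labels) {
        if (examples.isEmpty() || examples.size() != labels.size()) {
            throw new IllegalArgumentException("need one label per example");
        }
        int best = 0;
        int bestDistance = Integer.MAX_VALUE;
        for (int i = 0; i < examples.size(); i++) {
            int d = hamming(query, examples.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return labels.get(best);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "user1 t1 page1", "user1 t2 page2",
                "user2 t1 page1", "user2 t3 page2",
                "user3 t1 page3", "user3 t2 page4",
                "user4 t1 page3", "user4 t2 page4",
                "user5 t5 page4", "user5 t6 page3");
        Map<String, Set<String>> visits = pagesByUser(lines);
        List<String> pages = pageIndex(visits);

        // Hypothetical expert labels for two users only.
        List<int[]> examples = Arrays.asList(
                toVector(visits.get("user1"), pages),
                toVector(visits.get("user3"), pages));
        List<String> labels = Arrays.asList("group 1", "group 2");

        for (String user : Arrays.asList("user2", "user4", "user5")) {
            int[] vector = toVector(visits.get(user), pages);
            System.out.println(user + " " + Arrays.toString(vector)
                    + " -> " + nearestLabel(vector, examples, labels));
        }
    }
}
```

Without the two labeled users there is nothing for a classifier to learn
from; grouping users purely by the similarity of their vectors would be
clustering rather than classification.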
On Nov 9, 2012 8:43 AM, "qiaoresearcher" <[email protected]> wrote:
> It is a supervised classification problem.
>
> For example, a very simple case:
> say, overall we collect 4 pages from the data set:
> { web_page 1 web_page 2 web_page 3 web_page 4 }
> then users may have input vectors like:
> user1 [1 1 0 0]
> user2 [1 1 0 0]
> user3 [0 0 1 1]
> user4 [0 0 1 1]
> user5 [0 0 1 1]
> ... ....
>
> then whatever classification algorithm Mahout has should return
> classification results such as
> group 1 { user1, user2}
> group 2 { user3, user4, user5 }
>
>
>
> On Fri, Nov 9, 2012 at 10:29 AM, Sean Owen <[email protected]> wrote:
>
> > First: what question are you trying to answer from this data? You are
> > trying to classify users into what, for what purpose?
> >
> >
> > On Fri, Nov 9, 2012 at 4:20 PM, qiaoresearcher <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > Assume the data is stored in a gzip file which includes many text files.
> > > Within each text file, each line represents an activity of a user, for
> > > example, a click on a web page.
> > > the text file will look like:
> > >
> > >
> > > ----------------------------------------------------------------------
> > > user 1 time11 visiting_web_page11
> > > user 2 time21 visiting_web_page21
> > > user 1 time12 visiting_web_page12
> > > user 1 time13 visiting_web_page13
> > > user 2 time22 visiting_web_page22
> > > user 3 time31 visiting_web_page31
> > > user 1 time14 visiting_web_page14
> > > ... .... ..........
> > >
> > > I am thinking of first constructing a web page set like
> > > { visiting_web_page11, visiting_web_page12, visiting_web_page31, ....... }
> > >
> > > then for each user, we form a vector [ 1 0 0 1 0 0 ..... ], where
> > > '1' means the user visited that page and '0' means he did not,
> > > then use Mahout to classify the users based on the vectors.
> > >
> > > Does Mahout have an example like this? If not, what kind of Java code
> > > do we need to write to process this task?
> > >
> > > Thanks in advance for any suggestions!
> > >
> >
>