sorry, you probably meant that anyway. your training input should be labeled
by group, and your prediction request input is not labeled.

looks like a job for a classifier such as SGD, except that visited pages make a
poor categorical feature source unless you also look at their content similarities.
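To make the labeled-training / unlabeled-prediction split concrete, here is a minimal sketch of a binary logistic regression trained with plain stochastic gradient descent on the 0/1 "visited page" vectors. It is a hand-rolled stand-in, not Mahout's API: Mahout ships its own SGD classifiers (for example `OnlineLogisticRegression` in `org.apache.mahout.classifier.sgd`), which you would normally use instead. The class name `SgdUserClassifier`, the learning rate 0.5 and the epoch count are illustrative assumptions; the training vectors and their groups are the toy example from later in this thread.

```java
import java.util.Arrays;

/**
 * Sketch only (hypothetical class, not Mahout's API): binary logistic
 * regression trained with plain SGD on 0/1 "visited page" vectors.
 */
class SgdUserClassifier {
    private final double[] weights; // one weight per page column
    private double bias;
    private final double learningRate; // assumed value, not tuned

    SgdUserClassifier(int numPages, double learningRate) {
        this.weights = new double[numPages];
        this.learningRate = learningRate;
    }

    /** Probability that the user belongs to group 2 (label 1). */
    double probability(int[] visits) {
        double z = bias;
        for (int i = 0; i < visits.length; i++) {
            z += weights[i] * visits[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One SGD step on a single labeled example; label 0 = group 1, 1 = group 2. */
    void train(int[] visits, int label) {
        double error = label - probability(visits); // gradient of the log-likelihood
        for (int i = 0; i < visits.length; i++) {
            weights[i] += learningRate * error * visits[i];
        }
        bias += learningRate * error;
    }

    /** Predicts group 1 or group 2 for an unlabeled user, thresholding at 0.5. */
    int predictGroup(int[] visits) {
        return probability(visits) >= 0.5 ? 2 : 1;
    }

    /** Trains on the labeled toy vectors from the thread (user1-2 = group 1, user3-5 = group 2). */
    static SgdUserClassifier trainOnExample(int epochs) {
        int[][] x = { {1, 1, 0, 0}, {1, 1, 0, 0}, {0, 0, 1, 1}, {0, 0, 1, 1}, {0, 0, 1, 1} };
        int[] y = { 0, 0, 1, 1, 1 };
        SgdUserClassifier clf = new SgdUserClassifier(4, 0.5);
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int n = 0; n < x.length; n++) {
                clf.train(x[n], y[n]);
            }
        }
        return clf;
    }

    public static void main(String[] args) {
        SgdUserClassifier clf = trainOnExample(100);
        int[] request = {1, 1, 0, 0}; // an unlabeled prediction request
        System.out.println(Arrays.toString(request) + " -> group " + clf.predictGroup(request));
    }
}
```

Running `main` prints `[1, 1, 0, 0] -> group 1`. The point is that the groups must be present in the training data; the classifier only extends them to new, unlabeled users.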
On Nov 9, 2012 8:49 AM, "Dmitriy Lyubimov" <[email protected]> wrote:

> if it is supervised classification, your input should contain the groups.
> the idea is that you extend the knowledge hidden in a smaller, perhaps
> expert-labeled dataset to the rest of the universe.
> On Nov 9, 2012 8:43 AM, "qiaoresearcher" <[email protected]> wrote:
>
>> It is a supervised classification problem.
>>
>> For example, a very simple case:
>> say, overall we collect 4 pages from the data set:
>> { web_page 1, web_page 2, web_page 3, web_page 4 }
>> then users may have input vectors like:
>> user1 [1 1  0  0]
>> user2 [1 1  0  0]
>> user3 [0 0  1  1]
>> user4 [0 0  1  1]
>> user5 [0 0  1  1]
>>   ...       ....
>>
>> then whichever classification algorithm Mahout has should return
>> classification results like:
>> group 1 { user1, user2}
>> group 2 { user3, user4, user5 }
>>
>>
>>
>> On Fri, Nov 9, 2012 at 10:29 AM, Sean Owen <[email protected]> wrote:
>>
>> > First: what question are you trying to answer from this data? You are
>> > trying to classify users into what, for what purpose?
>> >
>> >
>> > On Fri, Nov 9, 2012 at 4:20 PM, qiaoresearcher <[email protected]> wrote:
>> >
>> > > Hi All,
>> > >
>> > > Assume the data is stored in a gzip file which includes many text files.
>> > > Within each text file, each line represents an activity of a user, for
>> > > example, a click on a web page.
>> > > the text file will look like:
>> > >
>> > > ----------------------------------------------------------------------------------
>> > > user 1   time11  visiting_web_page11
>> > > user 2   time21  visiting_web_page21
>> > > user 1   time12  visiting_web_page12
>> > > user 1   time13  visiting_web_page13
>> > > user 2   time22  visiting_web_page22
>> > > user 3   time31  visiting_web_page31
>> > > user 1   time14  visiting_web_page14
>> > >  ...           ....                ..........
>> > >
>> > > I am thinking of first constructing a web page set like
>> > > { visiting_web_page11, visiting_web_page12, visiting_web_page31, ....... }
>> > >
>> > > then for each user, we form a vector [1 0 0 1 0 0 ...], where
>> > > '1' means the user visited that page and '0' means he did not,
>> > > then use Mahout to classify the users based on the vectors.
>> > >
>> > > does Mahout have an example like this? if not, what kind of Java code do
>> > > we need to write to process this task?
>> > >
>> > > thanks in advance for any suggestions!
>> > >
>> >
>>
>
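The vector construction described in the original question (collect the set of visited pages, then build one 0/1 vector per user) can be sketched in plain Java. This is a minimal illustration, not Mahout code: the class name `UserPageVectors` is hypothetical, and it assumes each log line is whitespace-separated as `<user id> <time> <page>`, where the user id may itself contain a space (as in `user 1`).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/**
 * Sketch only (hypothetical class): turns click-log lines into one 0/1
 * vector per user. Assumes "<user id> <time> <page>" per line, where the
 * user id may contain spaces (e.g. "user 1").
 */
class UserPageVectors {
    // Page name -> column index, assigned in first-seen order.
    final Map<String, Integer> pageIndex = new LinkedHashMap<>();
    // User id -> columns (pages) that user visited.
    final Map<String, TreeSet<Integer>> visits = new LinkedHashMap<>();

    /** Records one log line; blank or too-short lines are skipped. */
    void addLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 3) {
            return;
        }
        String page = tokens[tokens.length - 1];
        // Everything before the timestamp is treated as the user id.
        String user = String.join(" ", Arrays.copyOfRange(tokens, 0, tokens.length - 2));
        Integer column = pageIndex.get(page);
        if (column == null) {
            column = pageIndex.size();
            pageIndex.put(page, column);
        }
        visits.computeIfAbsent(user, k -> new TreeSet<>()).add(column);
    }

    /** Dense 0/1 vector over every page seen so far: 1 = visited, 0 = not. */
    int[] vectorFor(String user) {
        int[] vector = new int[pageIndex.size()];
        for (int column : visits.getOrDefault(user, new TreeSet<>())) {
            vector[column] = 1;
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] log = {
            "user 1   time11  visiting_web_page11",
            "user 2   time21  visiting_web_page21",
            "user 1   time12  visiting_web_page12",
            "user 1   time13  visiting_web_page13",
            "user 2   time22  visiting_web_page22",
            "user 3   time31  visiting_web_page31",
            "user 1   time14  visiting_web_page14",
        };
        UserPageVectors upv = new UserPageVectors();
        for (String line : log) {
            upv.addLine(line);
        }
        // Prints e.g. "user 1 [1, 0, 1, 1, 0, 0, 1]"
        for (String user : upv.visits.keySet()) {
            System.out.println(user + " " + Arrays.toString(upv.vectorFor(user)));
        }
    }
}
```

These vectors are only the features; as noted earlier in the thread, supervised training additionally needs a group label for each training user. A single gzipped text file can be read line by line by wrapping `java.util.zip.GZIPInputStream` in a `BufferedReader`; a gzip archive containing many files (a `.tar.gz`) additionally needs a tar reader, such as `TarArchiveInputStream` from Apache Commons Compress. With many distinct pages, sparse vectors (for example Mahout's `RandomAccessSparseVector`) are likely a better fit than dense `int[]` arrays.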
