Hi Ted,

Thanks very much for your detailed reply; it is very helpful.
I still have some questions. I hope I'm not polluting this mailing list too much.
 
I understand all your comments except the part below:
> Finally, you should be combining a group ranking objective as well as a
> regression objective.  Otherwise, your model will simply be learning which
> users are likely to click on anything and which users will never click on
> anything.  There are provisions for segmented AUC in the code, but that
> will only work for binary targets.  In general, it is common to build
> cascaded models to deal with this.  The first model learns to predict click,
> and the cascaded model learns conversion conditional on click.

We can use binary targets; that shouldn't be a problem.
Could you say a little more about "segmented AUC", and also about the cascaded
models?
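To check my own understanding while asking: I take "segmented AUC" to mean computing AUC separately within each segment (e.g., per user) and then averaging, so the metric measures how well ads are ranked *for a given user* rather than how well clicky users are separated from non-clicky ones. A plain-Java sketch of that idea (my illustration only, not Mahout's implementation; ties are ignored for brevity):

```java
import java.util.*;

public class SegmentedAuc {
  // AUC via the Mann-Whitney rank statistic for one segment's scores.
  // (Ties between scores are ignored for brevity.)
  static double auc(double[] scores, boolean[] labels) {
    Integer[] idx = new Integer[scores.length];
    for (int i = 0; i < idx.length; i++) idx[i] = i;
    Arrays.sort(idx, Comparator.comparingDouble(i -> scores[i]));
    double rankSum = 0;
    int nPos = 0, nNeg = 0;
    for (int r = 0; r < idx.length; r++) {
      if (labels[idx[r]]) { rankSum += r + 1; nPos++; } else { nNeg++; }
    }
    return (rankSum - nPos * (nPos + 1) / 2.0) / ((double) nPos * nNeg);
  }

  // Segmented AUC: compute AUC within each segment, then average,
  // so the metric rewards ranking ability *within* a user rather than
  // across users with different base click rates.
  static double segmentedAuc(Map<String, double[]> scoresBySeg,
                             Map<String, boolean[]> labelsBySeg) {
    double sum = 0;
    int n = 0;
    for (String seg : scoresBySeg.keySet()) {
      sum += auc(scoresBySeg.get(seg), labelsBySeg.get(seg));
      n++;
    }
    return sum / n;
  }

  public static void main(String[] args) {
    Map<String, double[]> s = new HashMap<>();
    Map<String, boolean[]> y = new HashMap<>();
    s.put("userA", new double[]{0.9, 0.2, 0.4});
    y.put("userA", new boolean[]{true, false, false});  // perfect ranking: AUC 1.0
    s.put("userB", new double[]{0.1, 0.8});
    y.put("userB", new boolean[]{true, false});         // inverted ranking: AUC 0.0
    System.out.println(segmentedAuc(s, y));             // prints 0.5
  }
}
```

If that is roughly what the code's segmented AUC does, the binary-target restriction makes sense, since AUC is only defined for binary labels.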
Do you have any reference papers, books, code samples, or example projects for
recommendation?
I have the Mahout in Action book, but I don't think it covers material like that.
Thanks again for your help.


-Weihua


On Jul 11, 2011, at 3:30 PM, Ted Dunning wrote:

> There are lots of problems with the problem as posed.  I am not surprised
> with poor results.
> 
> You should not downsample negative examples so severely.  I would keep as
> many as 10-30x negative examples for every positive example you have.  Even
> then, I suspect you don't have enough data, especially if you have already
> included data for all of your models.
> 
> Your Feature A is not useful unless you are putting all ad results together.
>  Even then, you need to include more advertiser, campaign and ad specific
> features.
> 
> The feature vector size of 10,000 is actually relatively small if you have
> any reasonable degree of sparsity in your user and ad features.  Unused
> features do not hurt learning.
> 
> Finally, you should be combining a group ranking objective as well as a
> regression objective.  Otherwise, your model will simply be learning which
> users are likely to click on anything and which users will never click on
> anything.  There are provisions for segmented AUC in the code, but that
> will only work for binary targets.  In general, it is common to build
> cascaded models to deal with this.  The first model learns to predict click,
> and the cascaded model learns conversion conditional on click.
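If I follow this, the cascade chains two conditional models via the chain rule: P(conversion) = P(click) * P(conversion | click), with the second model trained only on impressions that actually received a click. A toy plain-Java sketch of the chaining (the weights are illustrative, not learned, and this is not actual Mahout code):

```java
public class CascadeSketch {
  static double dot(double[] w, double[] x) {
    double s = 0;
    for (int i = 0; i < w.length; i++) s += w[i] * x[i];
    return s;
  }

  static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

  // Stage 1 predicts P(click | features); stage 2 predicts
  // P(conversion | click, features).  In practice the two weight vectors
  // come from two separately trained models, where stage 2 is trained
  // only on clicked impressions.
  static double pConversion(double[] x, double[] wClick, double[] wConv) {
    double pClick = sigmoid(dot(wClick, x));
    double pConvGivenClick = sigmoid(dot(wConv, x));
    return pClick * pConvGivenClick;  // chain rule of probability
  }

  public static void main(String[] args) {
    double[] x = {1.0, 0.5};        // toy feature vector (bias + one feature)
    double[] wClick = {-2.0, 1.0};  // illustrative weights, not learned
    double[] wConv = {-1.0, 0.5};
    System.out.println(pConversion(x, wClick, wConv));
  }
}
```

Is that the right picture of the cascade?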
> 
> Most importantly, really, I would recommend that you experiment with model
> design using a system like R so that you can get fast turn-around on
> modeling efforts.
> 
> On Mon, Jul 11, 2011 at 3:04 PM, Weihua Zhu <[email protected]> wrote:
> 
>> Hi, thanks Ted.
>> I understand that the training dataset size is small. The reason is that we
>> have a very limited number of "action" class events/instances. We also want
>> each target class to have an equal number of events/instances.
>> Feature A is the advertisement campaign ID, and Feature B is the behaviors
>> that the internet user has, for example, gender: male, country: us, etc.
>> I set the size of the encoder to 10000, which is very large.
>> I used this setup for OnlineLogisticRegression:
>>       olr = new OnlineLogisticRegression(3, FEATURES, new L1());
>>       olr.alpha(1).stepOffset(1000).lambda(3e-5).learningRate(3);
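For readers skimming the thread, here is my understanding of what each parameter in that setup controls, based on my reading of Mahout's OnlineLogisticRegression (the meanings may be version-dependent; please correct me if any of this is wrong):

```java
// Annotated version of the configuration above; this is a fragment,
// not a complete program, and the comments are my interpretation.
olr = new OnlineLogisticRegression(3, FEATURES, new L1());  // 3 classes, L1 prior
olr.alpha(1)          // per-step exponential decay of the learning rate (1 = none)
   .stepOffset(1000)  // softens the polynomial decay over the first ~1000 steps
   .lambda(3e-5)      // weight of the L1 regularization prior
   .learningRate(3);  // initial learning rate
```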
>> 
>> Thanks.
>> 
>> -wz
>> 
>> 
>> On Jul 11, 2011, at 2:49 PM, Ted Dunning wrote:
>> 
>>> This is a tiny amount of data.  The regularization in Mahout's SGD
>>> implementation is probably not as effective as second order techniques
>>> for such tiny data.
>>> 
>>> Btw... you didn't answer my questions about what kind of data features A
>>> and B are.  I understand that you might be shy about this, but without
>>> that kind of information, I can't help you.
>>> 
>>> (and add this additional question)
>>> 
>>> What is the size of the encoded vector?
>>> 
>>> On Mon, Jul 11, 2011 at 2:26 PM, Weihua Zhu <[email protected]> wrote:
>>> 
>>>> The target class is whether a user clicks an ad (advertisement), buys
>>>> through an ad, or does neither; so 3 classes.
>>>> Feature A is about the advertisement itself;
>>>> Feature B is about the user's behaviors;
>>>> Currently I'm only using features A and B.
>>>> Total training data is 250 instances per class;
>>>> 
>>>> thanks..
>>>> 
>>>> 
>>>> ________________________________________
>>>> From: Ted Dunning [[email protected]]
>>>> Sent: Monday, July 11, 2011 2:15 PM
>>>> To: [email protected]
>>>> Subject: Re: combination of features worsen the performance
>>>> 
>>>> Can you say a little bit about the data?
>>>> 
>>>> What are features A and B?  What kind of data do they represent?
>>>> 
>>>> How many other features are there?
>>>> 
>>>> What is the target variable?  How many possible values does it have?
>>>> 
>>>> How much training data do you have?
>>>> 
>>>> What sort of training are you doing?
>>>> 
>>>> 
>>>> 
>>>> On Mon, Jul 11, 2011 at 2:08 PM, Weihua Zhu <[email protected]> wrote:
>>>> 
>>>>> Hi, Dear all,
>>>>> 
>>>>> I am using Mahout logistic regression for classification. Interestingly,
>>>>> features A and B individually each have satisfactory performance, say 65%
>>>>> and 80%, but when I combine them together (using an encoder), the
>>>>> performance is around 72%. Shouldn't the combined performance be better?
>>>>> Any thoughts? Thanks a lot,
>>>>> 
>>>>> 
>>>>> -wz.
>>>>> 
>>>> 
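To clarify what I meant above by "combine them together (using encoder)": conceptually, both features get hashed into slots of one shared vector. A plain-Java sketch of that hashing idea (my illustration only, not Mahout's actual FeatureVectorEncoder API):

```java
import java.util.Arrays;

public class HashedEncoderSketch {
  // Hashing-trick encoder: each (name, value) pair is hashed into one of
  // NUM_FEATURES slots of a shared vector, so features A and B can be
  // combined without maintaining an explicit dictionary.  This mirrors the
  // idea behind Mahout's feature encoders but is only an illustration.
  static final int NUM_FEATURES = 10_000;

  static void addFeature(double[] vector, String name, String value) {
    int slot = Math.floorMod((name + ":" + value).hashCode(), NUM_FEATURES);
    vector[slot] += 1.0;
  }

  public static void main(String[] args) {
    double[] v = new double[NUM_FEATURES];
    addFeature(v, "campaignId", "12345");  // feature A: about the ad
    addFeature(v, "gender", "male");       // feature B: about the user
    addFeature(v, "country", "us");
    long nonZero = Arrays.stream(v).filter(x -> x != 0).count();
    System.out.println(nonZero);  // typically 3; fewer if two hashes collide
  }
}
```

Is collisions between the two features' hashed slots something I should worry about at this vector size, or is the tiny dataset the more likely culprit?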
>> 
>> 
