Hi Ted,

Thanks very much for your detailed reply; it is very helpful. I still have some questions, and I hope I am not polluting this mailing list too much. I understand all your comments except the part below:

> Finally, you should be combining a group ranking objective as well as
> regression objectives. Otherwise, your model will simply be learning which
> users are likely to click on anything and which users will never click on
> anything. There are provisions for segmented AUC in the code, but that
> will only work for binary targets. In general, it is common to build
> cascaded models to deal with this. The first model learns to predict click
> and the cascaded model learns conversion conditional on click.
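Just to check my reading of the cascade idea, is it something like the toy sketch below? (Plain Java rather than Mahout's API; the class names and the simplistic SGD update are my own illustration. The second model trains only on impressions that clicked, and the chained prediction is P(buy) = P(click) * P(buy | click).)

```java
// Minimal two-stage ("cascaded") logistic regression sketch.
// Stage 1 predicts P(click); stage 2 predicts P(buy | click) and is
// trained only on impressions that actually clicked.
// All names and the toy SGD loop are illustrative, not Mahout's API.
class TinyLogistic {
    double[] w;
    double rate;

    TinyLogistic(int numFeatures, double rate) {
        this.w = new double[numFeatures];
        this.rate = rate;
    }

    double predict(double[] x) {
        double s = 0;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-s));    // sigmoid
    }

    void train(double[] x, int label) {
        double err = label - predict(x);      // gradient of log-loss
        for (int i = 0; i < w.length; i++) w[i] += rate * err * x[i];
    }
}

class Cascade {
    TinyLogistic clickModel, buyModel;

    Cascade(int numFeatures) {
        clickModel = new TinyLogistic(numFeatures, 0.1);
        buyModel = new TinyLogistic(numFeatures, 0.1);
    }

    // labels: 0 = no click, 1 = click only, 2 = click and buy
    void train(double[] x, int label) {
        clickModel.train(x, label >= 1 ? 1 : 0);
        if (label >= 1) {
            // conversion model sees only the clicked impressions
            buyModel.train(x, label == 2 ? 1 : 0);
        }
    }

    // chained probability: P(buy) = P(click) * P(buy | click)
    double pBuy(double[] x) {
        return clickModel.predict(x) * buyModel.predict(x);
    }
}
```

If that is roughly right, I can see why it avoids forcing one model to separate three classes at once.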
We can use binary targets; that shouldn't be a problem. Could you say a little more about "segmented AUC", and also about the cascaded models? Do you have any reference papers, books, code samples, or example projects for recommendation? I have the Mahout in Action book, but I didn't see material like that in it.

Thanks again for your help.

-Weihua

On Jul 11, 2011, at 3:30 PM, Ted Dunning wrote:

> There are lots of problems with the problem as posed. I am not surprised
> by the poor results.
>
> You should not downsample negative examples so severely. I would keep as
> many as 10-30x as many negative examples as you have positive examples.
> Even then, I suspect you don't have enough data, especially if you have
> already included data for all of your models.
>
> Your feature A is not useful unless you are putting all ad results
> together. Even then, you need to include more advertiser-, campaign- and
> ad-specific features.
>
> A feature vector size of 10,000 is actually relatively small if you have
> any reasonable degree of sparsity in your user and ad features. Unused
> features do not hurt learning.
>
> Finally, you should be combining a group ranking objective as well as
> regression objectives. Otherwise, your model will simply be learning which
> users are likely to click on anything and which users will never click on
> anything. There are provisions for segmented AUC in the code, but that
> will only work for binary targets. In general, it is common to build
> cascaded models to deal with this. The first model learns to predict click
> and the cascaded model learns conversion conditional on click.
>
> Most importantly, I would recommend that you experiment with model design
> using a system like R so that you can get fast turnaround on modeling
> efforts.
>
> On Mon, Jul 11, 2011 at 3:04 PM, Weihua Zhu <[email protected]> wrote:
>
>> Hi, thanks Ted.
>> I understand that the training dataset size is small.
>> The reason is that we have a very limited number of "action" class
>> events/instances. We also want to make each target class have an equal
>> number of events/instances.
>> Feature A is the advertisement campaign ID, and feature B is the
>> behaviors that the internet user has, for example gender: male,
>> country: US, etc.
>> I set the size of the encoder to 10000, which is very large.
>> I used this setup for OnlineLogisticRegression:
>>
>>   olr = new OnlineLogisticRegression(3, FEATURES, new L1());
>>   olr.alpha(1).stepOffset(1000).lambda(3e-5).learningRate(3);
>>
>> Thanks.
>>
>> -wz
>>
>> On Jul 11, 2011, at 2:49 PM, Ted Dunning wrote:
>>
>>> This is a tiny amount of data. The regularization in Mahout's SGD
>>> implementation is probably not as effective as second-order techniques
>>> for such tiny data.
>>>
>>> Btw, you didn't answer my questions about what kind of data features A
>>> and B are. I understand that you might be shy about this, but without
>>> that kind of information, I can't help you.
>>>
>>> (And one additional question:)
>>>
>>> What is the size of the encoded vector?
>>>
>>> On Mon, Jul 11, 2011 at 2:26 PM, Weihua Zhu <[email protected]> wrote:
>>>
>>>> The target class is whether a user clicks an ad (advertisement), buys
>>>> through an ad, or neither; so 3 classes.
>>>> Feature A is about the advertisement itself;
>>>> feature B is about the user's behaviors.
>>>> Currently I'm only using features A and B.
>>>> Total training data is 250 instances for each class.
>>>>
>>>> Thanks.
>>>>
>>>> ________________________________________
>>>> From: Ted Dunning [[email protected]]
>>>> Sent: Monday, July 11, 2011 2:15 PM
>>>> To: [email protected]
>>>> Subject: Re: combination of features worsen the performance
>>>>
>>>> Can you say a little bit about the data?
>>>>
>>>> What are features A and B? What kind of data do they represent?
>>>>
>>>> How many other features are there?
>>>>
>>>> What is the target variable? How many possible values does it have?
>>>> How much training data do you have?
>>>>
>>>> What sort of training are you doing?
>>>>
>>>> On Mon, Jul 11, 2011 at 2:08 PM, Weihua Zhu <[email protected]> wrote:
>>>>
>>>>> Hi, dear all,
>>>>>
>>>>> I am using Mahout logistic regression for classification.
>>>>> Interestingly, features A and B individually each have satisfactory
>>>>> performance, say 65% and 80%, but when I combine them together (using
>>>>> an encoder), the performance is about 72%. Shouldn't the combined
>>>>> performance be better? Any thoughts? Thanks a lot,
>>>>>
>>>>> -wz.
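As an aside on the "segmented AUC" mentioned earlier in the thread, one concrete reading is: compute AUC separately within each segment (for example, per user) and average the per-segment values, so the metric rewards ranking impressions well within a user rather than merely separating users who click on everything from users who click on nothing. A plain-Java sketch of that computation (an illustration of the idea only, not Mahout's actual implementation; all names here are invented):

```java
import java.util.List;
import java.util.Map;

// Segmented AUC sketch: average the within-segment AUCs instead of
// computing one global AUC over all users mixed together.
// Each example is encoded as {score, label} with label 0 or 1.
class SegmentedAuc {
    // AUC within one group: probability that a random positive outscores
    // a random negative (ties count as 0.5). O(n^2) pair counting is fine
    // for a sketch.
    static double auc(List<double[]> group) {
        double pairs = 0, wins = 0;
        for (double[] pos : group) {
            if (pos[1] != 1) continue;
            for (double[] neg : group) {
                if (neg[1] != 0) continue;
                pairs++;
                if (pos[0] > neg[0]) wins++;
                else if (pos[0] == neg[0]) wins += 0.5;
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }

    // Mean of per-segment AUCs; segments with no positive/negative pair
    // (undefined AUC) are skipped.
    static double segmentedAuc(Map<String, List<double[]>> bySegment) {
        double sum = 0;
        int n = 0;
        for (List<double[]> group : bySegment.values()) {
            double a = auc(group);
            if (!Double.isNaN(a)) { sum += a; n++; }
        }
        return sum / n;
    }
}
```

With this framing, a model that only learns "user u1 clicks a lot, user u2 never clicks" gets no credit, because within each user's segment it ranks that user's impressions arbitrarily.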
