Got it. Thanks so much, Ted. One more question: I am also testing the MixedGradient, and it looks like the RankingGradient takes much more time than the DefaultGradient.
If I set the alpha to 0.5, it takes about 50 times as long as the DefaultGradient. I expected something like that, since the RankingGradient does a lot of ranking comparisons, and I heard that the algorithm is not sensitive to alpha. What alpha would you suggest I choose? I haven't found much material or advice about that.

Best wishes,
Stanley Xu

On Fri, Apr 22, 2011 at 6:04 AM, Ted Dunning <[email protected]> wrote:

> It is definitely a reasonable idea to convert data to hashed feature
> vectors using map-reduce.
>
> And yes, you can pick a vector length that is long enough so that you
> don't have to worry about collisions. You need to examine your data to
> decide how large that needs to be, but it isn't hard to do. The encoding
> framework handles the placement of features in the vector for you. You
> don't have to worry about that.
>
>
> On Wed, Apr 20, 2011 at 8:03 PM, Stanley Xu <[email protected]> wrote:
>
>> Thanks, Ted. Since SGD is a sequential method, the vector created for
>> each line could be very large without consuming too much memory. Could I
>> assume that if we have a limited number of features, or could use
>> map-reduce to pre-process the data and learn how many distinct values
>> each category could have, we could just create a long vector and put
>> different feature values into different slots to avoid possible feature
>> collisions?
>>
>> Thanks,
>> Stanley
>>
>>
>>
>> On Thu, Apr 21, 2011 at 12:24 AM, Ted Dunning <[email protected]>
>> wrote:
>>
>> > Stanley,
>> >
>> > Yes. What you say is correct. Feature hashing can cause degradation.
>> >
>> > With multiple hashing, however, you do have a fairly strong guarantee
>> > that the feature hashing is very close to information preserving. This
>> > is related to the fact that the feature hashing operation is a random
>> > linear transformation. Since we are hashing to something that is still
>> > quite a high dimensional space, the information loss is likely to be
>> > minimal.
>> >
>> > On Wed, Apr 20, 2011 at 6:06 AM, Stanley Xu <[email protected]>
>> > wrote:
>> >
>> > > Dear all,
>> > >
>> > > As I understand it, the feature hashing in SGD compresses the feature
>> > > dimensions into a fixed-length vector. Won't that make the training
>> > > result incorrect if a feature hashing collision happens? Won't two
>> > > features hashed to the same slot be treated as the same feature? Even
>> > > if we have multiple probes to reduce the total number of collisions,
>> > > like a Bloom filter, won't a slot with a collision look like a
>> > > combination feature?
>> > >
>> > > Thanks.
>> > >
>> > > Best wishes,
>> > > Stanley Xu
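
A note on the timing question above: if MixedGradient applies the RankingGradient with probability alpha and the DefaultGradient otherwise, the average cost per example grows roughly linearly with alpha, which matches the slowdown Stanley observed. The sketch below is an illustration of that stochastic-mixture idea under that assumption, not Mahout's actual source; the class and method names are hypothetical stand-ins.

import java.util.Random;

// Sketch of a stochastic mixture of two gradient updates. Names are
// hypothetical, not Mahout's API.
public class MixedGradientSketch {
  private final Random rand = new Random();
  private final double alpha;

  public MixedGradientSketch(double alpha) {
    this.alpha = alpha;
  }

  // Stand-in for DefaultGradient: a cheap O(1) logistic update per example.
  double defaultGradient(double[] instance) {
    return 0.0;
  }

  // Stand-in for RankingGradient: compares the example against a window of
  // recent examples, so each call is far more expensive.
  double rankingGradient(double[] instance) {
    return 0.0;
  }

  double apply(double[] instance) {
    // Expected cost per example:
    //   alpha * cost(ranking) + (1 - alpha) * cost(default)
    // so the slowdown relative to DefaultGradient scales with alpha.
    return rand.nextDouble() < alpha
        ? rankingGradient(instance)
        : defaultGradient(instance);
  }
}

Under that model, alpha = 0.5 running about 50x slower than the DefaultGradient alone would simply mean the ranking update itself costs roughly 100x the default update; choosing alpha then trades training time against how strongly the ranking objective is emphasized.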

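The multi-probe feature hashing discussed in the quoted thread can also be illustrated with a short self-contained sketch: each feature is written to several independently hashed slots of a fixed-length vector, so a collision in one slot leaves the other probes intact and the encoding stays close to information preserving. The class and method names below are made up for illustration; Mahout's real encoders (the FeatureVectorEncoder family Ted refers to) handle probe placement for you and differ in detail.

import java.util.Arrays;

// Illustrative multi-probe hashed encoder; not Mahout's encoder API.
public class HashedEncoderSketch {
  private final double[] vector;
  private final int probes;

  public HashedEncoderSketch(int cardinality, int probes) {
    this.vector = new double[cardinality];
    this.probes = probes;
  }

  public void addFeature(String name, double weight) {
    for (int probe = 0; probe < probes; probe++) {
      // A different seed per probe gives an independent hash location,
      // so two colliding features rarely collide on every probe.
      int slot = Math.floorMod((name + "#" + probe).hashCode(), vector.length);
      vector[slot] += weight;
    }
  }

  public double[] vector() {
    return vector;
  }

  public static void main(String[] args) {
    // Encode two categorical features into a fixed-length vector of 16 slots
    // with 2 probes each; feature names here are arbitrary examples.
    HashedEncoderSketch enc = new HashedEncoderSketch(16, 2);
    enc.addFeature("country=US", 1.0);
    enc.addFeature("device=mobile", 1.0);
    System.out.println(Arrays.toString(enc.vector()));
  }
}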