BTW,
if you follow the parameter usage of Mahout's logistic regression model, the
noBias param is false by default, so the bias feature will be on and equal
to 1.
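
To make the bias point concrete, here is a rough sketch in plain Java. It is a stand-in I wrote for illustration, not Mahout's OnlineLogisticRegression: the class name BiasDemo, the batch gradient loop, the learning rate and the epoch count are all my own assumptions. It trains on the nine points from the original post, once with only (x, y) as features and once with a constant 1 added as an extra feature. Without the constant, the score w1*x + w2*y is always 0 at the origin, so the decision boundary is forced through (0,0), and in this sketch the fitted weights come out positive, which pushes every point with positive coordinates toward class 1.

```java
// Plain-Java stand-in for illustration; this is NOT Mahout's OnlineLogisticRegression.
final class BiasDemo {
    // Training points from the original post, stored as {x, y, label}.
    static final double[][] DATA = {
        {0, 0, 0}, {1, 1, 0}, {1, 0, 0}, {0, 1, 0}, {2, 2, 0},
        {8, 8, 1}, {8, 9, 1}, {9, 8, 1}, {9, 9, 1}
    };

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // With a bias, slot 0 holds the constant 1 (the intercept feature).
    static double[] features(double x, double y, boolean withBias) {
        return withBias ? new double[] {1.0, x, y} : new double[] {x, y};
    }

    // Model output: probability that the feature vector f belongs to class 1.
    static double score(double[] w, double[] f) {
        double z = 0.0;
        for (int j = 0; j < w.length; j++) {
            z += w[j] * f[j];
        }
        return sigmoid(z);
    }

    // Batch gradient ascent on the log-likelihood (rate and epochs are assumed values).
    static double[] train(boolean withBias) {
        double[] w = new double[withBias ? 3 : 2];
        double rate = 0.01;
        for (int epoch = 0; epoch < 20000; epoch++) {
            double[] grad = new double[w.length];
            for (double[] row : DATA) {
                double[] f = features(row[0], row[1], withBias);
                double err = row[2] - score(w, f);
                for (int j = 0; j < w.length; j++) {
                    grad[j] += err * f[j];
                }
            }
            for (int j = 0; j < w.length; j++) {
                w[j] += rate * grad[j] / DATA.length;
            }
        }
        return w;
    }

    // Trains on DATA, then returns the class 1 probability for the point (x, y).
    static double probClass1(boolean withBias, double x, double y) {
        return score(train(withBias), features(x, y, withBias));
    }

    public static void main(String[] args) {
        System.out.println("P(class 1), no bias:   " + probClass1(false, 0.5, 0.5));
        System.out.println("P(class 1), with bias: " + probClass1(true, 0.5, 0.5));
    }
}
```

For the test point (0.5, 0.5), the run without the bias feature should report a class 1 probability above 0.5, and the run with it should report one below 0.5. The exact numbers will not match Mahout's, since the training procedure here is different.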



On Wed, Jun 27, 2012 at 11:07 AM, sam wu <[email protected]> wrote:

> I don't think the problem is due to the sample size; if you use 2
> features, 10 samples might be OK.
>
> The problem is that you didn't include a bias (intercept) term, which is
> always 1.
> If you add a bias term (1) to your Point class, you'll get 0.9999 for the
> cluster 0 probability.
>
>
> Sam
>
>
> On Wed, Jun 27, 2012 at 1:23 AM, Sean Owen <[email protected]> wrote:
>
>> Those are both true; they may not be the issue here.
>>
>> The test point definitely belongs in the first of the two groups you
>> created. Why is the result surprising?
>>
>> On Wed, Jun 27, 2012 at 9:15 AM, Lance Norskog <[email protected]> wrote:
>>
>> > Not enough samples. Machine learning algorithms in general do well if
>> > you have large sample sets (hundreds or thousands) from "real" data
>> > sources. The data should have a strong signal but be a little noisy.
>> >
>> > Also: your Point class needs a hashCode() since it does equals(). The
>> > Map class won't work at scale.
>> >
>> > On Wed, Jun 27, 2012 at 1:00 AM, damodar shetyo <[email protected]>
>> > wrote:
>> > > I am trying to build a simple model that can group points in 2D space.
>> > > I am training the model by giving it a few examples. After that, I am
>> > > using the model to predict the group into which any other point may
>> > > fall, but I am not getting the answer I expected. Am I missing
>> > > something in my code, or am I doing something wrong?
>> > >
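>> > > import java.util.HashMap;
>> > > import java.util.Map;
>> > >
>> > > import org.apache.mahout.classifier.sgd.L1;
>> > > import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
>> > > import org.apache.mahout.math.DenseVector;
>> > > import org.apache.mahout.math.RandomAccessSparseVector;
>> > > import org.apache.mahout.math.Vector;
>> > >
>> > > public class SimpleClassifier {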
>> > >
>> > >    public static class Point{
>> > >        public int x;
>> > >        public int y;
>> > >
>> > >        public Point(int x,int y){
>> > >            this.x = x;
>> > >            this.y = y;
>> > >        }
>> > >
>> > >        @Override
>> > >        public boolean equals(Object arg0) {
>> > >            Point p = (Point)  arg0;
>> > >            return( (this.x == p.x) &&(this.y== p.y));
>> > >        }
>> > >
>> > >        @Override
>> > >        public String toString() {
>> > >            // TODO Auto-generated method stub
>> > >            return  this.x + " , " + this.y ;
>> > >        }
>> > >    }
>> > >    public static void main(String[] args) {
>> > >
>> > >        Map<Point,Integer> points =
>> > >                new HashMap<SimpleClassifier.Point, Integer>();
>> > >
>> > >        points.put(new Point(0,0), 0);
>> > >        points.put(new Point(1,1), 0);
>> > >        points.put(new Point(1,0), 0);
>> > >        points.put(new Point(0,1), 0);
>> > >        points.put(new Point(2,2), 0);
>> > >
>> > >
>> > >        points.put(new Point(8,8), 1);
>> > >        points.put(new Point(8,9), 1);
>> > >        points.put(new Point(9,8), 1);
>> > >        points.put(new Point(9,9), 1);
>> > >
>> > >
>> > >        OnlineLogisticRegression learningAlgo =
>> > >                new OnlineLogisticRegression(2, 2, new L1());
>> > >        learningAlgo.learningRate(50);
>> > >
>> > >        //learningAlgo.alpha(1).stepOffset(1000);
>> > >
>> > >        System.out.println("training model  \n" );
>> > >        for(Point point : points.keySet()){
>> > >            Vector v = getVector(point);
>> > >            System.out.println(point  + " belongs to " +
>> > >                    points.get(point));
>> > >            learningAlgo.train(points.get(point), v);
>> > >        }
>> > >
>> > >        learningAlgo.close();
>> > >
>> > >
>> > >        //now classify real data
>> > >        Vector v = new RandomAccessSparseVector(2);
>> > >        v.set(0, 0.5);
>> > >        v.set(1, 0.5);
>> > >
>> > >        Vector r = learningAlgo.classifyFull(v);
>> > >        System.out.println(r);
>> > >
>> > >        System.out.println("ans = " );
>> > >        System.out.println("no of categories = " +
>> > >                learningAlgo.numCategories());
>> > >        System.out.println("no of features = " +
>> > >                learningAlgo.numFeatures());
>> > >        System.out.println("Probability of cluster 0 = " + r.get(0));
>> > >        System.out.println("Probability of cluster 1 = " + r.get(1));
>> > >
>> > >    }
>> > >
>> > >    public static Vector getVector(Point point){
>> > >        Vector v = new DenseVector(2);
>> > >        v.set(0, point.x);
>> > >        v.set(1, point.y);
>> > >
>> > >        return v;
>> > >    }
>> > > }
>> > >
>> > > Output:
>> > > ans =
>> > > no of categories = 2
>> > > no of features = 2
>> > > Probability of cluster 0 = 3.9580985042775296E-4
>> > > Probability of cluster 1 = 0.9996041901495722
>> > >
>> > > 99% of the time the output shows a higher probability for cluster 1. Why?
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > > Damodar Shetyo
>> >
>> >
>> >
>> > --
>> > Lance Norskog
>> > [email protected]
>> >
>>
>
>
