BTW, if you follow Mahout's logistic regression model parameter usage, the noBias param is false by default, so the bias (intercept) feature is turned on and always equal to 1.
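To make the effect of the bias term concrete, here is a rough plain-Java sketch. It is not Mahout code: it is a hand-rolled logistic regression trained by SGD on the nine points from the original post, and the class name BiasTermDemo, the learning rate and the number of passes are all assumptions chosen for illustration. Without the constant feature, the score for (0,0) is always exactly 0.5, because the decision boundary is forced through the origin, so points near the origin cannot be pushed firmly into group 0.

```java
// Sketch only: plain-Java logistic regression trained by SGD, not Mahout code.
class BiasTermDemo {

    // Training points from the original post: first five are group 0, last four group 1.
    static final double[][] POINTS = {
        {0, 0}, {1, 1}, {1, 0}, {0, 1}, {2, 2},
        {8, 8}, {8, 9}, {9, 8}, {9, 9}
    };
    static final int[] LABELS = {0, 0, 0, 0, 0, 1, 1, 1, 1};

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Builds {x, y} or, when the bias term is enabled, {x, y, 1}.
    static double[] features(double x, double y, boolean useBias) {
        return useBias ? new double[] {x, y, 1.0} : new double[] {x, y};
    }

    static double dot(double[] w, double[] f) {
        double sum = 0.0;
        for (int j = 0; j < w.length; j++) {
            sum += w[j] * f[j];
        }
        return sum;
    }

    // Trains on the nine points and returns P(group 0) for the test point (0.5, 0.5).
    static double probGroup0(boolean useBias) {
        double[] w = new double[useBias ? 3 : 2];
        double learningRate = 0.01;   // assumed value, small enough to stay stable
        int passes = 5000;            // assumed value
        for (int pass = 0; pass < passes; pass++) {
            for (int i = 0; i < POINTS.length; i++) {
                double[] f = features(POINTS[i][0], POINTS[i][1], useBias);
                // sigmoid(w . f) is the model's current P(group 1) for this point.
                double error = LABELS[i] - sigmoid(dot(w, f));
                for (int j = 0; j < w.length; j++) {
                    w[j] += learningRate * error * f[j];
                }
            }
        }
        double pGroup1 = sigmoid(dot(w, features(0.5, 0.5, useBias)));
        return 1.0 - pGroup1;
    }

    public static void main(String[] args) {
        System.out.println("P(group 0) without bias = " + probGroup0(false));
        System.out.println("P(group 0) with bias    = " + probGroup0(true));
    }
}
```

With these settings I would expect the no-bias run to stay close to 0.5 for the test point and the run with the bias feature to come out clearly in favour of group 0. The exact figures depend on the learning rate and the number of passes, and they will not match Mahout's output.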
On Wed, Jun 27, 2012 at 11:07 AM, sam wu <[email protected]> wrote:

> I don't think the problem is due to the sample size, if you use 2
> features, 10 samples might be OK.
>
> The problem is that you didn't include bias(intercept) term, which is
> always 1.
> If you add bias term(1) to your point class, you'll get 0.9999 for cluster
> 0 probability.
>
> Sam
>
> On Wed, Jun 27, 2012 at 1:23 AM, Sean Owen <[email protected]> wrote:
>
>> Those are both true; they may not be the issue here.
>>
>> The test point definitely belongs in the first of the two groups you
>> created. Why is the result surprising?
>>
>> On Wed, Jun 27, 2012 at 9:15 AM, Lance Norskog <[email protected]> wrote:
>>
>> > Not enough samples. Machine learning algorithms in general do well if
>> > you have large sample sets (hundreds or thousands) from "real" data
>> > sources. The data should have a strong signal but be a little noisy.
>> >
>> > Also: your Point class needs a hashCode() since it does equals(). The
>> > Map class won't work at scale.
>> >
>> > On Wed, Jun 27, 2012 at 1:00 AM, damodar shetyo <[email protected]> wrote:
>> > > I am trying to build a simple model that can group points in 2D space.Am
>> > > training the model by giving few examples.After that i am using the model
>> > > to predict the group in which the any other points may fall.But am not
>> > > getting answer as expected.Am i missing something in my code or am i doing
>> > > something wrong?
>> > >
>> > > public class SimpleClassifier {
>> > >
>> > >     public static class Point{
>> > >         public int x;
>> > >         public int y;
>> > >
>> > >         public Point(int x,int y){
>> > >             this.x = x;
>> > >             this.y = y;
>> > >         }
>> > >
>> > >         @Override
>> > >         public boolean equals(Object arg0) {
>> > >             Point p = (Point) arg0;
>> > >             return( (this.x == p.x) &&(this.y== p.y));
>> > >         }
>> > >
>> > >         @Override
>> > >         public String toString() {
>> > >             // TODO Auto-generated method stub
>> > >             return this.x + " , " + this.y ;
>> > >         }
>> > >     }
>> > >
>> > >     public static void main(String[] args) {
>> > >
>> > >         Map<Point,Integer> points = new HashMap<SimpleClassifier.Point, Integer>();
>> > >
>> > >         points.put(new Point(0,0), 0);
>> > >         points.put(new Point(1,1), 0);
>> > >         points.put(new Point(1,0), 0);
>> > >         points.put(new Point(0,1), 0);
>> > >         points.put(new Point(2,2), 0);
>> > >
>> > >         points.put(new Point(8,8), 1);
>> > >         points.put(new Point(8,9), 1);
>> > >         points.put(new Point(9,8), 1);
>> > >         points.put(new Point(9,9), 1);
>> > >
>> > >         OnlineLogisticRegression learningAlgo = new OnlineLogisticRegression();
>> > >         learningAlgo = new OnlineLogisticRegression(2, 2, new L1());
>> > >         learningAlgo.learningRate(50);
>> > >
>> > >         //learningAlgo.alpha(1).stepOffset(1000);
>> > >
>> > >         System.out.println("training model \n" );
>> > >         for(Point point : points.keySet()){
>> > >             Vector v = getVector(point);
>> > >             System.out.println(point + " belongs to " + points.get(point));
>> > >             learningAlgo.train(points.get(point), v);
>> > >         }
>> > >
>> > >         learningAlgo.close();
>> > >
>> > >         //now classify real data
>> > >         Vector v = new RandomAccessSparseVector(2);
>> > >         v.set(0, 0.5);
>> > >         v.set(1, 0.5);
>> > >
>> > >         Vector r = learningAlgo.classifyFull(v);
>> > >         System.out.println(r);
>> > >
>> > >         System.out.println("ans = " );
>> > >         System.out.println("no of categories = " + learningAlgo.numCategories());
>> > >         System.out.println("no of features = " + learningAlgo.numFeatures());
>> > >         System.out.println("Probability of cluster 0 = " + r.get(0));
>> > >         System.out.println("Probability of cluster 1 = " + r.get(1));
>> > >
>> > >     }
>> > >
>> > >     public static Vector getVector(Point point){
>> > >         Vector v = new DenseVector(2);
>> > >         v.set(0, point.x);
>> > >         v.set(1, point.y);
>> > >
>> > >         return v;
>> > >     }
>> > > }
>> > >
>> > > OP
>> > > ans =
>> > > no of categories = 2
>> > > no of features = 2
>> > > Probability of cluster 0 = 3.9580985042775296E-4
>> > > Probability of cluster 1 = 0.9996041901495722
>> > >
>> > > 99 % of times the output show more probability for cluster 1.Why?
>> > >
>> > > --
>> > > Regards,
>> > > Damodar Shetyo
>> >
>> > --
>> > Lance Norskog
>> > [email protected]
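
On Lance's hashCode() point in the thread above, a minimal sketch of a Point that keeps equals() and hashCode() consistent. The wrapper class PointLookupDemo and its lookup helper are hypothetical names for illustration only. The training loop in the original post happens to work because it looks entries up with the very same key instances it iterates over; a lookup with a freshly constructed but equal Point is where a missing hashCode() would show up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch only: PointLookupDemo and lookup(...) are illustrative names.
class PointLookupDemo {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            // Guard against null and foreign types instead of casting blindly.
            if (!(other instanceof Point)) {
                return false;
            }
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        // Equal points must produce equal hash codes for HashMap lookups to succeed.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }

        @Override
        public String toString() {
            return x + " , " + y;
        }
    }

    // Looks up a label using a freshly constructed key; returns null if absent.
    static Integer lookup(int x, int y) {
        Map<Point, Integer> labels = new HashMap<>();
        labels.put(new Point(0, 0), 0);
        labels.put(new Point(8, 8), 1);
        return labels.get(new Point(x, y));
    }

    public static void main(String[] args) {
        System.out.println("label for 8 , 8 = " + lookup(8, 8));
        System.out.println("label for 3 , 3 = " + lookup(3, 3));
    }
}
```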
