Ted and Jeff,

Thanks very much for the advice. I'll make a start on NamedVectors.

Thanks very much again,

John

On Thu, Feb 17, 2011 at 5:49 PM, Ted Dunning <[email protected]> wrote:

> This should be fine.
>
> I recommend not doing too much special with the coordinates except for
> translating to unit vector positions.  This allows standard Euclidean
> metrics to give you the result you want.
>
> You will also need to scale or translate your third variable so that it
> is on the same scale as the first two.  The question you should ask
> yourself is whether a particular amount of change in each coordinate
> represents about the same level of difference in the final result.
>
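A minimal sketch of what Ted describes, in plain Java with no Mahout
dependency. It assumes X is longitude and Y is latitude, both in degrees,
and that the observed range of Z is known up front. The class name, the
method names and the min-max rescaling are illustrative assumptions, not
anything taken from Mahout.

```java
final class GeoVectorSketch {

    /** Map longitude/latitude (degrees) to a point on the unit sphere. */
    static double[] toUnitVector(double lonDeg, double latDeg) {
        double lon = Math.toRadians(lonDeg);
        double lat = Math.toRadians(latDeg);
        return new double[] {
            Math.cos(lat) * Math.cos(lon),
            Math.cos(lat) * Math.sin(lon),
            Math.sin(lat)
        };
    }

    /** Min-max rescale of value from [min, max] to [0, targetRange]. */
    static double rescale(double value, double min, double max, double targetRange) {
        if (max == min) {
            return 0.0; // degenerate range: every value is identical
        }
        return targetRange * (value - min) / (max - min);
    }

    /** Build a 4-element vector: unit-sphere position plus rescaled Z. */
    static double[] encode(double lonDeg, double latDeg, double z,
                           double zMin, double zMax, double zRange) {
        double[] u = toUnitVector(lonDeg, latDeg);
        return new double[] { u[0], u[1], u[2], rescale(z, zMin, zMax, zRange) };
    }

    /** Plain Euclidean distance between two equal-length vectors. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Longitudes 1 and 359 are two degrees apart on the globe,
        // so their encoded vectors should be close together.
        double[] p = encode(1.0, 45.0, 3.0, 0.0, 10.0, 1.0);
        double[] q = encode(359.0, 45.0, 3.0, 0.0, 10.0, 1.0);
        System.out.println(euclidean(p, q));
    }
}
```

The zRange argument is the knob Ted's question points at: it decides how
much a change in Z counts relative to a change in position, so it has to be
chosen from the data rather than copied from this sketch.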
> On Thu, Feb 17, 2011 at 9:15 AM, Jeff Eastman <[email protected]> wrote:
>
> > Let me first translate your problem a little to make it more tangible.
> > Suppose you have data of the following format [Item Longitude Latitude
> > Altitude]:
> >
> > You can convert this into Mahout NamedVectors, where the name=Item and
> > the vector values are [Longitude, Latitude, Altitude]. Look in
> > utils/src/main/java/m/a/o/clustering/conversion for some example jobs for
> > starting points. You will likely need to write your own conversion job to
> > create the NamedVectors the way you want from your input data in its
> > encoding format.
> >
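As a rough illustration of the conversion step, the sketch below parses one
whitespace-separated row of the form "Item X Y Z" into a name plus a
three-element array. It is plain Java: NamedRowSketch and parse are
hypothetical names, and the row layout is assumed from the sample table in
the original question. In a real conversion job the result would be wrapped
in a Mahout NamedVector instead, as noted in the comment.

```java
final class NamedRowSketch {
    final String name;      // the Item column
    final double[] values;  // the three numeric columns, in input order

    NamedRowSketch(String name, double[] values) {
        this.name = name;
        this.values = values;
    }

    /** Parse a row such as "A 2 4 3" into a name and three values. */
    static NamedRowSketch parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        double[] values = new double[3];
        for (int i = 0; i < 3; i++) {
            values[i] = Double.parseDouble(fields[i + 1]);
        }
        // With Mahout on the classpath, the equivalent would be roughly:
        //   new NamedVector(new DenseVector(values), fields[0])
        return new NamedRowSketch(fields[0], values);
    }

    public static void main(String[] args) {
        NamedRowSketch row = parse("A       2             4            3");
        System.out.println(row.name + " " + java.util.Arrays.toString(row.values));
    }
}
```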
> > Now, you can cluster this data using any of the Mahout algorithms, but
> > the clustering will treat all your vector elements equally. I get that
> > you really want to cluster mostly based on Altitude (cluster all the
> > Lon/Lat items which have similar Altitudes). If this is the case then you
> > can use one of our WeightedDistanceMeasures to minimize (or eliminate)
> > the effects of Lon/Lat and focus mostly (or entirely) on Altitudes. Or,
> > better, you can write your own SphericalDistanceMeasure (to deal with the
> > fact that Lon=001 is quite close to Lon=359, for example).
> >
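The two ideas in Jeff's last paragraph, per-element weights and longitude
wrap-around, can be sketched together in plain Java. This is not Mahout's
DistanceMeasure interface and not a true great-circle distance: the class
and method names are hypothetical, vectors are assumed to be
[Longitude, Latitude, Altitude] with longitude in degrees, and the weights
are supplied by the caller.

```java
final class WeightedGeoDistanceSketch {

    /** Smallest absolute difference between two longitudes, in [0, 180] degrees. */
    static double lonDelta(double lonA, double lonB) {
        double d = Math.abs(lonA - lonB) % 360.0;
        return d > 180.0 ? 360.0 - d : d;
    }

    /**
     * Weighted Euclidean distance over [lon, lat, alt], using the
     * wrap-around difference for longitude. A weight of 0 removes
     * that element from the comparison entirely.
     */
    static double distance(double[] a, double[] b, double[] weights) {
        double dLon = lonDelta(a[0], b[0]);
        double dLat = a[1] - b[1];
        double dAlt = a[2] - b[2];
        return Math.sqrt(weights[0] * dLon * dLon
                       + weights[1] * dLat * dLat
                       + weights[2] * dAlt * dAlt);
    }

    public static void main(String[] args) {
        double[] p = { 1.0, 10.0, 100.0 };
        double[] q = { 359.0, 50.0, 103.0 };
        // Altitude only: longitude and latitude are weighted out.
        System.out.println(distance(p, q, new double[] { 0.0, 0.0, 1.0 }));
        // All three elements, equally weighted.
        System.out.println(distance(p, q, new double[] { 1.0, 1.0, 1.0 }));
    }
}
```

To use something like this inside a Mahout clustering job, the same
arithmetic would have to live in a class implementing Mahout's
DistanceMeasure interface; that wiring is left out here.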
> > Hope this helps,
> > Jeff
> >
> > -----Original Message-----
> > From: john abbott [mailto:[email protected]]
> > Sent: Thursday, February 17, 2011 8:49 AM
> > To: [email protected]
> > Subject: Clustering assistance, mean shift
> >
> > Hi,
> >
> > I was wondering whether someone might be able to help me out.  I'd like
> > to use Mahout via Elastic MapReduce to cluster some datasets, but I'm not
> > sure I've got the right use case.  I'm hoping someone might be able to
> > comment and perhaps point me in the direction of some further advice.
> >
> > I have a dataset which is stored in a database and structured as follows:
> >
> > Item  Value X   Value Y  Value Z
> > A       2             4            3
> > A       3             5            6
> > A       6             7            9
> > B       5            8             2
> > B       2            4             7
> > ...
> >
> > I would like to create a series of clusters for each item based on the
> > values of X, Y and Z.  X and Y are geographic coordinates, i.e. real-world
> > places, and Z is a value observed in those places.  What I'd like to end
> > up with is (for each Item) a series of clusters saying these values of Z
> > are coincident at this place (represented by Value X and Y).  I've looked
> > through and played with the quickstarts and that's all fine, but I'm
> > wondering:
> >
> > 1.  Is this sort of analysis possible?
> > 2.  How do I convert my numeric data into the correct format to be
> >     processed by a Job?
> > 3.  Any pointers to how I might configure my job in a way that can be
> >     distributed and create a cluster for each item?
> >
> > Thank you to anyone who might be able to help. I'm really excited to get
> > started with Mahout, but I'm struggling to understand whether it's
> > suitable and how to get started.
> >
> > Thanks very much,
> >
> > John
> >
>



-- 
John Abbott
Co-Founder
www.oobafit.com
m. 44 (0)7919392754
@scmjea
