Hi Darius,

Thanks for your reply.

I have created my program based on the sample tool provided by Mahout to create vectors from Weka's ARFF format. I am able to compile the code and to generate vectors. I have also used those vector files as input to the Canopy algorithm. The problem I am facing is how to interpret the result of Canopy.

"Creating vectors from Weka's ARFF format":
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-utils/0.2/org/apache/mahout/utils/vectors/arff/Driver.java

Attribute-Relation File Format (ARFF):
<http://www.cs.waikato.ac.nz/~ml/weka/arff.html>

Please let me know if you require more information.

Thanks
Vishal

On Fri, Sep 6, 2013 at 1:55 PM, Darius Miliauskas <[email protected]> wrote:

> Dear Vishal,
>
> can you give some code showing how you performed the steps you mentioned:
>
> #) Created custom VectorIterable by implementing Iterable<Vector>.
> #) Created custom VectorIterator by extending AbstractIterator<Vector>.
> #) Model class which will be responsible for passing attribute values
>    (username or data etc.) to the custom VectorIterator.
> #) Custom VectorIterator.computeNext() will read a line and create a dense
>    vector with size equal to the number of attributes in a row.
>
> Can you compile the code?
>
> Best,
>
> Darius
>
> 2013/9/6 Vishal Danech <[email protected]>
>
> > Hi
> >
> > I have custom log data which contains the following details.
> >
> > 1) UserName
> > 2) MachineId
> > 3) DateTime
> > 4) Data - which contains text - search term etc.
> >
> > I would like to use this data to find out
> > #) how much time users are spending on browsing etc.
> > #) User-based search patterns
> >
> > The first problem can be addressed using a Hive query.
> >
> > For the second problem, I suppose clustering can be applied, and for this
> > I have converted the data to vectors. I have used dense vectors and
> > applied the Canopy algorithm to them. I got an output which I provided as
> > input to the ClusterDump utility, but the output I got was not in readable
> > form. I figured out that I need to use named vectors so that the key can
> > be displayed in the output. Here I am facing an issue: how to use
> > NamedVector?
> >
> > I am performing the following steps to generate vectors:
> > #) Created custom VectorIterable by implementing Iterable<Vector>.
> > #) Created custom VectorIterator by extending AbstractIterator<Vector>.
> > #) Model class which will be responsible for passing attribute values
> >    (username or data etc.) to the custom VectorIterator.
> > #) Custom VectorIterator.computeNext() will read a line and create a dense
> >    vector with size equal to the number of attributes in a row.
> >
> > Please let me know how to add NamedVector here so that I can get some
> > readable output from the ClusterDump utility.
> >
> > --
> > Thanks and Regards
> > Vishal Danech

--
Thanks and Regards
Vishal Danech
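
For reference, the per-row step described in the thread (read a line, build one dense row per line, keep a key for readable output) can be sketched roughly as below. This is only an illustration under assumptions, not the actual program: the line layout (a key followed by comma-separated numbers) is assumed, and `NamedRow` and `NamedRowIterator` are hypothetical stand-ins that use only the JDK so the sketch compiles on its own. With Mahout on the classpath, the same name/values pair could presumably be wrapped as `new NamedVector(new DenseVector(values), name)` inside `computeNext()`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a named dense row (not a Mahout class).
// With Mahout available, the same pair could be wrapped as:
//   new NamedVector(new DenseVector(values), name)
final class NamedRow {
    final String name;
    final double[] values;

    NamedRow(String name, double[] values) {
        this.name = name;
        this.values = values;
    }
}

// Yields one NamedRow per input line, mirroring the custom iterator idea.
final class NamedRowIterator implements Iterator<NamedRow> {
    private final Iterator<String> lines;

    NamedRowIterator(Iterable<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public NamedRow next() {
        if (!lines.hasNext()) {
            throw new NoSuchElementException();
        }
        return parse(lines.next());
    }

    // Assumed line layout: key,number,number,...
    // The first field becomes the row name; the rest become the dense values.
    static NamedRow parse(String line) {
        String[] fields = line.split(",");
        double[] values = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            values[i - 1] = Double.parseDouble(fields[i].trim());
        }
        return new NamedRow(fields[0].trim(), values);
    }
}
```

The point of keeping the key next to the values is that a dump of the clustered points can then show which input row each point came from. How the text attributes (such as the search term) are turned into numbers is left open here, since the thread does not say.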
