Hi Darius

Thanks for your reply.

I wrote my program based on the sample tool that Mahout provides for creating
vectors from Weka's ARFF format. The code compiles and generates vectors, and
I have used the resulting vector file as input to the Canopy algorithm. The
problem I am facing is how to interpret the Canopy output.
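
To make the per-line step concrete, here is a minimal sketch of what my
iterator does for each row, written in plain Java so it runs without Mahout.
The class names NamedRowIterator and NamedRow and the comma-separated layout
(user name first, then already-numeric columns) are assumptions for
illustration only, not my real code. As far as I understand, with Mahout on
the classpath the stand-in NamedRow would be replaced by
new NamedVector(new DenseVector(values), name), so that the name travels with
the vector.

```java
import java.util.Iterator;

/**
 * Hypothetical stand-in for the custom vector iterator: turns each input
 * line into a name plus a dense array of doubles.
 * Assumed line layout: userName,feature1,feature2,...
 */
final class NamedRowIterator implements Iterator<NamedRowIterator.NamedRow> {

    /** The key to show in readable output, plus the numeric features. */
    public static final class NamedRow {
        public final String name;
        public final double[] values;

        NamedRow(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    private final Iterator<String> lines;

    NamedRowIterator(Iterable<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public NamedRow next() {
        // Throws NoSuchElementException once the input is exhausted.
        String[] fields = lines.next().split(",");

        // The first field is kept as the name, not as a vector dimension.
        String name = fields[0].trim();

        // The remaining fields become the dense values.
        double[] values = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            values[i - 1] = Double.parseDouble(fields[i].trim());
        }

        // With Mahout available, this is the point where the row would be
        // wrapped as: new NamedVector(new DenseVector(values), name)
        return new NamedRow(name, values);
    }
}
```

The point of the sketch is that the user name is held separately from the
numeric values, so it does not take part in the distance calculation.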

"Creating vectors from wekas ARFF format"
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-utils/0.2/org/apache/mahout/utils/vectors/arff/Driver.java


Attribute-Relation File Format (ARFF) <http://www.cs.waikato.ac.nz/~ml/weka/arff.html>

Please let me know if you require more information.

Thanks

Vishal



On Fri, Sep 6, 2013 at 1:55 PM, Darius Miliauskas <
[email protected]> wrote:

> Dear Vishal,
>
> can you share some code showing how you performed the steps you mentioned:
>
>  #) Created a custom VectorIterable by implementing Iterable<Vector>.
>  #) Created a custom VectorIterator by extending AbstractIterator<Vector>.
>  #) Created a model class that is responsible for passing attribute values
>     (username, data, etc.) to the custom VectorIterator.
>  #) The custom VectorIterator.computeNext() reads a line and creates a dense
>     vector whose size equals the number of attributes in the row.
>
> Can you compile the code?
>
>
> Best,
>
> Darius
>
>
>
> 2013/9/6 Vishal Danech <[email protected]>
>
> > Hi
> >
> > I have custom log data which contains the following details:
> >
> > 1) UserName
> > 2) MachineId
> > 3) DateTime
> > 4) Data - which contains text - search term etc
> >
> > I would like to use this data to find out
> >      #) how much time users spend on browsing etc.
> >      #) user-based search patterns
> >
> > The first problem can be addressed with a Hive query.
> >
> > For the second problem, I suppose clustering can be applied, and for this I
> > have converted the data to vectors. I used dense vectors and applied the
> > Canopy algorithm to them. I passed the output to the ClusterDump utility,
> > but the result was not in readable form. I figured out that I need to use
> > named vectors so that the key can be displayed in the output. This is where
> > I am stuck: how do I use NamedVector?
> >
> > I am performing the following steps to generate vectors:
> >      #) Created a custom VectorIterable by implementing Iterable<Vector>.
> >      #) Created a custom VectorIterator by extending AbstractIterator<Vector>.
> >      #) Created a model class that is responsible for passing attribute values
> >         (username, data, etc.) to the custom VectorIterator.
> >      #) The custom VectorIterator.computeNext() reads a line and creates a dense
> >         vector whose size equals the number of attributes in the row.
> >
> > Please let me know how to add NamedVector here so that I can get some
> > readable output from the ClusterDump utility.
> >
> > --
> > Thanks and Regards
> > Vishal Danech
> >
>



-- 
Thanks and Regards
Vishal Danech
