Hi

I have custom log data that contains the following fields:

1) UserName
2) MachineId
3) DateTime
4) Data (free text such as search terms)

I would like to use this data to find out:
     #) how much time users spend browsing, etc.
     #) per-user search patterns

The first problem can be addressed with a Hive query.

For the second problem, I think clustering can be applied, so I have
converted the data to vectors. I used dense vectors and ran the Canopy
algorithm on them. I passed the resulting output to the ClusterDump
utility, but the output I got was not in a readable form. I figured out
that I need to use named vectors so that the key is displayed in the
output. This is where I am stuck: how do I use NamedVector?

I am performing the following steps to generate the vectors:
     #) Created a custom VectorIterable that implements Iterable<Vector>.
     #) Created a custom VectorIterator that extends AbstractIterator<Vector>.
     #) Created a model class that is responsible for passing attribute values
(username, data, etc.) to the custom VectorIterator.
     #) The custom VectorIterator.computeNext() reads a line and creates a dense
vector whose size equals the number of attributes in a row.
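
To make the steps above concrete, here is a rough, library-free sketch of
the row-to-vector step. It is not my actual code: the comma-separated line
format, the NamedRow and NamedRowIterator classes, and the encode() helper
are all hypothetical stand-ins. NamedRow only mimics the idea of a named
vector (a key attached to dense values); with Mahout on the classpath, I
assume the equivalent would be wrapping the dense vector as
new NamedVector(new DenseVector(values), name).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a named vector: a key plus dense values.
// Assumption: with Mahout this would be
// new NamedVector(new DenseVector(values), name).
final class NamedRow {
    final String name;
    final double[] values;

    NamedRow(String name, double[] values) {
        this.name = name;
        this.values = values;
    }
}

// Reads lines in the assumed format "UserName,MachineId,DateTime,Data"
// and yields one NamedRow per line, keyed by the user name.
final class NamedRowIterator implements Iterator<NamedRow> {
    private final Iterator<String> lines;

    NamedRowIterator(Iterable<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public NamedRow next() {
        if (!lines.hasNext()) {
            throw new NoSuchElementException();
        }
        // A limit of 4 keeps any commas inside the Data field intact.
        String[] fields = lines.next().split(",", 4);
        // One value per attribute, as in the steps described above.
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = encode(fields[i]);
        }
        return new NamedRow(fields[0].trim(), values);
    }

    // Placeholder encoding only: a real feature encoding for text
    // attributes would need to be chosen with clustering in mind.
    static double encode(String field) {
        return field.trim().hashCode();
    }
}
```

The point of the sketch is only that the key (here the user name) travels
with the values, which is what I would expect ClusterDump to print once
the vectors are named.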

Please let me know how to add NamedVector here so that I can get
readable output from the ClusterDump utility.

-- 
Thanks and Regards
Vishal Danech
