Hi
I have custom log data that contains the following fields:
1) UserName
2) MachineId
3) DateTime
4) Data - free text, such as search terms
I would like to use this data to find out:
#) how much time users spend on browsing etc.
#) per-user search patterns
The first problem can be addressed with a Hive query.
For the second problem, I suppose clustering can be applied, so I have
converted the data to vectors. I used dense vectors and ran the Canopy
algorithm on them. I passed the resulting output to the ClusterDump
utility, but the output I got was not in a readable form. I figured out
that I need to use named vectors so that the key is displayed in the
output. This is where I am stuck: how do I use NamedVector?
I am performing the following steps to generate the vectors:
#) Created a custom VectorIterable by implementing Iterable<Vector>.
#) Created a custom VectorIterator by extending AbstractIterator<Vector>.
#) Created a model class that passes the attribute values
(username, data, etc.) to the custom VectorIterator.
#) The custom VectorIterator.computeNext() reads a line and creates a
dense vector whose size equals the number of attributes in the row.
Please let me know how to use NamedVector here so that I can get
readable output from the ClusterDump utility.
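
To make the question concrete, here is a minimal sketch of what I think
the change would look like. It assumes that Mahout's NamedVector is
constructed from a delegate Vector plus a name, and that the name is
read back with getName(). The Vector, DenseVector and NamedVector
classes below are small stand-ins that mirror the Mahout classes of the
same names, so that the sketch compiles without Mahout on the classpath.
LogRow and NamedVectorIterator are hypothetical names, the features are
assumed to be numeric already, and using the user name as the vector
name is only my assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class NamedVectorSketch {

    // Stand-in for org.apache.mahout.math.Vector (only what this sketch needs).
    interface Vector {
        int size();
        double get(int index);
    }

    // Stand-in mirroring org.apache.mahout.math.DenseVector(double[] values).
    static final class DenseVector implements Vector {
        private final double[] values;
        DenseVector(double[] values) { this.values = values.clone(); }
        public int size() { return values.length; }
        public double get(int index) { return values[index]; }
    }

    // Stand-in mirroring org.apache.mahout.math.NamedVector(Vector delegate, String name).
    static final class NamedVector implements Vector {
        private final Vector delegate;
        private final String name;
        NamedVector(Vector delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }
        String getName() { return name; }
        public int size() { return delegate.size(); }
        public double get(int index) { return delegate.get(index); }
    }

    // Hypothetical model class: one parsed log row with numeric features.
    static final class LogRow {
        final String userName;
        final double[] features;
        LogRow(String userName, double[] features) {
            this.userName = userName;
            this.features = features;
        }
    }

    // Hypothetical iterator: wraps each dense vector in a NamedVector
    // keyed by the user name, instead of returning the bare dense vector.
    static final class NamedVectorIterator implements Iterator<Vector> {
        private final Iterator<LogRow> rows;
        NamedVectorIterator(Iterator<LogRow> rows) { this.rows = rows; }
        public boolean hasNext() { return rows.hasNext(); }
        public Vector next() {
            LogRow row = rows.next();
            Vector dense = new DenseVector(row.features);
            return new NamedVector(dense, row.userName);
        }
    }

    public static void main(String[] args) {
        List<LogRow> rows = Arrays.asList(
                new LogRow("alice", new double[] {1.0, 2.0, 3.0}),
                new LogRow("bob", new double[] {4.0, 5.0, 6.0}));
        Iterator<Vector> it = new NamedVectorIterator(rows.iterator());
        while (it.hasNext()) {
            NamedVector v = (NamedVector) it.next();
            System.out.println(v.getName() + " has " + v.size() + " attributes");
        }
    }
}
```

With Mahout on the classpath, I assume the three stand-in classes would
be dropped in favour of the real ones from org.apache.mahout.math, and
the wrapping line would move into my computeNext(). Is this the right
way to do it?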
--
Thanks and Regards
Vishal Danech