Dear Vishal, could you share some code showing how you implemented the steps you mentioned:
#) Created custom VectorIterable by implementing Iterable<Vector>.
#) Created custom VectorIterator by extending AbstractIterator<Vector>.
#) Model class which will be responsible for passing attribute values (username, data, etc.) to the custom VectorIterator.
#) Custom VectorIterator.computeNext() will read a line and create a dense vector with size equal to the number of attributes in a row.

Can you compile the code?

Best,
Darius

2013/9/6 Vishal Danech <[email protected]>

> Hi
>
> I have custom log data which contains the following details.
>
> 1) UserName
> 2) MachineId
> 3) DateTime
> 4) Data - which contains text - search term etc
>
> I would like to use this data to know
> #) how much time they are spending on browsing etc.
> #) User based search pattern
>
> The first problem can be addressed using a Hive query.
>
> For the second problem, I suppose clustering can be applied, and for this I have
> converted the data to vectors. I have used dense vectors and applied the Canopy
> algorithm on them. I got an output which I provided as an input to the
> ClusterDump utility, but the output I got was not in readable form. I figured
> out that I need to use named vectors so that the key can be displayed in the
> output. Here I am facing an issue: how to use NamedVector?
>
> I am performing the following steps to generate vectors:
> #) Created custom VectorIterable by implementing Iterable<Vector>.
> #) Created custom VectorIterator by extending AbstractIterator<Vector>.
> #) Model class which will be responsible for passing attribute values
> (username, data, etc.) to the custom VectorIterator.
> #) Custom VectorIterator.computeNext() will read a line and create a dense
> vector with size equal to the number of attributes in a row.
>
> Please let me know how to add NamedVector here so that I can get some
> readable output from the ClusterDump utility.
>
> --
> Thanks and Regards
> Vishal Danech
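
For reference, the four steps quoted above could be sketched roughly as follows, with the dense vector wrapped in a NamedVector before it is returned. This is only a minimal sketch under several assumptions, not Vishal's actual code. The nested Vector, DenseVector and NamedVector types are small stand-ins so that the example runs without Mahout on the classpath; in a real job they would be replaced by the org.apache.mahout.math classes of the same names, whose NamedVector also takes a delegate vector and a name. The comma-separated line layout, the toNamedVector helper, the choice of UserName as the vector name, and the hash-based encoding of each field are all hypothetical placeholders. A plain java.util.Iterator is used in place of Guava's AbstractIterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch only: the nested vector types are stand-ins for the Mahout classes
// of the same names, so that the example compiles with the JDK alone.
class NamedVectorSketch {

    interface Vector {
        int size();
        double get(int index);
    }

    // Stand-in for a dense vector backed by a double array.
    static final class DenseVector implements Vector {
        private final double[] values;
        DenseVector(double[] values) { this.values = values.clone(); }
        @Override public int size() { return values.length; }
        @Override public double get(int index) { return values[index]; }
    }

    // Stand-in for a named vector: wraps a delegate and attaches a name.
    static final class NamedVector implements Vector {
        private final Vector delegate;
        private final String name;
        NamedVector(Vector delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }
        String getName() { return name; }
        @Override public int size() { return delegate.size(); }
        @Override public double get(int index) { return delegate.get(index); }
    }

    // Hypothetical helper. Assumes a line of the form
    // "UserName,MachineId,DateTime,Data" and uses UserName as the name.
    // Hashing each field is a placeholder, not a meaningful feature encoding.
    static NamedVector toNamedVector(String line) {
        String[] fields = line.split(",", 4); // limit 4 keeps commas inside Data
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = fields[i].trim().hashCode();
        }
        return new NamedVector(new DenseVector(values), fields[0].trim());
    }

    // Iterable over log lines; each step yields one named vector per line.
    static final class VectorIterable implements Iterable<Vector> {
        private final List<String> lines;
        VectorIterable(List<String> lines) { this.lines = lines; }
        @Override public Iterator<Vector> iterator() {
            final Iterator<String> source = lines.iterator();
            return new Iterator<Vector>() {
                @Override public boolean hasNext() { return source.hasNext(); }
                @Override public Vector next() { return toNamedVector(source.next()); }
            };
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "alice,m-01,2013-09-06 10:15:00,mahout clustering",
            "bob,m-02,2013-09-06 10:17:30,hive query");
        for (Vector v : new VectorIterable(lines)) {
            NamedVector named = (NamedVector) v;
            System.out.println(named.getName() + " -> " + named.size() + " attributes");
        }
    }
}
```

The only change relative to the quoted steps is in the last one: instead of returning the dense vector directly, the iterator returns it wrapped together with a key. Which field (or combination of fields) makes a useful key depends on what should appear next to each point in the dump, so UserName here is just one possible choice.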
