Vikram Nagaraja Rao <nvikram44 <at> hotmail.com> writes:

> I want to set a "weightsFile" parameter so that my distance measure algorithm can compute the distance
> based on the weightsFile, but I have not been able to set this parameter. I tried setting it in core-site.xml
> in the Hadoop conf directory and also passing it as -DweightsFile=xxxxxx, but my output was no different from
> previous runs where weightsFile was not used. I even tried setting it in MAHOUT_OPTS, but couldn't
> get it working. What am I missing?
> Thanks,
It's been a while, and I found a workaround for this. Earlier I was working through the command-line utility; for better control, I switched to doing it in Java code. I am currently setting up the weights as follows:

1. Assume you have, in a file, all the terms that you feel should carry more weight. Read the contents of this file in Java and store them in a String array.
2. Currently only 3 distance measure (DM) algorithms support weights: WeightedEuclidean, WeightedManhattan and Tanimoto. If you want to use any other DM algorithm, you need to extend it and implement weighting along the same lines.
3. Each of the above 3 DM classes extends WeightedDistanceMeasure, which has a setWeights method. You need to pass your weights into it as a vector.
4. To convert your weights into a vector, you need to know the vector index of each weighted term. For this, I read the dictionary file using SequenceFile.Reader and look up the index of each weight term.
5. Using a RandomAccessSparseVector, I then create a vector and pass it to the setWeights method of my DM.

This is working well for me, but I'm not sure whether there are better ways of doing this.
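Steps 1, 4 and 5 above can be sketched in plain Java roughly as follows. This is a minimal sketch under stated assumptions: to keep it runnable without Mahout or Hadoop on the classpath, a `Map<String, Integer>` stands in for the term-to-index dictionary you would read with SequenceFile.Reader, and a `TreeMap<Integer, Double>` stands in for the RandomAccessSparseVector you would pass to setWeights. The class name `WeightVectorSketch` and the single uniform `weight` value given to every listed term are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: turn a list of "important" terms into a sparse weight
 * vector keyed by dictionary index. JDK collections stand in for Mahout types:
 * Map<String, Integer> for the dictionary, TreeMap<Integer, Double> for the
 * sparse vector.
 */
class WeightVectorSketch {

    /** Step 1: read one term per line from the weights file, skipping blank lines. */
    static List<String> readTerms(Path weightsFile) throws IOException {
        List<String> terms = new ArrayList<>();
        for (String line : Files.readAllLines(weightsFile, StandardCharsets.UTF_8)) {
            String term = line.trim();
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    /**
     * Steps 4 and 5: look up each term's vector index in the dictionary and
     * record the given weight at that index. Terms that are not in the
     * dictionary have no index, so they are skipped.
     */
    static Map<Integer, Double> buildWeights(List<String> terms,
                                             Map<String, Integer> dictionary,
                                             double weight) {
        Map<Integer, Double> weights = new TreeMap<>();
        for (String term : terms) {
            Integer index = dictionary.get(term);
            if (index != null) {
                weights.put(index, weight);
            }
        }
        return weights;
    }
}
```

With Mahout on the classpath, the same loop would instead call `set(index, weight)` on a RandomAccessSparseVector whose cardinality matches the document vectors, and that vector would then be handed to setWeights.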

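To illustrate what a weighted measure does with the vector passed to setWeights (steps 2 and 3), here is a plain-Java sketch of a weighted Euclidean distance, sqrt(sum_i w_i * (p_i - q_i)^2). It is not Mahout's WeightedEuclideanDistanceMeasure; the class name `WeightedEuclideanSketch` and the `defaultWeight` parameter are hypothetical. The parameter exists to make one point visible: a sparse vector reads as 0.0 at any index that was never set, so if a measure multiplies by the weight at every index, terms left out of the weights file would contribute nothing to the distance. It is worth checking how the Mahout class you use treats unset indices.

```java
import java.util.Map;

/**
 * Hypothetical sketch of a weighted Euclidean distance over dense vectors,
 * with weights supplied as a sparse index -> weight map.
 */
class WeightedEuclideanSketch {

    /**
     * Returns sqrt(sum_i w_i * (p_i - q_i)^2), where w_i is taken from the
     * weights map and falls back to defaultWeight when index i is absent.
     */
    static double distance(double[] p, double[] q,
                           Map<Integer, Double> weights, double defaultWeight) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("vectors must have the same cardinality");
        }
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double diff = p[i] - q[i];
            double w = weights.getOrDefault(i, defaultWeight);
            sum += w * diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

For example, with p = {3, 0, 1}, q = {0, 0, 0} and a single weight of 16 at index 2, the distance is 5.0 when unlisted indices default to 1.0, but only 4.0 when they default to 0.0, because index 0 then drops out entirely.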