On Jul 2, 2010, at 5:42 AM, Kris Jack wrote:

> Hi Drew,
>
> That indeed causes the name to be emitted now.  With the change that I
> suggested and your patch,

https://issues.apache.org/jira/browse/MAHOUT-434 handles the normalization issue.

> I'm now getting the names of vectors, as provided
> by the -idField, being output with the vectors themselves.
>
> Thanks again,
> Kris
>
> 2010/7/2 Drew Farris <[email protected]>
>
>> Hi Kris,
>>
>> Could you try the code in the patch at:
>> https://issues.apache.org/jira/secure/attachment/12448536/MAHOUT-402.patch
>>
>> This should cause VectorDumper to emit the names found in NamedVectors.
>>
>> Thanks,
>> Drew
>>
>> On Thu, Jul 1, 2010 at 10:23 AM, Kris Jack <[email protected]> wrote:
>>
>>> Hi Grant,
>>>
>>> I applied the patch but still no luck.  In debugging, I found that in
>>> LuceneIterable, line 129:
>>>
>>> <<
>>> result = result.normalize(normPower);
>>> >>
>>>
>>> seems to turn result, which was previously a NamedVector, back into a
>>> plain Vector and causes the name to be lost.  If I change the code to
>>> allow the name to be kept by replacing the line with:
>>>
>>> <<
>>> result = new NamedVector(result.normalize(normPower), name);
>>> >>
>>>
>>> then the name is included and the result remains a NamedVector, but the
>>> VectorDumper code still just prints out Vectors and not NamedVectors.
>>> Perhaps I am going about this wrong, but shouldn't there be a check in
>>> the VectorDumper to find out the type of vector being dumped?
>>>
>>> Thanks,
>>> Kris
>>>
>>> 2010/6/30 Grant Ingersoll <[email protected]>
>>>
>>>> Kris,
>>>>
>>>> Can you try the patch at
>>>> https://issues.apache.org/jira/secure/attachment/12448396/MAHOUT-379-lucene.patch
>>>>
>>>> Thanks,
>>>> Grant
>>>>
>>>> On Jun 30, 2010, at 8:53 AM, Grant Ingersoll wrote:
>>>>
>>>>> On Jun 30, 2010, at 8:39 AM, Grant Ingersoll wrote:
>>>>>
>>>>>> On Jun 29, 2010, at 1:54 PM, Kris Jack wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I have been using mahout to generate vectors from a lucene index
>>>>>>> using:
>>>>>>>
>>>>>>> $MAHOUT_HOME/bin/mahout lucene.vector
>>>>>>>
>>>>>>> In doing so, mahout creates an output file that has new ids for my
>>>>>>> documents that are completely unlike my original --idField, which
>>>>>>> is a string.  How can I relate the new ids to my original ids?  Is
>>>>>>> there a method that allows me to output the vectors with the
>>>>>>> original --idField values that appear in the lucene index rather
>>>>>>> than the new doc ids?
>>>>>>
>>>>>> Hmm, it seems the --idField stuff has been commented out, likely
>>>>>> with the change of labels.
>>>>>
>>>>> I've brought the issue up over on dev@, as it is a bug.
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem using Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>
>>> --
>>> Dr Kris Jack,
>>> http://www.mendeley.com/profiles/kris-jack/
>
> --
> Dr Kris Jack,
> http://www.mendeley.com/profiles/kris-jack/

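For readers following the thread, here is a minimal, self-contained sketch of the two ideas Kris raises: re-wrapping the normalized vector so the name survives, and checking the vector's type when dumping it. It does not use Mahout itself. The `Vector`, `DenseVector`, and `NamedVector` types below are hypothetical stand-ins modeled only on the behavior described in the thread, and `normalizeKeepingName` and `dump` are illustrative helper names, not the contents of the MAHOUT-402 or MAHOUT-434 patches.

```java
import java.util.Arrays;

public class NamedVectorSketch {

    /** Hypothetical stand-in for a vector type; not Mahout's actual interface. */
    interface Vector {
        double[] values();
        Vector normalize(double power);
    }

    /** Plain vector backed by an array. */
    static final class DenseVector implements Vector {
        private final double[] values;

        DenseVector(double... values) {
            this.values = values.clone();
        }

        @Override
        public double[] values() {
            return values.clone();
        }

        @Override
        public Vector normalize(double power) {
            // Divide each element by the p-norm of the vector.
            double sum = 0.0;
            for (double v : values) {
                sum += Math.pow(Math.abs(v), power);
            }
            double norm = Math.pow(sum, 1.0 / power);
            double[] out = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                out[i] = values[i] / norm;
            }
            return new DenseVector(out);
        }
    }

    /** Wrapper that attaches a name to another vector. */
    static final class NamedVector implements Vector {
        private final Vector delegate;
        private final String name;

        NamedVector(Vector delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }

        String getName() {
            return name;
        }

        @Override
        public double[] values() {
            return delegate.values();
        }

        @Override
        public Vector normalize(double power) {
            // Assumption: mirrors the behavior reported in the thread, where
            // normalizing returns a plain vector and the name is dropped.
            return delegate.normalize(power);
        }
    }

    /** Normalizes and re-wraps the result so that a name, if present, is kept. */
    static Vector normalizeKeepingName(Vector v, double power) {
        Vector normalized = v.normalize(power);
        if (v instanceof NamedVector) {
            return new NamedVector(normalized, ((NamedVector) v).getName());
        }
        return normalized;
    }

    /** Dumps a vector, prefixing the name when the vector carries one. */
    static String dump(Vector v) {
        String body = Arrays.toString(v.values());
        if (v instanceof NamedVector) {
            return ((NamedVector) v).getName() + ":" + body;
        }
        return body;
    }

    public static void main(String[] args) {
        Vector named = new NamedVector(new DenseVector(3.0, 4.0), "doc-42");
        // Name is lost: only the values are printed.
        System.out.println(dump(named.normalize(2.0)));
        // Name is kept: the output is prefixed with "doc-42:".
        System.out.println(dump(normalizeKeepingName(named, 2.0)));
    }
}
```

In this sketch the name is restored at the call site, as in Kris's one-line change. Making `normalize` itself return a named result would be another option, but whether the real library does that is not something this thread settles.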