Hi Drew,

That indeed causes the name to be emitted now. With the change that I suggested and your patch, I'm now getting the names of the vectors, as provided by --idField, output along with the vectors themselves.

Thanks again,
Kris

2010/7/2 Drew Farris <[email protected]>

> Hi Kris,
>
> Could you try the code in the patch at:
> https://issues.apache.org/jira/secure/attachment/12448536/MAHOUT-402.patch
>
> This should cause VectorDumper to emit the names found in NamedVectors.
>
> Thanks,
> Drew
>
> On Thu, Jul 1, 2010 at 10:23 AM, Kris Jack <[email protected]> wrote:
>
> > Hi Grant,
> >
> > I applied the patch but still no luck. In debugging, I found that in
> > LuceneIterable, line 129:
> >
> > <<
> > result = result.normalize(normPower);
> > >>
> >
> > seems to turn result, which was previously a NamedVector, back into a
> > Vector and causes the name to be lost. If I change the code to allow the
> > name to be kept by replacing the line with:
> >
> > <<
> > result = new NamedVector(result.normalize(normPower), name);
> > >>
> >
> > then the name is included and the result remains a NamedVector, but the
> > VectorDumper code still just prints out Vectors and not NamedVectors.
> > Perhaps I am going about this wrong, but shouldn't there be a check in
> > the VectorDumper to find out the type of vector being dumped?
> >
> > Thanks,
> > Kris
> >
> > 2010/6/30 Grant Ingersoll <[email protected]>
> >
> > > Kris,
> > >
> > > Can you try the patch at
> > > https://issues.apache.org/jira/secure/attachment/12448396/MAHOUT-379-lucene.patch
> > >
> > > Thanks,
> > > Grant
> > >
> > > On Jun 30, 2010, at 8:53 AM, Grant Ingersoll wrote:
> > >
> > > > On Jun 30, 2010, at 8:39 AM, Grant Ingersoll wrote:
> > > >
> > > >> On Jun 29, 2010, at 1:54 PM, Kris Jack wrote:
> > > >>
> > > >>> Hi everyone,
> > > >>>
> > > >>> I have been using mahout to generate vectors from a lucene index
> > > >>> using:
> > > >>>
> > > >>> $MAHOUT_HOME/bin/mahout lucene.vector
> > > >>>
> > > >>> In doing so, mahout creates an output file that has new ids for my
> > > >>> documents, that are completely unlike my original --idField, that
> > > >>> is a string. How can I relate the new ids to my original ids? Is
> > > >>> there a method that allows me to output the vectors with the
> > > >>> original --idField values that appear in the lucene index rather
> > > >>> than the new doc ids?
> > > >>
> > > >> Hmm, it seems the --idField stuff has been commented out, likely
> > > >> with the change of labels.
> > > >
> > > > I've brought the issue up over on dev@, as it is a bug.
> > >
> > > --------------------------
> > > Grant Ingersoll
> > > http://www.lucidimagination.com/
> > >
> > > Search the Lucene ecosystem using Solr/Lucene:
> > > http://www.lucidimagination.com/search
> >
> > --
> > Dr Kris Jack,
> > http://www.mendeley.com/profiles/kris-jack/

--
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/
