On Jul 2, 2010, at 5:42 AM, Kris Jack wrote:

> Hi Drew,
> 
> That indeed causes the name to be emitted now.  With the change that I
> suggested and your patch,

https://issues.apache.org/jira/browse/MAHOUT-434 handles the normalization 
issue.

> I'm now getting the names of vectors, as provided
> by the -idField, being output with the vectors themselves.
> 
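
For anyone reading this in the archives, here is a minimal sketch of the
name-loss problem Kris describes and the re-wrapping workaround he suggested.
It deliberately does not use Mahout's own classes: SimpleVector,
NamedSimpleVector, normalizeKeepingName and dump are hypothetical stand-ins
invented for illustration, so treat it as a sketch of the pattern rather than
the code in LuceneIterable, VectorDumper, or the MAHOUT-434 patch.

```java
import java.util.Arrays;

public class NormalizeNameDemo {

    // Hypothetical stand-in for a plain vector; not Mahout's Vector.
    static class SimpleVector {
        final double[] values;

        SimpleVector(double[] values) {
            this.values = values.clone();
        }

        // Returns a NEW plain vector divided by its p-norm (p >= 1).
        // Because the return type is the base class, any name is dropped.
        SimpleVector normalize(double p) {
            double sum = 0.0;
            for (double v : values) {
                sum += Math.pow(Math.abs(v), p);
            }
            double norm = Math.pow(sum, 1.0 / p);
            double[] scaled = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                scaled[i] = norm == 0.0 ? 0.0 : values[i] / norm;
            }
            return new SimpleVector(scaled);
        }

        @Override
        public String toString() {
            return Arrays.toString(values);
        }
    }

    // Hypothetical stand-in for a named wrapper; not Mahout's NamedVector.
    static class NamedSimpleVector extends SimpleVector {
        final String name;

        NamedSimpleVector(SimpleVector delegate, String name) {
            super(delegate.values);
            this.name = name;
        }
    }

    // Normalizes, then re-wraps the result so the name survives.
    static SimpleVector normalizeKeepingName(SimpleVector v, double p) {
        SimpleVector normalized = v.normalize(p);
        if (v instanceof NamedSimpleVector) {
            return new NamedSimpleVector(normalized, ((NamedSimpleVector) v).name);
        }
        return normalized;
    }

    // Dumper-style output that checks the runtime type before printing.
    static String dump(SimpleVector v) {
        if (v instanceof NamedSimpleVector) {
            return ((NamedSimpleVector) v).name + ":" + v;
        }
        return v.toString();
    }

    public static void main(String[] args) {
        SimpleVector doc = new NamedSimpleVector(
                new SimpleVector(new double[] {3.0, 4.0}), "doc-42");
        System.out.println(dump(doc.normalize(2.0)));            // name is lost
        System.out.println(dump(normalizeKeepingName(doc, 2.0))); // name is kept
    }
}
```

The two halves matter together: re-wrapping keeps the name on the vector, and
the type check in dump is what makes the name show up in the output.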




> Thanks again,
> Kris
> 
> 
> 
> 2010/7/2 Drew Farris <[email protected]>
> 
>> Hi Kris,
>> 
>> Could you try the code in the patch at:
>> https://issues.apache.org/jira/secure/attachment/12448536/MAHOUT-402.patch
>> 
>> This should cause VectorDumper to emit the names found in NamedVectors.
>> 
>> Thanks,
>> Drew
>> 
>> On Thu, Jul 1, 2010 at 10:23 AM, Kris Jack <[email protected]> wrote:
>> 
>>> Hi Grant,
>>> 
>>> I applied the patch but still no luck.  In debugging, I found that in
>>> LuceneIterable, line 129:
>>> 
>>> <<
>>> result = result.normalize(normPower);
>>> >>
>>> 
>>> seems to turn result, which was previously a NamedVector, back into a
>>> plain Vector, so the name is lost.  If I change the code to keep the name
>>> by replacing the line with:
>>> 
>>> <<
>>> result = new NamedVector(result.normalize(normPower), name);
>>> >>
>>> 
>>> then the name is included and the result remains a NamedVector, but the
>>> VectorDumper code still just prints out Vectors and not NamedVectors.
>>> Perhaps I am going about this wrong, but shouldn't there be a check in
>>> VectorDumper to find out the type of vector being dumped?
>>> 
>>> Thanks,
>>> Kris
>>> 
>>> 
>>> 
>>> 2010/6/30 Grant Ingersoll <[email protected]>
>>> 
>>>> Kris,
>>>> 
>>>> Can you try the patch at
>>>> https://issues.apache.org/jira/secure/attachment/12448396/MAHOUT-379-lucene.patch
>>>> 
>>>> Thanks,
>>>> Grant
>>>> 
>>>> On Jun 30, 2010, at 8:53 AM, Grant Ingersoll wrote:
>>>> 
>>>>> 
>>>>> On Jun 30, 2010, at 8:39 AM, Grant Ingersoll wrote:
>>>>> 
>>>>>> 
>>>>>> On Jun 29, 2010, at 1:54 PM, Kris Jack wrote:
>>>>>> 
>>>>>>> Hi everyone,
>>>>>>> 
>>>>>>> I have been using Mahout to generate vectors from a Lucene index
>>>>>>> using:
>>>>>>> 
>>>>>>> $MAHOUT_HOME/bin/mahout lucene.vector
>>>>>>> 
>>>>>>> In doing so, Mahout creates an output file that has new ids for my
>>>>>>> documents that are completely unlike my original --idField, which is
>>>>>>> a string.  How can I relate the new ids to my original ids?  Is there
>>>>>>> a method that allows me to output the vectors with the original
>>>>>>> --idField values that appear in the Lucene index rather than the new
>>>>>>> doc ids?
>>>>>> 
>>>>>> 
>>>>>> Hmm, it seems the --idField stuff has been commented out, likely with
>>>>>> the change of labels.
>>>>>> 
>>>>> 
>>>>> I've brought the issue up over on dev@, as it is a bug.
>>>> 
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>> 
>>>> Search the Lucene ecosystem using Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Dr Kris Jack,
>>> http://www.mendeley.com/profiles/kris-jack/
>>> 
>> 
> 
> 
> 
> -- 
> Dr Kris Jack,
> http://www.mendeley.com/profiles/kris-jack/
