I have used the RowId Job several times before (and have even advised people on
its use in these forums), so it's a little embarrassing to have to ask this.


I have 'Named Vectors' generated via seq2sparse and am trying to create a
matrix from them. The RowId Job currently does not generate a matrix when the
input vectors are of type 'NamedVector'. The job runs to completion without
any errors but produces no output; I would have expected it to throw an
exception of some kind.


Looking at the code, it's obvious that 'Named Vectors' are not being accounted
for when generating a matrix from the input.


Would it make sense to submit a patch to get the RowId Job working with 'Named
Vectors'?

Here are the issues I see with the RowId Job:

1. If the input has no vectors, the RowId Job currently runs successfully and
generates a 0 x 0 matrix. (I would have expected an exception with an
appropriate message, such as 'Input vectors not found'.)

2. RowId Job presently cannot handle Named Vectors.

3. An issue was raised on these forums last week, to which Jake responded that
a patch making the RowId Job configurable to run in distributed mode with many
mappers would be welcome. I agree that this is a useful change. However, it
means that the RowSimilarity Job, which takes as input the matrix and the
number of columns generated by the RowId Job, needs to be tested to ensure
that nothing breaks.

Should I go ahead and open a Jira for the above issues?


Regards,
Suneel
