There is a bit of a problem in writing the similarity matrices to Solr. The 
matrices exist in two DRMs, and the rows correspond to the doc IDs. As far as 
I know there is no guarantee that the IDs of both matrices are in the same 
descending order. 

The easiest solution is to have an index for [B'B] and one for [B'A]. That 
means two or perhaps three queries for cross-recommendations, which is not 
ideal.

First I'm going to create two collections of docs with different field 
ids--this should work and we can merge them later.

Next we can do some m/r to group the docs by id so there is one collection 
(csv) with one line per doc. 
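
As a rough sketch of that grouping step, here it is in plain Python rather 
than as an actual m/r job. The field names b_b_links and b_a_links, the 
function names, and the (doc_id, links) input shape are assumptions made up 
for illustration, not anything that exists in Mahout or Solr:

```python
import csv
import io
from collections import defaultdict


def group_by_id(bb_rows, ba_rows):
    """Combine two collections of (doc_id, links) pairs into one record per doc id.

    bb_rows holds rows of [B'B], ba_rows holds rows of [B'A]. Input order
    does not matter, which is the point of grouping instead of iterating.
    """
    docs = defaultdict(lambda: {"b_b_links": [], "b_a_links": []})
    for doc_id, links in bb_rows:
        docs[doc_id]["b_b_links"].extend(links)
    for doc_id, links in ba_rows:
        docs[doc_id]["b_a_links"].extend(links)
    return dict(docs)


def to_csv(docs):
    """Write one CSV line per doc, with each link field space-delimited."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "b_b_links", "b_a_links"])
    for doc_id in sorted(docs):
        doc = docs[doc_id]
        writer.writerow(
            [doc_id, " ".join(doc["b_b_links"]), " ".join(doc["b_a_links"])]
        )
    return out.getvalue()
```

A doc that appears in only one of the two collections simply gets an empty 
value for the other field.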

Alternatively, it is possible that the DRMs can be iterated simultaneously, 
which would also solve the problem. That assumes the order in both DRMs is the 
same, descending by Key = item ID. Even if a row is missing in one or the 
other, this would work.
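
To make the simultaneous iteration concrete, here is a minimal sketch in 
Python of a lockstep walk over two key-sorted streams. It only works if the 
ordering assumption holds. The function name merge_rows and the (key, row) 
pair shape are hypothetical and do not correspond to any DRM reader API:

```python
def merge_rows(bb_rows, ba_rows, descending=True):
    """Walk two key-sorted streams of (key, row) pairs in lockstep.

    Yields (key, bb_row, ba_row); a side is None when that key is missing
    from that stream. Assumes both streams are sorted by key in the same
    direction and contain no duplicate keys.
    """
    def comes_first(x, y):
        return x > y if descending else x < y

    bb_iter, ba_iter = iter(bb_rows), iter(ba_rows)
    bb, ba = next(bb_iter, None), next(ba_iter, None)
    while bb is not None or ba is not None:
        if ba is None or (bb is not None and comes_first(bb[0], ba[0])):
            # Key present only in [B'B].
            yield bb[0], bb[1], None
            bb = next(bb_iter, None)
        elif bb is None or comes_first(ba[0], bb[0]):
            # Key present only in [B'A].
            yield ba[0], None, ba[1]
            ba = next(ba_iter, None)
        else:
            # Same key in both: emit one combined record.
            yield bb[0], bb[1], ba[1]
            bb, ba = next(bb_iter, None), next(ba_iter, None)
```

Each yielded triple would become one Solr doc. If the keys turn out not to 
be ordered, this silently produces split docs, so the grouping approach is 
the safer fallback.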

Does anyone know if the DRMs are guaranteed to have row ordering by Key? RSJ 
(RowSimilarityJob) creates [B'B] and matrix multiply creates [B'A].


On Aug 2, 2013, at 11:14 PM, Ted Dunning <[email protected]> wrote:

Yes.  We need two different sets of documents if the row spaces of the
cross/co-occurrence matrices are different, as is the case with A'B and B'B.

This could mean two indexes.

Or a single index with a special field to indicate what type of record you
have.


On Fri, Aug 2, 2013 at 2:39 PM, Pat Ferrel <[email protected]> wrote:

> Thanks, well put.
> 
> In order to have the ultimate impl with two id spaces for A and B, would we
> have to create different docs for A'B and B'B, since the doc IDs must come
> from A or B? The fields can contain different sets of IDs but the doc ID
> must be one or the other, right? Doesn't this imply separate indexes for
> the separate A, B item ID spaces? This is not a question for this first
> cut impl but is a generalization question.
> 
> On Aug 2, 2013, at 2:06 PM, Ted Dunning <[email protected]> wrote:
> 
> So there is a lot of good discussion here and there were some key ideas.
> 
> The first idea is that the *input* to a recommender is on the right in the
> matrix notation.  This refers inherently to the id's on the columns of the
> recommender product (either B'B or B'A).  The columns are defined by the
> right hand element of the product (either B or A in the B'B and B'A
> respectively).
> 
> The results are in the row space and are defined by the left hand operand
> of the product.  In the case of B'A and B'B, the left hand operand is B in
> both cases so the row space is consistent.
> 
> In order to implement this in a search engine, we need documents that
> correspond to rows of B'A or B'B.  These are the same as the columns of B.
> The fields of the documents will necessarily include the following:
> 
> id: the column id from B corresponding to this item
> description: presentation info ... yada yada
> b-a-links: contents of this row of B'A expressed as id's from the column
> space of A where this row of llr-filter(B'A) contains a non-zero value.
> b-b-links: contents of this row of B'B expressed as id's from the column
> space of B ...
> 
> 
> The following operations are now single queries:
> 
> get an item where id = x
>      query is [id:x]
> 
> recommend based on behavior with regard to A items and actions h_a
>      query is [b-a-links: h_a]
> 
> recommend based on behavior with regard to B items and actions h_b
>      query is [b-b-links: h_b]
> 
> recommend based on a single item with id = x
>       query is [b-b-links: x]
> 
> recommend based on composite behavior composed of h_a and h_b
>       query is [b-a-links: h_a b-b-links: h_b]
> 
> Does this make sense by being more explicit?
> 
> Now, it is pretty clear that we could have an index of A objects as well
> but the link fields would have to be a-a-links and a-b-links, of course.
> 
> 
> 
> 
> On Fri, Aug 2, 2013 at 1:25 PM, Pat Ferrel <[email protected]> wrote:
> 
>> Assuming Ted needs to call it, not sure if an invite has gone out, I
>> haven't seen one.
>> 
>> On Aug 2, 2013, at 12:49 PM, B Lyon <[email protected]> wrote:
>> 
>> I am planning on sitting in as flaky connection allows.
>> On Aug 2, 2013 3:21 PM, "Pat Ferrel" <[email protected]> wrote:
>> 
>>> We doing a hangout at 2 on the Solr recommender?
>>> 
>> 
>> 
> 
> 
