In writing the similarity matrices to Solr there is a bit of a problem. The matrices exist in two DRMs. The rows correspond to the doc IDs. As far as I know there is no guarantee that the IDs of both matrices are in the same descending order.
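
To make the ordering concern concrete, here is a minimal sketch of one way to combine rows from the two matrices without relying on any row order: key every row by its doc ID. It assumes each matrix can be read as (row ID, list of linked IDs) pairs; the function name, the "bb"/"ba" keys, and the toy IDs are all hypothetical, and a real job would do this grouping at scale rather than in memory.

```python
from collections import defaultdict


def group_rows_by_id(bb_rows, ba_rows):
    """Combine rows of two matrices into one record per row ID.

    Each input is an iterable of (row_id, linked_ids) pairs in any order.
    A row missing from one matrix gets an empty list for that side.
    """
    merged = defaultdict(lambda: {"bb": [], "ba": []})
    for row_id, links in bb_rows:
        merged[row_id]["bb"] = list(links)
    for row_id, links in ba_rows:
        merged[row_id]["ba"] = list(links)
    return dict(merged)


# Toy rows, deliberately in different orders; the IDs are made up.
bb_rows = [("item3", ["item1"]), ("item1", ["item3", "item2"])]
ba_rows = [("item1", ["a7"]), ("item2", ["a9", "a7"])]

docs = group_rows_by_id(bb_rows, ba_rows)
```

Because the grouping is keyed on the ID, it gives the same result whatever order the rows arrive in, at the cost of holding every row in memory at once.
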
The easiest solution is to have an index for [B'B] and one for [B'A]. That means two or perhaps three queries for cross-recommendations, which is not ideal. First I'm going to create two collections of docs with different field ids--this should work and we can merge them later. Next we can do some m/r to group the docs by id so there is one collection (csv) with one line per doc.

Alternatively it is possible that the DRMs can be iterated simultaneously, which would also solve the problem. It assumes the order in both DRMs is the same, descending by Key = item ID. Even if a row is missing in one or the other this would work. Does anyone know if the DRMs are guaranteed to have row ordering by Key? RSJ creates [B'B] and matrix multiply creates [B'A].

On Aug 2, 2013, at 11:14 PM, Ted Dunning <[email protected]> wrote:

Yes. We need two different sets of documents if the row spaces of the cross/co-occurrence matrices are different, as is the case with A'B and B'B.

This could mean two indexes. Or a single index with a special field to indicate what type of record you have.

On Fri, Aug 2, 2013 at 2:39 PM, Pat Ferrel <[email protected]> wrote:

> Thanks, well put.
>
> In order to have the ultimate impl with two id spaces for A and B would we
> have to create different docs for A'B and B'B? Since the doc IDs must come
> from A or B? The fields can contain different sets of IDs but the Doc ID
> must be one or the other, right? Doesn't this imply separate indexes for
> the separate A, B item ID spaces? This is not a question for this first
> cut impl but is a generalization question.
>
> On Aug 2, 2013, at 2:06 PM, Ted Dunning <[email protected]> wrote:
>
> So there is a lot of good discussion here and there were some key ideas.
>
> The first idea is that the *input* to a recommender is on the right in the
> matrix notation. This refers inherently to the id's on the columns of the
> recommender product (either B'B or B'A). The columns are defined by the
> right hand element of the product (either B or A in the B'B and B'A
> respectively).
>
> The results are in the row space and are defined by the left hand operand
> of the product. In the case of B'A and B'B, the left hand operand is B in
> both cases so the row space is consistent.
>
> In order to implement this in a search engine, we need documents that
> correspond to rows of B'A or B'B. These are the same as the columns of B.
> The fields of the documents will necessarily include the following:
>
> id: the column id from B corresponding to this item
> description: presentation info ... yada yada
> b-a-links: contents of this row of B'A expressed as id's from the column
>     space of A where this row of llr-filter(B'A) contains a
>     non-zero value.
> b-b-links: contents of this row of B'B expressed as id's from the column
>     space of B ...
>
> The following operations are now single queries:
>
> get an item where id = x
>     query is [id:x]
>
> recommend based on behavior with regard to A items and actions h_a
>     query is [b-a-links: h_a]
>
> recommend based on behavior with regard to B items and actions h_b
>     query is [b-b-links: h_b]
>
> recommend based on a single item with id = x
>     query is [b-b-links: x]
>
> recommend based on composite behavior composed of h_a and h_b
>     query is [b-a-links: h_a b-b-links: h_b]
>
> Does this make sense by being more explicit?
>
> Now, it is pretty clear that we could have an index of A objects as well
> but the link fields would have to be a-a-links and a-b-links, of course.
>
> On Fri, Aug 2, 2013 at 1:25 PM, Pat Ferrel <[email protected]> wrote:
>
>> Assuming Ted needs to call it, not sure if an invite has gone out, I
>> haven't seen one.
>>
>> On Aug 2, 2013, at 12:49 PM, B Lyon <[email protected]> wrote:
>>
>> I am planning on sitting in as flaky connection allows.
>> On Aug 2, 2013 3:21 PM, "Pat Ferrel" <[email protected]> wrote:
>>
>>> We doing a hangout at 2 on the Solr recommender?
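
The document layout and the single-query operations described in Ted's 2:06 PM message can be sketched as plain data and query strings. This is only an illustration under stated assumptions: the helper names are hypothetical, link fields are stored as space-separated IDs, the `field:(id1 id2)` grouping is the standard Lucene query-parser form rather than anything specified in the thread, the IDs are assumed to be plain tokens needing no escaping, and the hyphenated field names are kept from the thread on the assumption that the schema accepts them.

```python
def make_doc(item_id, description, ba_links, bb_links):
    """Build one document for a row of B'A / B'B (that is, one column of B)."""
    return {
        "id": item_id,
        "description": description,
        # IDs from the column space of A where llr-filter(B'A) is non-zero.
        "b-a-links": " ".join(ba_links),
        # IDs from the column space of B taken from the same row of B'B.
        "b-b-links": " ".join(bb_links),
    }


def field_clause(field, ids):
    """Return 'field:(id1 id2 ...)', or '' when there are no IDs."""
    ids = list(ids)
    if not ids:
        return ""
    return "{}:({})".format(field, " ".join(ids))


def recommend_query(h_a=(), h_b=()):
    """Build one query string from A-side history h_a and B-side history h_b."""
    clauses = [field_clause("b-a-links", h_a), field_clause("b-b-links", h_b)]
    return " ".join(clause for clause in clauses if clause)


# Toy data; all IDs are made up.
doc = make_doc("b1", "presentation info", ["a1", "a2"], ["b3"])
query_a = recommend_query(h_a=["a1", "a2"])
query_item = recommend_query(h_b=["b1"])
query_both = recommend_query(h_a=["a1", "a2"], h_b=["b3"])
```

A recommendation from a single item x is the same call with `h_b=[x]`, which matches the `[b-b-links: x]` case in the message, and the composite case simply concatenates the two clauses into one query.
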
