A quick map-reduce program should be able to join these matrices and
produce documents ready to index.


On Mon, Aug 5, 2013 at 10:10 AM, Pat Ferrel wrote:

> When writing the similarity matrices to Solr there is a bit of a problem.
> The matrices exist in two DRMs, and their rows correspond to the doc IDs.
> As far as I know, there is no guarantee that the row IDs of both matrices
> are in the same descending order.
>
> The easiest solution is to have an index for [B'B] and one for [B'A]. That
> means two or perhaps three queries for cross-recommendations, which is not
> ideal.
>
> First, I'm going to create two collections of docs with different field
> IDs; this should work, and we can merge them later.
>
> Next, we can run a map-reduce job to group the docs by ID so there is one
> collection (CSV) with one line per doc.
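
A minimal sketch of that grouping step, simulated in plain Python rather than written as a real Hadoop job. It assumes each DRM row has already been turned into an `(item_id, field_name, linked_ids)` record; the sample IDs, the helper names, and the CSV layout are hypothetical. The map step keys every record by item ID, the shuffle groups by that key, and the reduce step writes one CSV line per doc.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input: one record per DRM row, tagged with the field it feeds:
# (item_id, field_name, IDs of the non-zero columns in that row)
records = [
    ("item1", "b-b-links", ["item2", "item3"]),
    ("item2", "b-b-links", ["item1"]),
    ("item1", "b-a-links", ["a7", "a9"]),
    ("item3", "b-a-links", ["a7"]),
]


def map_phase(recs):
    """Emit (key, value) pairs keyed by item ID, as a mapper would."""
    for item_id, field, ids in recs:
        yield item_id, (field, ids)


def shuffle(pairs):
    """Group values by key (the framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, fields=("b-b-links", "b-a-links")):
    """Merge every field for one item into a single CSV line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", *fields])
    for item_id in sorted(groups):
        doc = dict(groups[item_id])
        # An item missing from one matrix gets an empty field, not a dropped line.
        writer.writerow([item_id, *(" ".join(doc.get(f, [])) for f in fields)])
    return out.getvalue()


print(reduce_phase(shuffle(map_phase(records))), end="")
```

Because grouping happens on the key, this approach does not depend on the two DRMs sharing any row order.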
>
> Alternatively, it is possible that the DRMs can be iterated
> simultaneously, which would also solve the problem. This assumes the order
> in both DRMs is the same, descending by key = item ID. Even if a row is
> missing in one or the other, this would work.
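
The simultaneous iteration described above is a sort-merge join. Here is a minimal Python sketch, assuming each DRM can be read as a stream of `(key, row)` pairs already sorted by key in the same direction; the function name and sample rows are hypothetical. If that ordering assumption does not hold, the output is silently wrong, which is why the question about guaranteed ordering matters.

```python
def merge_rows(left, right, descending=True):
    """Walk two key-sorted row streams in lockstep (a sort-merge join).

    Each stream yields (key, row) pairs. Yields (key, left_row, right_row),
    with None for a side that has no row for that key. The result is only
    correct if both streams are sorted by key in the same direction.
    """
    left, right = iter(left), iter(right)
    done = object()  # sentinel marking an exhausted stream
    lrow, rrow = next(left, done), next(right, done)

    def comes_first(a, b):
        """True if key a precedes key b in the stream order."""
        return a > b if descending else a < b

    while lrow is not done or rrow is not done:
        if rrow is done or (lrow is not done and comes_first(lrow[0], rrow[0])):
            yield lrow[0], lrow[1], None  # key only in left
            lrow = next(left, done)
        elif lrow is done or comes_first(rrow[0], lrow[0]):
            yield rrow[0], None, rrow[1]  # key only in right
            rrow = next(right, done)
        else:
            yield lrow[0], lrow[1], rrow[1]  # key in both
            lrow, rrow = next(left, done), next(right, done)


# Hypothetical rows sorted by descending item ID; each side misses one item.
bb_rows = [(5, [1, 2]), (3, [5]), (1, [3])]
ba_rows = [(5, [10]), (4, [11]), (1, [12])]
for key, bb, ba in merge_rows(bb_rows, ba_rows):
    print(key, bb, ba)
```

Only one row from each DRM is held in memory at a time, so this scales to large matrices as long as the ordering guarantee holds.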
>
> Does anyone know if the DRMs are guaranteed to have row ordering by key?
> RSJ (RowSimilarityJob) creates [B'B] and matrix multiply creates [B'A].
>
>
> On Aug 2, 2013, at 11:14 PM, Ted Dunning wrote:
>
> Yes. We need two different sets of documents if the row spaces of the
> cross-occurrence and co-occurrence matrices are different, as is the case
> with A'B and B'B.
>
> This could mean two indexes.
>
> Or a single index with a special field to indicate what type of record you
> have.
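
A minimal sketch of the single-index option, assuming documents are plain dicts before they are sent to the engine. The `type` field name and the `A:`/`B:` prefix on IDs are hypothetical conventions: the prefix keeps an A item and a B item with the same raw ID from colliding in one index. In Solr the type restriction would typically be a filter query such as `fq=type:B`; here it is simulated in memory.

```python
def make_doc(row_space, item_id, links):
    """Build one document for a single shared index.

    row_space is "A" or "B". The "type" field records which item space the
    document's row comes from; prefixing the id with it (a hypothetical
    convention) keeps an A item and a B item with the same raw id distinct.
    """
    return {"id": f"{row_space}:{item_id}", "type": row_space, **links}


def filter_by_type(docs, row_space):
    """In-memory stand-in for a filter query on the type field."""
    return [doc for doc in docs if doc["type"] == row_space]


docs = [
    make_doc("B", "42", {"b-a-links": ["7"], "b-b-links": ["13"]}),
    make_doc("A", "42", {"a-a-links": ["9"], "a-b-links": ["42"]}),
]
print([d["id"] for d in filter_by_type(docs, "B")])
```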
>
>
> On Fri, Aug 2, 2013 at 2:39 PM, Pat Ferrel wrote:
>
> > Thanks, well put.
> >
> > In order to have the ultimate impl with two ID spaces for A and B, would
> > we have to create different docs for A'B and B'B, since the doc IDs must
> > come from A or B? The fields can contain different sets of IDs, but the
> > doc ID must be one or the other, right? Doesn't this imply separate
> > indexes for the separate A and B item ID spaces? This is not a question
> > for this first-cut impl but a generalization question.
> >
> > On Aug 2, 2013, at 2:06 PM, Ted Dunning wrote:
> >
> > So there is a lot of good discussion here and there were some key ideas.
> >
> > The first idea is that the *input* to a recommender is on the right in
> > the matrix notation. This refers inherently to the IDs on the columns of
> > the recommender product (either B'B or B'A). The columns are defined by
> > the right-hand operand of the product (B or A in B'B and B'A,
> > respectively).
> >
> > The results are in the row space and are defined by the left-hand
> > operand of the product. In the case of B'A and B'B, the left-hand operand
> > is B in both cases, so the row space is consistent.
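
The same argument with dimensions written out, assuming A is an m × n_A users-by-A-items matrix and B is an m × n_B users-by-B-items matrix (the thread does not state these shapes explicitly). The history vectors enter on the right and the result is indexed by B's columns; the plain sum in the last line is the idealized linear version, not the engine's actual scoring.

```latex
\begin{aligned}
A &\in \mathbb{R}^{m \times n_A}, \quad B \in \mathbb{R}^{m \times n_B} \\
B^\top A &\in \mathbb{R}^{n_B \times n_A}, \quad B^\top B \in \mathbb{R}^{n_B \times n_B} \\
r &= (B^\top A)\, h_a + (B^\top B)\, h_b \in \mathbb{R}^{n_B},
\quad h_a \in \mathbb{R}^{n_A},\; h_b \in \mathbb{R}^{n_B}
\end{aligned}
```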
> >
> > In order to implement this in a search engine, we need documents that
> > correspond to rows of B'A or B'B. These are the same as the columns of
> > B. The fields of the documents will necessarily include the following:
> >
> > id: the column ID from B corresponding to this item
> > description: presentation info ... yada yada
> > b-a-links: contents of this row of B'A, expressed as IDs from the column
> > space of A where this row of llr-filter(B'A) contains a non-zero value.
> > b-b-links: contents of this row of B'B, expressed as IDs from the column
> > space of B ...
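
A minimal sketch of turning one row of each matrix into such a document, assuming the rows have already been LLR-filtered and are available as sparse dicts mapping column ID to weight. The function name and sample data are hypothetical. Only column IDs with a non-zero weight are kept; the weights themselves are dropped, which is what makes the fields plain lists of IDs.

```python
def build_doc(item_id, ba_row, bb_row, description=""):
    """Turn one row of llr-filter(B'A) and of B'B into a search document.

    Rows are sparse dicts {column_id: weight}; only column IDs with a
    non-zero weight are kept. IDs are sorted for deterministic output.
    """
    return {
        "id": item_id,
        "description": description,
        "b-a-links": sorted(c for c, v in ba_row.items() if v != 0),
        "b-b-links": sorted(c for c, v in bb_row.items() if v != 0),
    }


print(build_doc("b1", {"a2": 3.1, "a5": 0.0, "a1": 1.2}, {"b3": 2.0}))
```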
> >
> >
> > The following operations are now single queries:
> >
> > get an item where id = x
> >      query is [id:x]
> >
> > recommend based on behavior with regard to A items and actions h_a
> >      query is [b-a-links: h_a]
> >
> > recommend based on behavior with regard to B items and actions h_b
> >      query is [b-b-links: h_b]
> >
> > recommend based on a single item with id = x
> >      query is [b-b-links: x]
> >
> > recommend based on composite behavior composed of h_a and h_b
> >      query is [b-a-links: h_a b-b-links: h_b]
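
A toy in-memory stand-in for the composite query [b-a-links: h_a b-b-links: h_b], assuming documents shaped as in the field list above. It only counts how many history IDs each link field matches; a real engine weights those matches (for example with TF-IDF or BM25), so this shows which documents match, not how the engine would rank them. The single-item query [b-b-links: x] is just the case where h_b holds one ID.

```python
def recommend(docs, h_a=(), h_b=(), top_k=10):
    """Rank docs by the number of history IDs matched in their link fields.

    Simplification: plain match counts instead of the engine's scoring.
    """
    h_a, h_b = set(h_a), set(h_b)
    scored = []
    for doc in docs:
        score = (len(h_a & set(doc.get("b-a-links", ())))
                 + len(h_b & set(doc.get("b-b-links", ()))))
        if score > 0:
            scored.append((score, doc["id"]))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best first, ties broken by id
    return [doc_id for _, doc_id in scored[:top_k]]


docs = [
    {"id": "b1", "b-a-links": ["a1", "a2"], "b-b-links": ["b2"]},
    {"id": "b2", "b-a-links": ["a2"], "b-b-links": ["b1", "b3"]},
    {"id": "b3", "b-a-links": [], "b-b-links": ["b9"]},
]
print(recommend(docs, h_a=["a2"], h_b=["b3"]))  # composite behavior
print(recommend(docs, h_b=["b1"]))              # single item x = b1
```

The point of the single-index design is visible here: both kinds of history are matched against the same set of B documents in one pass.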
> >
> > Does this make sense by being more explicit?
> >
> > Now, it is pretty clear that we could have an index of A objects as
> > well, but the link fields would have to be a-a-links and a-b-links, of
> > course.
> >
> >
> >
> >
> > On Fri, Aug 2, 2013 at 1:25 PM, Pat Ferrel wrote:
> >
> >> Assuming Ted needs to call it. Not sure if an invite has gone out; I
> >> haven't seen one.
> >>
> >> On Aug 2, 2013, at 12:49 PM, B Lyon wrote:
> >>
> >> I am planning on sitting in as my flaky connection allows.
> >> On Aug 2, 2013 3:21 PM, "Pat Ferrel" wrote:
> >>
> >>> Are we doing a hangout at 2 on the Solr recommender?
> >>>
> >>
> >>
> >
> >
>
>
