When I started looking at this I was a bit skeptical. As a search engine, Solr may be peerless, but as yet another NoSQL DB?
However, getting further into this, I see one very large benefit: it has one feature that sets it completely apart from the typical NoSQL DB. The kinds of queries you run return fuzzy results, in the very best sense of that word. The most interesting queries are based on similarity to some exemplar, and results are returned in order of similarity strength, not ordered by a sort field. Wherever similarity-based queries are important, I'll look at Solr first.

SolrJ looks like an interesting way to run Solr queries on POJOs. It's probably at least an alternative to using docs and CSVs to import the data from Mahout.

On Aug 12, 2013, at 2:32 PM, Ted Dunning wrote:

Yes. That would be interesting.

On Mon, Aug 12, 2013 at 1:25 PM, Gokhan Capan wrote:

> A little digression: might a Matrix implementation backed by a Solr index
> and using SolrJ for querying help at all for the Solr recommendation
> approach?
>
> It supports multiple fields of String, Text, or boolean flags.
>
> Best
> Gokhan
>
> On Wed, Aug 7, 2013 at 9:42 PM, Pat Ferrel wrote:
>
>> Also a question about user history.
>>
>> I was planning to write these into separate directories so Solr could
>> fetch them from different sources, but it occurs to me that it would be
>> better to join A and B by user ID and output a doc per user ID with three
>> fields: id, A item history, and B item history. Other fields could be
>> added for user metadata.
>>
>> Does that sound correct? This is what I'll do unless someone stops me.
>>
>> On Aug 7, 2013, at 11:25 AM, Pat Ferrel wrote:
>>
>> Once you have a sample or example of what you think the "log file"
>> version will look like, can you post it? It would be great to have
>> example lines for two actions with or without the same item IDs. I'll
>> make sure we can digest it.
>>
>> I thought more about the ingest part and I don't think the one-item-space
>> is actually a problem. It just means one item dictionary. A and B will
>> have the right content; all I have to do is make sure the right ranks are
>> input to the MM, Transpose, and RSJ. This in turn is only one extra count
>> of the number of items in A's item space. This should be a very easy
>> change if my thinking is correct.
>>
>> On Aug 7, 2013, at 8:09 AM, Ted Dunning wrote:
>>
>> On Tue, Aug 6, 2013 at 7:57 AM, Pat Ferrel wrote:
>>
>>> 4) Adding more metadata to the Solr output will be left to the consumer
>>> for now. If there is a good data set to use, we can illustrate how to do
>>> it in the project. Ted may have some data for this from MusicBrainz.
>>
>> I am working on this issue now.
>>
>> The current state is that I can bring in a bunch of track names and links
>> to artist names and so on. This would provide the basic set of items
>> (artists, genres, tracks and tags).
>>
>> There is a hitch in bringing in the data needed to generate the logs,
>> since that part of MB is not Apache compatible. I am working on that
>> issue.
>>
>> Technically, the data is in a massively normalized relational form right
>> now, but it isn't terribly hard to denormalize into a form that we need.
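The ingest plan discussed in the thread (join the A and B action logs by user ID into one doc per user, and keep a single item dictionary shared by both actions) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's actual Mahout/Solr ingest code: the log format (pairs of user ID and item ID), the helper names `build_item_dictionary` and `user_docs`, and the field names `a_history` and `b_history` are all hypothetical.

```python
from collections import defaultdict

def build_item_dictionary(*logs):
    """Hypothetical helper: assign one integer index per distinct item ID
    across all action logs, i.e. one item space shared by A and B."""
    index = {}
    for log in logs:
        for _user, item in log:
            if item not in index:
                index[item] = len(index)
    return index

def user_docs(a_log, b_log):
    """Hypothetical helper: join the A and B logs by user ID into one doc per
    user with three fields: id, A item history, B item history."""
    a_hist = defaultdict(list)
    b_hist = defaultdict(list)
    for user, item in a_log:
        a_hist[user].append(item)
    for user, item in b_log:
        b_hist[user].append(item)
    docs = []
    # Outer join: users with only A or only B actions still get a doc.
    for user in sorted(set(a_hist) | set(b_hist)):
        docs.append({
            "id": user,
            # Space-separated item IDs, assumed to go into a Solr text field.
            "a_history": " ".join(a_hist[user]),
            "b_history": " ".join(b_hist[user]),
        })
    return docs

# Hypothetical sample logs of (user ID, item ID) pairs for two actions.
a_log = [("u1", "i1"), ("u1", "i2"), ("u2", "i3")]
b_log = [("u1", "i3"), ("u3", "i4")]

items = build_item_dictionary(a_log, b_log)
# One dictionary, so the A and B matrices can share the same item columns.
print(len(items))
for doc in user_docs(a_log, b_log):
    print(doc)
```

Because both logs are indexed against the same dictionary, item `i3` gets the same column whether it appears as an A action or a B action, which is the "one item dictionary" point made in the thread.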

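To make the point about results being ordered by similarity strength rather than by a sort field concrete, here is a toy Python sketch that ranks documents by how strongly their item field overlaps an exemplar list of item IDs, weighting rare items more heavily. It uses a plain sum of inverse document frequencies as a stand-in; this is not Solr's or Lucene's actual scoring formula, and the documents and function names are hypothetical.

```python
import math

def idf(term, docs):
    """Inverse document frequency: terms held by fewer docs weigh more."""
    df = sum(1 for terms in docs.values() if term in terms)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0

def similar_docs(exemplar, docs):
    """Toy scorer (not Solr's formula): score each doc by the summed IDF of
    the exemplar terms it contains and return (doc_id, score) pairs, strongest
    match first. Docs sharing no term with the exemplar are left out."""
    wanted = set(exemplar)
    scores = {}
    for doc_id, terms in docs.items():
        shared = wanted & terms
        if shared:
            scores[doc_id] = sum(idf(t, docs) for t in shared)
    # Order by similarity strength, not by any stored sort field.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical docs: each doc's field is a set of item IDs.
docs = {
    "d1": {"i1", "i2", "i3"},
    "d2": {"i1"},
    "d3": {"i3", "i4"},
    "d4": {"i2"},
}
for doc_id, score in similar_docs(["i1", "i3", "i4"], docs):
    print(doc_id, round(score, 3))
```

Here `d3` outranks `d1` even though both share two items with the exemplar, because `i4` is rarer than `i1`; `d4` shares nothing and is not returned at all.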