On 11/13/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
I'm also envisioning using Solr to replace a database in some web apps
Yes, querying only the search collection rather than both the search collection and the database can make a lot of sense: a less complicated webapp, and you only need to make the search collection highly available (HA).
- but how would you handle (or rather simulate) joins in such a case?
The usual approach is to denormalize the data. The downside is a slightly bigger search collection.
Say you have a Book which references an Author in a separate Solr <document> - how do you suggest inserting the Author's data into each Book like an SQL join would do?
Is it possible to make the collection book centric and put the author's data into each book during indexing?
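A minimal sketch of the book-centric, denormalized approach, assuming books and authors arrive as plain records keyed by a hypothetical author_id field. Each author's fields are copied into every book that references it before the documents are sent to Solr. The field names are illustrative only, and the duplicated author data is the "slightly bigger search collection" cost mentioned above.

```python
def denormalize_books(books, authors):
    """Return one flat document per book, with its author's fields copied in.

    Field names (book_id, author_id, name, ...) are hypothetical examples.
    """
    # Index authors by id so each lookup is a dict access, not a scan.
    authors_by_id = {a["author_id"]: a for a in authors}
    docs = []
    for book in books:
        doc = dict(book)  # copy so the input records are not mutated
        author = authors_by_id.get(book["author_id"])
        if author is not None:
            # Prefix author fields so they cannot collide with book fields.
            for key, value in author.items():
                if key != "author_id":
                    doc["author_" + key] = value
        docs.append(doc)
    return docs


books = [
    {"book_id": "b1", "title": "Lucene Basics", "author_id": "a1"},
    {"book_id": "b2", "title": "Search at Scale", "author_id": "a1"},
    {"book_id": "b3", "title": "Untitled", "author_id": "a9"},  # no such author
]
authors = [{"author_id": "a1", "name": "Jane Chill"}]

for doc in denormalize_books(books, authors):
    print(doc)
```

Because author_name now lives in each book document, a single query can filter on book and author fields together.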
Is it efficient to do a new Lucene query for each Book found, to get the Author? I can imagine doing that in a loop, and Solr's caches would probably help. But how does that feel from Lucene's point of view?
It's doable. The only advantage is decreased index size, but you give up some query power and speed.
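A sketch of the "one query per Book" alternative, assuming a hypothetical fetch_author function that stands in for a second Solr query on the author documents (for example author_id:a1). Here functools.lru_cache plays the role that Solr's caches would, so a repeated author is only fetched once. AUTHOR_INDEX and the lookups counter are illustrative stand-ins, not Solr APIs.

```python
from functools import lru_cache

# Hypothetical stand-in for author documents held in the search collection.
AUTHOR_INDEX = {
    "a1": {"author_id": "a1", "name": "Jane Chill"},
    "a2": {"author_id": "a2", "name": "Bob Stone"},
}
lookups = 0  # counts how often the "index" is actually hit


@lru_cache(maxsize=1024)
def fetch_author(author_id):
    """Simulated per-book query; results are memoized like a query cache."""
    global lookups
    lookups += 1
    return AUTHOR_INDEX.get(author_id)


def attach_authors(books):
    """Simulate a join by looking up each book's author in a loop."""
    results = []
    for book in books:
        author = fetch_author(book["author_id"])
        results.append({**book, "author_name": author["name"] if author else None})
    return results


books = [
    {"book_id": "b1", "author_id": "a1"},
    {"book_id": "b2", "author_id": "a2"},
    {"book_id": "b3", "author_id": "a1"},  # cache hit: a1 already fetched
]
results = attach_authors(books)
print(lookups, "index hits for", len(books), "books")
```

The trade-off matches the reply above: the index stays smaller, but every result page costs extra lookups, and author fields cannot appear in the original query.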
This wouldn't be a full join, as there's probably no way to do a single query like select * from Book,Author where Book.author_id = Author.author_id and Author.name like '%chill%'
DB-type joins would probably take a *lot* of work. Another downside is the effect on federated or distributed search in the future: joins go across documents and are thus not easily distributed.
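Short of a real join, the SQL above can be approximated with two queries. This is a sketch under stated assumptions, not a Solr feature: step one finds matching authors (simulated here in memory), step two restricts books to those IDs with a Lucene-style author_id:(a1 OR a3) clause. The two_step_join function and field names are hypothetical, IDs are assumed to be plain tokens needing no escaping, and the OR list grows with the number of matching authors, which Lucene's boolean clause limit eventually caps.

```python
def two_step_join(authors, books, name_fragment):
    """Approximate: books JOIN authors WHERE author name contains name_fragment."""
    # Step 1: stands in for a query against the author documents.
    ids = sorted(a["author_id"] for a in authors
                 if name_fragment in a["name"].lower())
    if not ids:
        return "", []
    # Step 2: the query string a client could send for the book documents.
    query = "author_id:(" + " OR ".join(ids) + ")"
    wanted = set(ids)
    matched = [b for b in books if b["author_id"] in wanted]  # simulated results
    return query, matched


authors = [
    {"author_id": "a1", "name": "Jane Chill"},
    {"author_id": "a2", "name": "Bob Stone"},
    {"author_id": "a3", "name": "Chilly Willy"},
]
books = [
    {"book_id": "b1", "author_id": "a1"},
    {"book_id": "b2", "author_id": "a2"},
    {"book_id": "b3", "author_id": "a3"},
]
query, matched = two_step_join(authors, books, "chill")
print(query)
print([b["book_id"] for b in matched])
```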
Being able to do this would be cool, but at this point I'm only thinking of retrieving related info linked via IDs.
Trying to think of a URL-friendly syntax for this that would work for including fields from more than one other "table"... something like:
addFields=artist_name where artist_id:song_artist
addFields=album_name,album_date where album_id:song_album
I'm still not sure if it's a good idea or not, though... you give up powerful queries like
+song_title:foo +album_date:[1970 TO 1980] -artist_name:bob
-Yonik
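The addFields syntax above is only a proposal floated in this thread, not an existing Solr parameter. The sketch below shows one way its semantics could be read: parse "fields where their_id_field:our_ref_field", then copy the named fields from related documents into each result after the search has run. The parse_add_fields and apply_add_fields names are hypothetical, and everything runs in memory.

```python
import re

# Proposed form (hypothetical): <f1,f2,...> where <their_id_field>:<our_ref_field>
ADD_FIELDS_RE = re.compile(r"^\s*([\w,]+)\s+where\s+(\w+):(\w+)\s*$")


def parse_add_fields(spec):
    """Split an addFields value into (fields, key_field, ref_field)."""
    m = ADD_FIELDS_RE.match(spec)
    if m is None:
        raise ValueError("bad addFields spec: %r" % spec)
    fields, key_field, ref_field = m.groups()
    return fields.split(","), key_field, ref_field


def apply_add_fields(results, related_docs, spec):
    """Copy fields from related docs into each result (mutates results in place)."""
    fields, key_field, ref_field = parse_add_fields(spec)
    # Only docs that carry key_field (e.g. artist docs for artist_id) are candidates.
    by_key = {d[key_field]: d for d in related_docs if key_field in d}
    for doc in results:
        other = by_key.get(doc.get(ref_field))
        if other is not None:
            for f in fields:
                if f in other:
                    doc[f] = other[f]
    return results


songs = [{"song_id": "s1", "song_title": "foo",
          "song_artist": "ar1", "song_album": "al1"}]
related = [
    {"artist_id": "ar1", "artist_name": "Bob"},
    {"album_id": "al1", "album_name": "Live", "album_date": "1975"},
]
apply_add_fields(songs, related, "artist_name where artist_id:song_artist")
apply_add_fields(songs, related, "album_name,album_date where album_id:song_album")
print(songs[0])
```

Since the extra fields are attached only after the search, a clause such as -artist_name:bob has nothing to match against in the song documents, which is the loss of query power described above.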