"Staging" the data in a non-Solr store sounds like a potentially reasonable idea to me. You might want to consider a NoSQL store of some kind, like MongoDB perhaps, instead of an RDBMS.
The way to think about Solr is not as a store or a database; it's an index for serving your application. That's also the way to think about how to get your multiple tables in there: denormalize, denormalize, denormalize. You need to think about what you actually need to search over, and build your index to serve that efficiently, rather than thinking about normalization or data modelling the way we are used to with RDBMSs. It's a different way of thinking.

A Solr index basically gives you one collection of documents. But the documents can all have different fields, so you _could_ (but probably don't want to) essentially put all your tables in there with unique fields. They're all in the same index, they're all just "documents", but some have a table1_title and table1_author, and others have no data in those fields but a table2_productName and a table2_price. Then if you want to query on just one type of thing, you just query on those fields. Except... you don't get any joins. Which is why you probably don't want to do that after all; it probably won't serve your needs.

Figuring out the right way to model your data in Solr can be tricky, and it is sometimes hard to do exactly what you want. Solr isn't an RDBMS, and in some ways it isn't as powerful as an RDBMS, in the sense of being as flexible about what kinds of queries you can run on any given data. What it does give you is very fast inverted-index lookups, set combinations and faceting that would be very hard to do efficiently in an RDBMS. It is a trade-off. But there's not really a general answer to "how do I take these dozen RDBMS tables and store them in Solr the best way?" It depends on what kinds of searching you need to support and on the nature of your data.

________________________________________
From: Sharma, Raghvendra [sraghven...@corelogic.com]
Sent: Tuesday, September 28, 2010 2:15 AM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?
Thanks for the responses, people.

@Grant
1. Can you point me in some direction on that? Loading data from an incoming stream: do I need some third-party tools, or do I need to build something myself?
4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more or less static (the data is already there). The SQL statements I mentioned are the daily updates coming in. The good thing is that history is not kept, so the overall volume is not growing, but I need to apply the update statements. One workaround I had in mind (though not great for performance) is to apply the updates to a copy of the RDBMS, and then feed the RDBMS extract to Solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions will yield something.

@All - Some more questions, guys.
1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but how to handle multiple table structures is something I am not sure about. Would it be like one big XML file, wherein those three tables (assuming there are three) show up as three different, nullable tag trees? My source provides me a single flat file per table (tab-delimited).
2. Further, loading into Solr could use some performance tuning. Any tips? Best practices?
3. Also, is there a way to specify an XSLT on the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to it automatically?
4. And the last question for the day :) There was one post saying that spatial support is really basic in Solr and is going to be improved in the next versions. Can you people help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.

Looks like I'm close to my solution..
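The "denormalize, denormalize, denormalize" advice at the top of this thread, applied to a one-flat-file-per-table setup like the one described here, could look roughly like the sketch below. It is a minimal sketch under stated assumptions: the two tables (`properties` and `owners`) and all column names are hypothetical, and each property is assumed to have several owners. Instead of one index "tag tree" per table, each parent row becomes a single Solr document, with its child rows folded into a multi-valued field, and the result is written as a Solr XML `<add>` update message of the kind that can be POSTed to Solr's update handler. The `owner_name` field would have to be declared multiValued in schema.xml.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical tab-delimited extracts, one per table (stand-ins for the real flat files).
PROPERTIES_TSV = "id\taddress\tcity\n1\t12 Elm St\tBoston\n2\t9 Oak Ave\tSalem\n"
OWNERS_TSV = "property_id\towner_name\n1\tA. Smith\n1\tB. Smith\n2\tC. Jones\n"


def read_tsv(text):
    """Parse a tab-delimited extract with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def denormalize(properties, owners):
    """Fold child rows (owners) into their parent row (property).

    Each property becomes one flat document; owner names become a
    multi-valued field, so a search on owner_name finds the property
    without needing a join at query time.
    """
    owners_by_property = defaultdict(list)
    for row in owners:
        owners_by_property[row["property_id"]].append(row["owner_name"])

    docs = []
    for row in properties:
        doc = dict(row)
        doc["owner_name"] = owners_by_property.get(row["id"], [])
        docs.append(doc)
    return docs


def to_solr_add_xml(docs):
    """Serialize documents as a Solr XML <add> update message.

    A list value is written as repeated <field> elements with the same
    name, which is how values for a multiValued field are sent.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = v
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    docs = denormalize(read_tsv(PROPERTIES_TSV), read_tsv(OWNERS_TSV))
    print(to_solr_add_xml(docs))
```

The design choice here is to decide up front which entity a search result should be (a property, in this made-up example) and to flatten everything searchable about it into that one document, rather than mirroring the relational tables.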
:)
--raghav

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, September 28, 2010 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Is Solr right for my business situation ?

Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

> When do you need to deploy?
>
> As I understand it, the spatial search in Solr is being rewritten and is
> slated for Solr 4.0, the release after next.

It will be in 3.x, the next release.

>
> The existing spatial search has some serious problems and is deprecated.
>
> Right now, I think the only way to get spatial search in Solr is to deploy a
> nightly snapshot from the active development on trunk. If you are deploying a
> year from now, that might change.
>
> There is not any support for SQL-like statements or for joins. The best
> practice for Solr is to think of your data as a single table, essentially
> creating a view from your database. The rows become Solr documents, the
> columns become Solr fields.

There are now group-by capabilities in trunk as well, which may or may not
help.

>
> wunder
>
> On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
>
>> I am sure these kinds of questions keep coming to you guys, but I want to
>> raise the same question in a different context: my own business situation.
>> I am very, very new to Solr, and though I have tried to read through the
>> documentation, I am nowhere near finishing it.
>>
>> The need is like this:
>>
>> We have a huge RDBMS database/table. A single table perhaps houses 100+
>> million rows. Though Oracle is doing a fine job of handling the insertion
>> and updating of data, the querying is where our main concerns lie. Since we
>> have spatial data, index building takes hours and hours for such tables.
>>
>> That's when we thought of moving away from a standard RDBMS and trying
>> something different and fast.
>>
>> My last week has been spent on a journey reading through Bigtable to Hadoop
>> to HBase to Hive, and then I finally landed on Solr. As far as I have got
>> in my tests, it looks pretty good, but I still have a few unanswered
>> questions. Trying this group for them :) (I am sure I could find some
>> answers if I read/googled more on the topic, but right now I'm being lazy
>> and feel that asking the people who are already using it, or perhaps
>> developing it, is a better bet.)
>>
>> 1. Can I get my Solr instance to load data (fresh data for indexing) from a
>> stream (imagine an MQ kind of queue, or similar)?

Yes, with a little bit of work.

>> 2. Can I host my Solr instance to use HBase as the database/file system
>> (read HDFS)?

Probably, but I doubt it will be fast. Local disk is usually best. 100+ M
rows is large but not unreasonable.

>> 3. Are there any reports (as in benchmarks) available anywhere for a Solr
>> instance's performance?

You can probably search the web for these. I've personally seen several
installs with 1B+ docs and subsecond search and faceting, and heard of others.
You might look at the stuff HathiTrust has put up.

>> 4. Are there any APIs available which might help me apply ANSI SQL kind of
>> statements to my Solr data?

No. Question back: what kinds of things are you trying to do?

>>
>> It would be great if people could share their experience in the area...
>> if it's too much trouble writing all of it, perhaps a URL would be
>> easier... I welcome all kinds of help here... any advice/suggestions are
>> good...
>>
>> Looking forward to your viewpoints..
>>
>> --raghav..

--------------------------
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
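Grant's answer to question 1 ("Yes, with a little bit of work"), together with the later question about tuning load performance, suggests a small consumer that drains a queue and sends documents to Solr in batches rather than one per request. The sketch below is a minimal illustration under stated assumptions: Python's standard `queue.Queue` stands in for the real MQ, and `send_batch` is a hypothetical hook (in a real loader it might POST an XML `<add>` message to Solr's update handler). The batch size and idle timeout are made-up defaults to tune, not recommended values.

```python
import queue
from typing import Callable, List

STOP = object()  # sentinel a producer puts on the queue to end the loop


def consume(q: "queue.Queue", send_batch: Callable[[List[dict]], None],
            batch_size: int = 500, idle_timeout: float = 1.0) -> int:
    """Drain documents from a queue and hand them off in batches.

    A batch is sent when it reaches batch_size, when the queue has been
    quiet for idle_timeout seconds (so documents are not held back
    indefinitely), or when the STOP sentinel arrives.
    Returns the total number of documents sent.
    """
    batch: List[dict] = []
    sent = 0

    def flush() -> None:
        nonlocal batch, sent
        if batch:
            send_batch(batch)  # hypothetical hook, e.g. one Solr update request
            sent += len(batch)
            batch = []

    while True:
        try:
            item = q.get(timeout=idle_timeout)
        except queue.Empty:
            flush()  # quiet period: push out whatever has accumulated
            continue
        if item is STOP:
            flush()
            return sent
        batch.append(item)
        if len(batch) >= batch_size:
            flush()


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(1203):
        q.put({"id": str(i)})
    q.put(STOP)
    sizes = []
    total = consume(q, lambda b: sizes.append(len(b)), batch_size=500)
    print(total, sizes)  # 1203 [500, 500, 203]
```

Sending many documents per update request is a common way to cut per-request overhead; the idle-timeout flush trades a little throughput for bounded latency when the stream slows down.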