RE: Storing data in Solr

2017-08-08 Thread Phil Scadden
When I put PDF documents and rows from a table into the same index, I 
create a "dataSource" field to identify the source. I don't store the database 
fields, only index them, apart from the unique key, which is stored as 
"document". On search, you process the output before passing it to the user. If 
dataSource is PDFs etc., you should have highlighted text to pass on. If 
dataSource is the table, fetch the rows from the database and display the 
search fields as "highlights". That is a lot of postprocessing of search 
results, but it makes it easier to create meaningful results if a single row in 
the table contains what a user wants. You do need a custom indexer and a custom 
results postprocessor, however.
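
The postprocessing step described above could be sketched roughly as follows. 
This is only an illustration, not the poster's actual code: the function name 
postprocess, the fetch_row callback, the "content" highlight field and the 
"pdf"/"table" values are all assumptions, and it assumes dataSource is 
returned with each hit. The highlighting argument follows the shape of Solr's 
highlighting response (unique key, then field, then a list of snippets).

```python
from typing import Callable, Dict, List


def postprocess(
    docs: List[dict],
    highlighting: Dict[str, Dict[str, List[str]]],
    fetch_row: Callable[[str], dict],
) -> List[dict]:
    """Turn raw search hits into display-ready results, branching on dataSource."""
    results = []
    for doc in docs:
        key = doc["document"]  # the unique key, stored as "document"
        if doc["dataSource"] == "table":
            # Table hit: re-read the row from the database (hypothetical
            # fetch_row callback) and present its fields as "highlights".
            row = fetch_row(key)
            snippets = [f"{col}: {val}" for col, val in row.items()]
        else:
            # PDF or other file hit: use the highlighter's snippets.
            # "content" is an assumed name for the indexed text field.
            snippets = highlighting.get(key, {}).get("content", [])
        results.append(
            {"id": key, "source": doc["dataSource"], "highlights": snippets}
        )
    return results
```

Passing the database lookup in as a callback keeps the sketch independent of 
any particular database driver.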


Re: Storing data in Solr

2017-08-07 Thread Erick Erickson
Well, a very common pattern is to use Solr to search, storing just enough
in each field (stored="true") to return search results that give the user
enough information to decide whether they want to look at the original
document. When they click on a choice (or a link like "download PDF"),
fetch the actual file from the system of record.
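
As a rough sketch of that pattern (not a definitive implementation): ask Solr
for only the stored summary fields via the standard q, fl, rows and wt
parameters, then turn each hit into a link back to the system of record. The
function names, the core URL, the "id" and "title" field names and the link
template are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_search_url(base_url, user_query, summary_fields, rows=10):
    """Build a /select URL that returns only the stored summary fields."""
    params = {
        "q": user_query,
        "fl": ",".join(summary_fields),  # limit the response to these fields
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


def to_result_link(doc, record_url_template):
    """Map one hit to a title plus a link into the system of record."""
    return {
        # Fall back to the id when no title was stored (assumed field names).
        "title": doc.get("title", doc["id"]),
        "link": record_url_template.format(id=doc["id"]),
    }
```

For example, build_search_url("http://localhost:8983/solr/docs", "fault line",
["id", "title"]) produces a query whose response carries only id and title;
the file itself is fetched later through the link.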

You'll have to re-index sometime anyway as your requirements change. That
means re-ingesting all your data, which is easiest from the system of
record.
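
A minimal sketch of re-ingesting from the system of record, assuming it is a
SQL table: read the rows in batches and serialize each batch as a JSON array
of documents, which could then be POSTed to Solr's update handler. Here
sqlite3 merely stands in for the real database, and the iter_batches name,
the table name and the batch size are assumptions.

```python
import json
import sqlite3


def iter_batches(conn, table, batch_size=1000):
    """Yield JSON arrays of row documents read from the system of record."""
    # The table name is interpolated directly, so it must come from trusted
    # configuration, never from user input.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [c[0] for c in cur.description]
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        # One document per row, keyed by column name.
        yield json.dumps([dict(zip(columns, r)) for r in rows])
```

Batching keeps memory use bounded for a table of around a million rows and
lets a failed run resume from a known batch.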

Best,
Erick Erickson

On Mon, Aug 7, 2017 at 8:05 PM, sg1973  wrote:
> I have written the code to publish to Solr but I am wondering what is the
> right way to do it. Is putting data directly in Solr OK, or should I put it
> in a separate cache and then build Solr on top of it? What are the pros and
> cons of each?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Storing-data-in-Solr-tp4349537p4349541.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Storing data in Solr

2017-08-07 Thread sg1973
I have written the code to publish to Solr but I am wondering what is the
right way to do it. Is putting data directly in Solr OK, or should I put it
in a separate cache and then build Solr on top of it? What are the pros and
cons of each?


Re: Storing data in Solr

2017-08-07 Thread Deepak Vohra
Solr indexes data for search, and if search is the main criterion, Solr should 
be used.

On Mon, 8/7/17, sg1973  wrote:

 Subject: Storing data in Solr
 To: solr-user@lucene.apache.org
 Received: Monday, August 7, 2017, 6:55 PM
 
 Hello All,
 I am new to Solr and have a question. I
 have to load about 1 million records
 from a DB table (with say 30
 columns/row) and then run various search
 queries on it. I see 2 ways to do it:
 store the data directly in Solr, or
 store it in a cache and then search on
 it using Solr. I am trying to
 understand which approach is better and
 recommended. One use case where I
 would need a separate cache is when I
 have to store non-linear data (PDF et
 al.) which won't be supported by Solr.
 However, if I have tabulated data then
 I have a choice to store directly in
 Solr. Any ideas on what to choose when?
 Is there a reason I would choose a
 separate cache even for storing linear
 data?
 
 Thanks in advance
 PG
 
 
 
 --
 View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-data-in-Solr-tp4349537.html
 Sent from the Solr - User mailing list
 archive at Nabble.com.
 


Re: Storing data in Solr

2017-08-07 Thread Deepak Vohra
Which database is to be integrated? Solr provides the Data Import Handler, 
which can import from databases that have a JDBC driver, including Oracle and 
MySQL.

