RE: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

Davis, Daniel (NIH/NLM) [C] Fri, 04 Dec 2015 12:21:55 -0800

So, I actually went to a one-day Elasticsearch conference. One person spoke
about having to re-index everything because they had their field mappings
wrong. I've also worked on Linked Data (RDF), where the fact that everything
is a triple is supposed to make SQL-style schemas unnecessary.

The theme with Elasticsearch was:
 - Spend some time on your field mappings (which are a schema) up front.
 - If you don't, you are going to be wasting space, experiencing slow
   search, or both.

The theme with RDF was:
 - First model your vocabulary and make sure it answers the questions you
   want to answer.

So, we can be "schemaless", but with both Linked Data and ES, that is a way
to get started quickly; there are still advantages to using a schema.
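Planning field mappings up front also works with a managed schema: instead of letting Solr guess, you declare the fields yourself through the Schema API. The sketch below only builds the JSON request body; it assumes a mutable managed schema and the `/solr/<collection>/schema` endpoint, and the field names and type names (`title`, `pub_year`, `subject`, `text_general`, `int`, `string`) are hypothetical examples that must match types actually defined in your schema.

```python
import json

# Hypothetical fields, planned up front. The names are illustrative, and the
# type names (text_general, int, string) are assumed to exist in the schema.
PLANNED_FIELDS = [
    {"name": "title", "type": "text_general", "stored": True},
    {"name": "pub_year", "type": "int", "stored": True},
    {"name": "subject", "type": "string", "stored": True, "multiValued": True},
]


def build_add_field_body(fields):
    """Return a Schema API request body with one "add-field" command per field.

    The body repeats the "add-field" key, so it is assembled as text:
    json.dumps on a Python dict would keep only one of the repeated keys.
    """
    commands = ['"add-field": ' + json.dumps(field) for field in fields]
    return "{\n  " + ",\n  ".join(commands) + "\n}"


if __name__ == "__main__":
    # POST this with Content-type: application/json to
    # http://localhost:8983/solr/<collection>/schema (host and collection assumed).
    print(build_add_field_body(PLANNED_FIELDS))
```

Keeping the field list in version control gives you the same "decide first, index second" discipline as a hand-edited schema.xml.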

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, December 04, 2015 3:16 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

Actually, I rather agree with your colleagues, but then I'm something of a 
curmudgeon.

More accurately, unless you _strictly_ control the input documents, you never 
know what you have in your index. I'd rather have docs fail indexing than be 
indexed with, say, typos in the field names....

FWIW,
Erick
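Erick's preference, failing a document rather than silently indexing a misspelled field, can also be enforced in the ingest scripts before anything is posted. This is a minimal sketch: `KNOWN_FIELDS`, `check_fields`, and `UnknownFieldError` are hypothetical names, and the whitelist is assumed to mirror the fields declared in your schema.

```python
# Hypothetical whitelist: in practice this would mirror the fields declared
# in schema.xml (or the managed schema).
KNOWN_FIELDS = {"id", "title", "pub_year", "subject"}


class UnknownFieldError(ValueError):
    """Raised when a document uses a field name the schema does not declare."""


def check_fields(doc, known=KNOWN_FIELDS):
    """Return the document unchanged if every field name is declared.

    Otherwise raise UnknownFieldError listing the undeclared names, so a
    typo such as "titel" stops the ingest instead of creating a new field.
    """
    unknown = sorted(set(doc) - set(known))
    if unknown:
        raise UnknownFieldError("unknown field(s): " + ", ".join(unknown))
    return doc
```

With a fixed schema that has no catch-all dynamic field, Solr typically rejects such a document on its own; a check like this just surfaces the problem earlier and makes it easy to cover in unit tests.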

On Fri, Dec 4, 2015 at 6:51 AM, Rick Leir <richard.l...@canadiana.ca> wrote:
> On Fri, Dec 4, 2015 at 12:59 AM, <solr-user-digest-h...@lucene.apache.org> wrote:
>
>>
>> >Just wondering if folks have any suggestions on using Schema.xml vs.
>> >Managed Schema going forward.
>> >
>
>
> We are using loosely typed languages (Perl and JavaScript) and a
> loosely typed DB (CouchDB). This is consistent with running Solr in
> schemaless mode and with writing more unit tests. When you post a doc
> into Solr containing a field that has not been seen before, Solr chooses
> the most appropriate field type. There is no Java exception, and the
> field data is searchable. You can discover the chosen type by looking at
> the Solr admin console. We can probably log it too.
>
> A new field might appear because we added it intentionally, though we
> should be methodical and systematic about adding new fields.
>
> Or it could be due to unexpected input to the ingest scripts (but I
> believe these scripts should clean their inputs).
>
> Or it could be due to a bug in the ingest scripts. In the spirit of
> TDD, the ingest scripts should have tests so we can claim they are bug-free.
>
>
> However, I brought up this topic with my colleagues here, and they are 
> sure we should stick with Schema.xml. ".. some level of control and 
> expectation of exactly what kind of data is in our search system 
> wouldn't be helpful .." So be it.
> Cheers -- Rick
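The "Solr chooses the most appropriate type" step in schemaless mode is, roughly, a parse-then-fall-back sequence applied to the first value seen for a new field. The sketch below is a simplified illustration of that idea, not Solr's code: the real chain of update processors, its date formats, and its field type names are configured in solrconfig.xml, and the labels returned here (`boolean`, `long`, `double`, `date`, `text`) are generic placeholders.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a type for a previously unseen field from its first value.

    Simplified illustration: try boolean, then integer, then floating point,
    then an ISO-8601 UTC timestamp, and fall back to text. The labels are
    generic, not Solr's actual field type names.
    """
    text = str(value).strip()
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        int(text)
        return "long"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        pass
    try:
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return "date"
    except ValueError:
        pass
    return "text"
```

Because the type is fixed by the first value seen, a field whose first value happens to be "42" becomes numeric, and a later document carrying "n/a" in that field would likely fail to index. That is the kind of surprise the "control and expectation" argument for schema.xml is meant to prevent.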
