Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Rick Leir
On Fri, Dec 4, 2015 at 12:59 AM, 
wrote:

>
> >Just wondering if folks have any suggestions on using Schema.xml vs.
> >Managed Schema going forward.
> >


We are using loosely typed languages (Perl and Javascript), and a loosely
typed DB (CouchDB). This is consistent with running Solr in Schemaless
mode, and doing more unit tests. When you post a doc into Solr containing a
field which has not been seen before, Solr chooses the most appropriate
Type. There is no Java exception and the field data is searchable. You can
discover the Type by looking at the Solr console. We can probably log it
too.

The new field might be due to us intentionally adding it, though we should
be methodical and systematic about adding new fields.

Or it could be due to unexpected input to the ingest scripts, (but I
believe these scripts should clean their inputs).

Or it could be due to a bug in the ingest scripts. In the spirit of TDD,
the ingest scripts should have tests so we can claim they are bug free.
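
A minimal sketch of that kind of ingest-side check, written in Python for
brevity (our real scripts are Perl and Javascript). The field names in
KNOWN_FIELDS and both function names are hypothetical; the only point is
that the script can reject a document carrying an unexpected field before
Solr ever sees it, and that the check itself is easy to unit test.

```python
# Hypothetical whitelist of the fields the ingest script is expected to emit.
KNOWN_FIELDS = {"id", "title", "author", "published"}


def unknown_fields(doc, known=KNOWN_FIELDS):
    """Return the sorted list of field names in doc that are not expected."""
    return sorted(set(doc) - set(known))


def check_doc(doc, known=KNOWN_FIELDS):
    """Raise ValueError if doc carries a field that is not expected."""
    extra = unknown_fields(doc, known)
    if extra:
        raise ValueError("unexpected field(s): " + ", ".join(extra))
    return doc
```

A misspelled field such as "titel" would then fail in the ingest script
rather than silently becoming a new field in a schemaless index.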


However, I brought up this topic with my colleagues here, and they are sure
we should stick with Schema.xml. ".. some level of control and expectation
of exactly what kind of data is in our search system wouldn't be helpful
.." So be it.
Cheers -- Rick


Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Erick Erickson
Actually, I rather agree with your colleagues, but then I'm something
of a curmudgeon.

More accurately, unless you _strictly_ control the input documents,
you never know what you have in your index. I'd rather have docs fail
indexing than be indexed with, say, typos in the field names.

FWIW,
Erick



Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Alexandre Rafalovitch
It is not that hard to set up a cron-and-diff job that emails you when the
diff is not empty. A sort-of "is that what you expected" report.
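
A rough sketch of the diff half of such a job, using Python's standard
difflib. How the two schema texts are fetched and how the report is mailed
are left out; the function names and the "expected"/"live" labels are just
illustrative assumptions.

```python
import difflib


def schema_diff(expected_text, live_text):
    """Return unified-diff lines between the expected and the live schema."""
    return list(
        difflib.unified_diff(
            expected_text.splitlines(),
            live_text.splitlines(),
            fromfile="expected",
            tofile="live",
            lineterm="",
        )
    )


def report(expected_text, live_text):
    """Return a report body, or None when the two schemas are identical."""
    lines = schema_diff(expected_text, live_text)
    return "\n".join(lines) if lines else None
```

The cron job would only send mail when report() returns something other
than None.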

But, for myself, I also prefer schema and then managed. I do not like
schemaless mode, even for development. Instead, I prefer to do
"dynamicField *".

P.s. I am thinking of doing a video/webinar show-casing the RAD method
based on the dynamicField *, as I see many people really do not get
the workflow around it. If that's something people are interested in,
let me know directly and/or subscribe to the newsletter at
http://www.solr-start.com/ for an announcement. I'll treat the
subscriptions over the next 24 hours as a vote :-)


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/




RE: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Davis, Daniel (NIH/NLM) [C]
So, I actually went to an Elasticsearch one-day conference. One person spoke
about having to re-index everything because they had their field mappings
wrong. I've also worked on Linked Data (RDF), where the fact that everything
is a triple is supposed to make SQL schemas unneeded.

The theme with Elasticsearch was:
 - spend some time on your field mappings (which are a schema) up front.
 - if you don't, you are either going to be wasting space, or experiencing
   slow search, or both.

The theme with RDF was:
 - First model your vocabulary and make sure it answers the questions you want 
to answer.

So, we can be "schemaless", but with both Linked Data and ES, it is a way to 
get started quickly - there are still advantages to using a schema.



Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Upayavira
This is exactly right. Schemaless can be a great discovery tool, but it is
not something that is useful in production, I'd say.

On Fri, Dec 4, 2015, at 08:21 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> So, I actually went to an Elastic Search one day conference.   One person
> spoke about having to re-index everything because they had their field
> mappings wrong.   I've also worked on Linked Data, RDF, where the fact
> that everything is a triple is supposed to make SQL schemas unneeded.
> 
> The theme with Elastic Search was:
>  - spend some time on your field mappings (which are a schema) up front.
>  - if you don't, you are either going to be wasting space, or
>  experiencing slow search, or both.
> 
> The theme with RDF was:
>  - First model your vocabulary and make sure it answers the questions you
>  want to answer.
> 
> So, we can be "schemaless", but with both Linked Data and ES, it is a way
> to get started quickly - there are still advantages to using a schema.


Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Shawn Heisey
On 12/3/2015 8:09 AM, Kelly, Frank wrote:
> Just wondering if folks have any suggestions on using Schema.xml vs. Managed 
> Schema going forward.
> 
> Our deployment will be
>> 3 Zk, 3 Shards, 3 replicas
>> Copies of each collection in 5 AWS regions (EBS-backed EC2 instances)
>> Planning at least 1 Billion objects indexed (currently < 100 million)
> 
> I'm sure our schema.xml will have changes and fixes and just wondering which 
> approach (schema.xml vs. managed)
> will be easier to deploy / maintain?

In production, you probably want a schema that cannot change.  The
managed schema that you find in the data-driven configuration will
automatically add new fields to the schema if unknown fields are
encountered in your data ... which means that if somehow a typo makes it
through your indexing process, you may not know about the problem until
later.

With a static schema, an indexing request that has an error in a field
name will be rejected and you will receive an error, which is how I
would want Solr to behave.

The data-driven schema is good for prototyping, but because the field
definitions that get added are just a guess by Solr, I would manually
edit the schema before going into production.  Once in production I
would want to be in complete manual control of the schema.

Thanks,
Shawn



Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Erick Erickson
Shawn:

Managed schema is _used_ by "schemaless", but it is not the same thing at
all. For "schemaless" (i.e. "data driven"), you need to include the
update processor chains that do the guessing for you and make use of
the managed features to add fields to your schema.

You can also use a managed schema _without_ the update processor chains
that enable "schemaless" mode. In this case you do have a static
schema, with the caveat that "static" means that anyone who can post
directly to Solr can change your schema; but if you allow that, someone
issuing managed schema API calls is the least of your worries ;).

That said, I certainly understand wanting to lock down my schema, but
then I'm a control freak.

Best,
Erick





Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Kelly, Frank
Just wondering if folks have any suggestions on using Schema.xml vs. Managed 
Schema going forward.

Our deployment will be
> 3 Zk, 3 Shards, 3 replicas
> Copies of each collection in 5 AWS regions (EBS-backed EC2 instances)
> Planning at least 1 Billion objects indexed (currently < 100 million)

I'm sure our schema.xml will have changes and fixes and just wondering which 
approach (schema.xml vs. managed)
will be easier to deploy / maintain?

Cheers!

-Frank


Frank Kelly
Principal Software Engineer
Predictive Analytics Team (SCBE/HAC/CDA)










Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Jeff Wartes
I’ve never used the managed schema, so I’m probably biased, but I’ve never
seen much of a point to the Schema API.

I need to make changes sometimes to solrconfig.xml, in addition to
schema.xml and other config files, and there’s no API for those, so my
process has been like:

1. Put the entire config directory used by a collection in source control
somewhere. solrconfig.xml, schema.xml, synonyms.txt, everything.
2. Make changes, test, commit
3. “Release” by uploading the whole config dir at a specific commit to ZK
(overwriting any existing files) and issuing a collections API “reload”.
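
For illustration, the reload half of step 3 could be sketched like this in
Python. It only builds the Collections API URL (action=RELOAD); the base
URL and collection name are hypothetical, and the upload of the config dir
to ZK is not shown.

```python
from urllib.parse import urlencode


def reload_url(solr_base, collection):
    """Build the Collections API URL that reloads one collection."""
    query = urlencode({"action": "RELOAD", "name": collection, "wt": "json"})
    return solr_base.rstrip("/") + "/admin/collections?" + query


# Hypothetical host and collection name, for illustration only.
print(reload_url("http://localhost:8983/solr", "mycollection"))
```

Issuing a GET against that URL after the upload is what triggers the
reload.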


This has the downside that I can upload a broken config and take down my
collection, but with the whole config dir in source control,
I can also easily roll back to any point by uploading an old commit.
You still have to be aware of how the changes you’re making will affect
your current index, but that’s unavoidable.





Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Erick Erickson
It Depends (tm).

Managed Schema is way cool if you have a front end that lets you
manipulate the schema via a browser or other program. There's really
no other way to deal with changing the schema from a browser without
allowing uploading xml files, which is a security problem. Trust me on
this one ;).

For people who know the ins and outs of schema.xml, it's often easier
just to edit the raw file and upload it to ZK (or use it locally). And
much faster for mass edits.

So really they're different beasts. The end result is functionally the
same: there's a schema that's read by Solr and used. The managed
schema makes it harder for typos to sneak in and prevent collections
from loading, at the expense of fast mass editing.

And there is some ability to change the solrconfig.xml file, see:
https://cwiki.apache.org/confluence/display/solr/Config+API. But again
whether you "should" use that or just manually edit solrconfig.xml is
largely a matter of the tools available and personal taste.


bq: will be easier to deploy / maintain


Not a lot of difference here. At the end of the day, you have to
1> have the configs stored somewhere safely in version control (or at
least I think you must)
2> change the files in the config set on Zookeeper
3> reload the collection.

So with manual editing, the process to change something is that you'd
1> get the files from VCS
2> edit them
3> push them to ZK
4> reload the collection (collections API) and verify it was correct
5> save the configs back to VCS.

With managed schema you'd
1> use the managed schema API to make changes
2> reload the collection and verify
3> pull the changes from Zookeeper
4> put them in VCS
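
As a sketch of step 1 of the managed-schema route, here is one way to build
a Schema API "add-field" request in Python. The request is built but not
sent; the host, collection, field name and the "text_general" field type
are assumptions for the example and need to match your own setup.

```python
import json
from urllib.request import Request


def add_field_request(solr_base, collection, name, field_type, stored=True):
    """Build (but do not send) a Schema API request that adds one field."""
    body = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return Request(
        url=f"{solr_base.rstrip('/')}/{collection}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen() would send it;
that is deliberately left out so the sketch has no side effects.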


Best,
Erick





Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Upayavira
They are different beasts, but I bet on the managed schema winning in
the long run.

With the bulk API, you can post a heap of fields/etc in one go, so
basically, rather than pushing the schema to Zookeeper, you push it to
Solr. 
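
Roughly, a bulk post is one JSON body carrying several field definitions.
A small Python sketch of building such a body follows; the field names and
types are made up for the example, and the body would be POSTed to the
collection's /schema endpoint.

```python
import json


def bulk_add_fields(fields):
    """Return the JSON body for one Schema API POST that adds several fields."""
    return json.dumps({"add-field": list(fields)}, sort_keys=True)


# Hypothetical field definitions, for illustration only.
body = bulk_add_fields([
    {"name": "title", "type": "text_general", "stored": True},
    {"name": "price", "type": "float", "stored": True},
])
```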

Look at Solr 5.4 when it comes out shortly. It'll change the way you
think about the schema. The managed schema has been there for ages, but
now the UI has support for it in the schema tab. Being able to really
easily create and remove fields certainly does things to my brain
because it is just so easy.

Upayavira


Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Don Bosco Durai
My experience is that once managed-schema is created, schema.xml is ignored
even if present. When both are present, you will get a warning in the Solr
log.

I have stopped using schema.xml. Actually, I use it once: I start Solr, and
after it generates managed-schema, I export it and pretty much just update it
going forward.

I think the recommended way to manage fields is using API calls, but that
might not always be possible, e.g. when you have to save the config in a
source control system. If you are doing that, make sure you update it
regularly, because if Solr finds a new field name, it will auto-create it in
the managed-schema and your saved copy will be out of date.

Bosco



