Re: Solr Managed Schema by Default in 5.5
Data driven mode is different from managed schema. It is unfortunate
that in our example configurations we implemented them together.

Managed schema is about using APIs to read/write schema changes. Not
requiring people to hand edit schema.xml is a good thing, IMO.

Data driven schema uses the managed schema infrastructure internally
and adds update request processors to create/modify schema depending
on what data you throw at Solr. It is a nice mode to play around with
Solr but I would only use it for PoCs.

I hope that clarifies things.

On Fri, Mar 11, 2016 at 10:36 PM, Nick Vasilyev wrote:

> Got it.
>
> Thank you for clarifying this, I was under impression that I would only be
> able to make changes via the API. I will look into this some more.

--
Regards,
Shalin Shekhar Mangar.
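A minimal sketch of what "using APIs to read/write schema changes" can look like in practice. The /schema endpoint and the add-field command belong to the Schema API; the host, port, collection name and field definition below are placeholders, and the helper name is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def build_add_field_request(base_url, collection, field):
    """Build (but do not send) a Schema API request that adds one field."""
    payload = json.dumps({"add-field": field}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/solr/{collection}/schema",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: adjust host, port, collection and field to your setup.
request = build_add_field_request(
    "http://localhost:9000",
    "test",
    {"name": "title", "type": "text_general", "stored": True},
)

# Sending is left to the caller, for example:
#   urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

With a managed schema, a call like this changes the managed-schema file in place, so no hand edit of the schema file is involved.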
Re: Solr Managed Schema by Default in 5.5
Got it.

Thank you for clarifying this, I was under impression that I would only be
able to make changes via the API. I will look into this some more.

On Fri, Mar 11, 2016 at 11:51 AM, Shawn Heisey wrote:

> On 3/11/2016 9:28 AM, Nick Vasilyev wrote:
> > Maybe I am missing something, if that is the case what is the difference
> > between data_driven_schema_configs and basic_configs? I thought that the
> > only difference was that the data_driven_schema_configs comes with the
> > managed schema and the basic_configs come with regular?
> >
> > Also, I haven't really dived into the schema less mode so far, I know
> > elastic uses it and it has been kind of a turn off for me. Can you provide
> > some guidance around best practices on how to use it?
>
> Schemaless mode is implemented with an update processor chain. If you
> look in the data_driven_schema_configs solrconfig.xml file, you will
> find an updateRequestProcessorChain named
> "add-unknown-fields-to-the-schema". This update chain is then enabled
> with an initParams config.
>
> I personally would not recommend using it. It would be fine to use
> during prototyping, but I would definitely turn it off for production.
>
> > For example, now I have all of my configuration files in version control,
> > if I need to make a change, I upload a new schema to version control, then
> > the server pulls them down, uploads to zk and reloads collections. This is
> > almost fully automated and since all configuration is in a single file it
> > is easy to review and track previous changes. I like this process and it
> > works well; if I have to start using managed schemas; I would like some
> > feedback on how to implement it with minimal disruption to this.
>
> There's no reason you can't continue to use this method, even with the
> managed schema. Editing the managed-schema is discouraged if you
> actually intend to use the Schema API, but there's nothing in place to
> prevent you from doing it that way.
>
> > If I am sending all schema changes via the API, I would need to have still
> > have some file with the schema configuration, it would just be a different
> > format. I would then need to have some code to read it and send specific
> > items to Solr, right? When I need to make a change, do I have to then make
> > this change individually and include that configuration as part of the
> > config file? Or should I be able to just send the entire schema in again?
>
> Using the Schema API changes the managed-schema file in place. You
> wouldn't need to upload anything to zookeeper, the change would already
> be there -- but you'd have to take an extra step (retrieving from
> zookeeper) to make sure it's in version control.
>
> My recommendation is to just keep using version control as you have
> been, which you can do with either the Classic or Managed schema. The
> filename for the schema would change with the managed version, but
> nothing else.
>
> Thanks,
> Shawn
Re: Solr Managed Schema by Default in 5.5
On 3/11/2016 9:28 AM, Nick Vasilyev wrote:
> Maybe I am missing something, if that is the case what is the difference
> between data_driven_schema_configs and basic_configs? I thought that the
> only difference was that the data_driven_schema_configs comes with the
> managed schema and the basic_configs come with regular?
>
> Also, I haven't really dived into the schema less mode so far, I know
> elastic uses it and it has been kind of a turn off for me. Can you provide
> some guidance around best practices on how to use it?

Schemaless mode is implemented with an update processor chain. If you
look in the data_driven_schema_configs solrconfig.xml file, you will
find an updateRequestProcessorChain named
"add-unknown-fields-to-the-schema". This update chain is then enabled
with an initParams config.

I personally would not recommend using it. It would be fine to use
during prototyping, but I would definitely turn it off for production.

> For example, now I have all of my configuration files in version control,
> if I need to make a change, I upload a new schema to version control, then
> the server pulls them down, uploads to zk and reloads collections. This is
> almost fully automated and since all configuration is in a single file it
> is easy to review and track previous changes. I like this process and it
> works well; if I have to start using managed schemas; I would like some
> feedback on how to implement it with minimal disruption to this.

There's no reason you can't continue to use this method, even with the
managed schema. Editing the managed-schema is discouraged if you
actually intend to use the Schema API, but there's nothing in place to
prevent you from doing it that way.

> If I am sending all schema changes via the API, I would need to have still
> have some file with the schema configuration, it would just be a different
> format. I would then need to have some code to read it and send specific
> items to Solr, right? When I need to make a change, do I have to then make
> this change individually and include that configuration as part of the
> config file? Or should I be able to just send the entire schema in again?

Using the Schema API changes the managed-schema file in place. You
wouldn't need to upload anything to zookeeper, the change would already
be there -- but you'd have to take an extra step (retrieving from
zookeeper) to make sure it's in version control.

My recommendation is to just keep using version control as you have
been, which you can do with either the Classic or Managed schema. The
filename for the schema would change with the managed version, but
nothing else.

Thanks,
Shawn
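For anyone who wants to confirm that schemaless mode is off before going to production, here is a rough sketch that inspects a solrconfig.xml for the update chain named above. It assumes the chain is switched on through an update.chain default inside an initParams element, as described in the message; the sample XML is a trimmed illustration rather than the shipped file, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SCHEMALESS_CHAIN = "add-unknown-fields-to-the-schema"


def schemaless_status(solrconfig_xml):
    """Report whether the schemaless chain is defined and whether it is enabled."""
    root = ET.fromstring(solrconfig_xml)
    # Defined: an updateRequestProcessorChain with the well-known name exists.
    defined = any(
        chain.get("name") == SCHEMALESS_CHAIN
        for chain in root.iter("updateRequestProcessorChain")
    )
    # Enabled (assumption): some initParams block sets update.chain to that name.
    enabled = any(
        entry.get("name") == "update.chain"
        and (entry.text or "").strip() == SCHEMALESS_CHAIN
        for init in root.iter("initParams")
        for entry in init.iter("str")
    )
    return {"defined": defined, "enabled": enabled}


# Trimmed, illustrative config; a real solrconfig.xml has many more elements.
sample = """
<config>
  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </initParams>
</config>
"""

print(schemaless_status(sample))
```

A check like this could run in the same automation that pushes configs to ZooKeeper, failing the deployment if "enabled" is true for a production collection.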
Re: Solr Managed Schema by Default in 5.5
Hi Shawn,

Maybe I am missing something, if that is the case what is the difference
between data_driven_schema_configs and basic_configs? I thought that the
only difference was that the data_driven_schema_configs comes with the
managed schema and the basic_configs come with regular?

Also, I haven't really dived into the schemaless mode so far, I know
elastic uses it and it has been kind of a turn off for me. Can you provide
some guidance around best practices on how to use it?

For example, now I have all of my configuration files in version control,
if I need to make a change, I upload a new schema to version control, then
the server pulls them down, uploads to zk and reloads collections. This is
almost fully automated and since all configuration is in a single file it
is easy to review and track previous changes. I like this process and it
works well; if I have to start using managed schemas, I would like some
feedback on how to implement it with minimal disruption to this.

If I am sending all schema changes via the API, I would still need to have
some file with the schema configuration, it would just be a different
format. I would then need to have some code to read it and send specific
items to Solr, right? When I need to make a change, do I have to then make
this change individually and include that configuration as part of the
config file? Or should I be able to just send the entire schema in again?

Previously when I tried to upload the entire schema again I ran into
problems; for example if there is already field copying from field1 to
field2, when I resend the config it would add another "copy field set". So
copying would occur twice and error out if the field is not multi-valued.
If future changes need to be made atomically and then included back into
this other config it just introduces more room for error.

Also, with classic schema if I wanted to revert a change or delete a field,
I would simply remove it from the schema and re-upload. Now it looks like I
need to add additional functionality into whatever my new process will be
to delete fields / copy fields, etc...

I know the point of this is to be able to easily make a UI for these
changes, but UI changes are hard to automate and version control.

Please let me know if I am missing something.

On Fri, Mar 11, 2016 at 10:41 AM, Shawn Heisey wrote:

> On 3/11/2016 7:01 AM, Nick Vasilyev wrote:
> > Is this now the default behavior for basic_configs? I would really like to
> > maintain an option to easily create collection with classic schema settings
> > without jumping through all of these hoops.
>
> Starting in 5.5, all examples now use the managed schema.
>
> https://issues.apache.org/jira/browse/SOLR-8131
>
> The classic schema factory still exists, and probably will exist for all
> 6.x versions, so you will not need to migrate any existing setup yet.
>
> I don't mind putting more emphasis on the new factory or using it by
> default. I expect that eventually the classic factory will get
> deprecated. When that happens, I would like to see an option to mimic
> the classic version, where making changes via API won't work. One
> person has already come into the IRC channel and asked how they can
> disable schema editing.
>
> Although I don't have a problem with the managed schema, I still don't
> like schemaless mode, which requires the managed schema. It looks like
> the basic_configs and sample_techproducts_configs examples have NOT
> enabled that feature.
>
> Thanks,
> Shawn
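The duplicate copy field problem described above comes from resending definitions that already exist. One possible way around it, sketched here under stated assumptions, is to compare the schema kept in version control with the live one and send only the difference. The command names (add-field, replace-field, delete-field, add-copy-field, delete-copy-field) are Schema API commands; the helper, its input format and the field names are hypothetical, and a real comparison would first have to normalize whatever defaults Solr reports for the live schema.

```python
def plan_schema_commands(current, desired):
    """Return Schema API commands that turn `current` into `desired`.

    Both arguments are dicts with two keys (a made-up format for this sketch):
      "fields":      {field_name: definition_dict_without_name}
      "copy_fields": set of (source, dest) tuples
    """
    commands = []
    # Drop stale copy fields first so the fields they reference can be deleted.
    for source, dest in sorted(current["copy_fields"] - desired["copy_fields"]):
        commands.append({"delete-copy-field": {"source": source, "dest": dest}})
    for name in sorted(current["fields"].keys() - desired["fields"].keys()):
        commands.append({"delete-field": {"name": name}})
    for name in sorted(desired["fields"]):
        definition = dict(desired["fields"][name], name=name)
        if name not in current["fields"]:
            commands.append({"add-field": definition})
        elif current["fields"][name] != desired["fields"][name]:
            commands.append({"replace-field": definition})
    # Only copy fields that are missing get added, so nothing is duplicated.
    for source, dest in sorted(desired["copy_fields"] - current["copy_fields"]):
        commands.append({"add-copy-field": {"source": source, "dest": dest}})
    return commands


current = {
    "fields": {
        "field1": {"type": "string"},
        "field2": {"type": "string"},
        "obsolete": {"type": "string"},
    },
    "copy_fields": {("field1", "field2")},
}
desired = {
    "fields": {
        "field1": {"type": "string"},
        "field2": {"type": "string", "multiValued": True},
        "field3": {"type": "string"},
    },
    "copy_fields": {("field1", "field2"), ("field1", "field3")},
}

for command in plan_schema_commands(current, desired):
    print(command)
```

The existing field1 to field2 copy rule produces no command at all, which is the behavior the resend-everything approach was missing. The desired side can stay a single reviewable file in version control, so reverting a change means reverting the file and running the planner again.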
Re: Solr Managed Schema by Default in 5.5
On 3/11/2016 7:01 AM, Nick Vasilyev wrote:
> Is this now the default behavior for basic_configs? I would really like to
> maintain an option to easily create collection with classic schema settings
> without jumping through all of these hoops.

Starting in 5.5, all examples now use the managed schema.

https://issues.apache.org/jira/browse/SOLR-8131

The classic schema factory still exists, and probably will exist for all
6.x versions, so you will not need to migrate any existing setup yet.

I don't mind putting more emphasis on the new factory or using it by
default. I expect that eventually the classic factory will get
deprecated. When that happens, I would like to see an option to mimic
the classic version, where making changes via API won't work. One
person has already come into the IRC channel and asked how they can
disable schema editing.

Although I don't have a problem with the managed schema, I still don't
like schemaless mode, which requires the managed schema. It looks like
the basic_configs and sample_techproducts_configs examples have NOT
enabled that feature.

Thanks,
Shawn
Solr Managed Schema by Default in 5.5
Hi,

I started playing around with Solr 5.5 and created a collection using the
following:

./solr create_collection -c test -p 9000 -replicationFactor 2 -d basic_configs -shards 2

The collection was created fine, however I see that although I specified
basic_configs, it was deployed in managed schema mode. I was able to follow
the instructions here to get it back to basic mode:

https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig

This required me to modify solrconfig and remove the managed schema file
from zookeeper manually.

I checked the configuration files for basic_configs for Solr 5.5 and it
looks like it is managed, however Solr 5.4 still has the classic as the
default parameters.

Is this now the default behavior for basic_configs? I would really like to
maintain an option to easily create a collection with classic schema
settings without jumping through all of these hoops.

Thanks
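A quick way to see which mode a config set will come up in is to look at the schemaFactory element in its solrconfig.xml. The sketch below only reports what is declared; the two sample fragments are trimmed illustrations, the function name is hypothetical, and what Solr does when the element is absent depends on the version, so that case is returned as None rather than guessed.

```python
import xml.etree.ElementTree as ET


def declared_schema_factory(solrconfig_xml):
    """Return the class of the <schemaFactory> element, or None if it is absent."""
    root = ET.fromstring(solrconfig_xml)
    element = root.find("schemaFactory")  # direct child of <config>
    return None if element is None else element.get("class")


# Trimmed, illustrative fragments rather than complete solrconfig.xml files.
managed = """
<config>
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>
</config>
"""
classic = """
<config>
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>
"""

print(declared_schema_factory(managed))
print(declared_schema_factory(classic))
```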