Re: Solr Managed Schema by Default in 5.5

2016-03-11 Thread Shalin Shekhar Mangar
Data driven mode is different from managed schema. It is unfortunate
that in our example configurations we implemented them together.

Managed schema is about using APIs to read/write schema changes. Not
requiring people to hand edit schema.xml is a good thing, IMO.

Data driven schema uses the managed schema infrastructure internally
and adds update request processors to create/modify schema depending
on what data you throw at Solr. It is a nice mode to play around with
Solr but I would only use it for PoCs.
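
As a rough illustration of what "using APIs to write schema changes" can look like, here is a minimal Python sketch that builds a Schema API request using the add-field command. The host, port, collection name and field definition are assumptions made up for the example; the request is only constructed, not sent.

```python
import json
import urllib.request


def build_add_field_request(base_url, collection, field):
    """Build (but do not send) a Schema API request that adds one field."""
    url = f"{base_url}/{collection}/schema"
    body = json.dumps({"add-field": field}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_add_field_request(
    "http://localhost:8983/solr",  # assumed host and port
    "test",                        # assumed collection name
    {"name": "title_s", "type": "string", "stored": True},  # hypothetical field
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Keeping the request construction separate from the sending makes it easy to log or review the exact change before it reaches Solr.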

I hope that clarifies things.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr Managed Schema by Default in 5.5

2016-03-11 Thread Nick Vasilyev
Got it.

Thank you for clarifying this. I was under the impression that I would only be
able to make changes via the API. I will look into this some more.


Re: Solr Managed Schema by Default in 5.5

2016-03-11 Thread Shawn Heisey
On 3/11/2016 9:28 AM, Nick Vasilyev wrote:
> Maybe I am missing something; if that is the case, what is the difference
> between data_driven_schema_configs and basic_configs? I thought that the
> only difference was that data_driven_schema_configs comes with the
> managed schema and basic_configs comes with the classic one?
>
> Also, I haven't really dived into schemaless mode so far. I know
> Elasticsearch uses it and it has been kind of a turn-off for me. Can you
> provide some guidance around best practices on how to use it?

Schemaless mode is implemented with an update processor chain.  If you
look in the data_driven_schema_configs solrconfig.xml file, you will
find an updateRequestProcessorChain named
"add-unknown-fields-to-the-schema".  This update chain is then enabled
with an initParams config.
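
To check whether a given solrconfig.xml has this mode switched on, something like the following Python sketch could be used. The embedded XML is an abbreviated, illustrative fragment and not a copy of the shipped file, and the helper name is made up for the example. It reports True only when the chain is defined and an initParams block selects it through update.chain.

```python
import xml.etree.ElementTree as ET

CHAIN_NAME = "add-unknown-fields-to-the-schema"

# Abbreviated, illustrative fragment; a real solrconfig.xml contains much more.
SAMPLE_SOLRCONFIG = """
<config>
  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </initParams>
</config>
"""


def schemaless_enabled(solrconfig_xml, chain_name=CHAIN_NAME):
    """Return True if the chain is defined and an initParams block selects it."""
    root = ET.fromstring(solrconfig_xml)
    defined = any(
        chain.get("name") == chain_name
        for chain in root.iter("updateRequestProcessorChain")
    )
    selected = any(
        param.get("name") == "update.chain"
        and (param.text or "").strip() == chain_name
        for init in root.iter("initParams")
        for param in init.iter("str")
    )
    return defined and selected
```

A check like this could run in a deployment script to catch a production config that still has the chain enabled.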

I personally would not recommend using it.  It would be fine to use
during prototyping, but I would definitely turn it off for production.

> For example, I currently have all of my configuration files in version
> control. If I need to make a change, I upload a new schema to version control,
> then the server pulls them down, uploads to zk and reloads collections. This is
> almost fully automated and since all configuration is in a single file it
> is easy to review and track previous changes. I like this process and it
> works well. If I have to start using managed schemas, I would like some
> feedback on how to implement it with minimal disruption to this.

There's no reason you can't continue to use this method, even with the
managed schema.  Editing the managed-schema is discouraged if you
actually intend to use the Schema API, but there's nothing in place to
prevent you from doing it that way.

> If I am sending all schema changes via the API, I would still need to
> have some file with the schema configuration; it would just be in a different
> format. I would then need to have some code to read it and send specific
> items to Solr, right?  When I need to make a change, do I have to then make
> this change individually and include that configuration as part of the
> config file? Or should I be able to just send the entire schema in again?

Using the Schema API changes the managed-schema file in place.  You
wouldn't need to upload anything to zookeeper, the change would already
be there -- but you'd have to take an extra step (retrieving from
zookeeper) to make sure it's in version control.
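
One possible way to handle that extra step, sketched in Python: read the live schema back through the Schema API and write it out in a deterministic form so that version control diffs only show real changes. This swaps in an HTTP GET for the ZooKeeper retrieval mentioned above; the function names, base URL and file path are assumptions for illustration.

```python
import json
import urllib.request


def fetch_schema(base_url, collection):
    """GET the current schema as JSON from the Schema API (needs a running Solr)."""
    url = f"{base_url}/{collection}/schema?wt=json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def canonical_snapshot(schema_response):
    """Serialize deterministically so diffs only show real schema changes."""
    # The API wraps the schema in a "schema" key; accept a bare schema too.
    schema = schema_response.get("schema", schema_response)
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def write_snapshot(schema_response, path):
    """Write the canonical snapshot to a file that can be committed."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(canonical_snapshot(schema_response))
```

Usage would be along the lines of write_snapshot(fetch_schema("http://localhost:8983/solr", "test"), "schema-snapshot.json"), followed by a normal commit.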

My recommendation is to just keep using version control as you have
been, which you can do with either the Classic or Managed schema.  The
filename for the schema would change with the managed version, but
nothing else.

Thanks,
Shawn



Re: Solr Managed Schema by Default in 5.5

2016-03-11 Thread Nick Vasilyev
Hi Shawn,

Maybe I am missing something; if that is the case, what is the difference
between data_driven_schema_configs and basic_configs? I thought that the
only difference was that data_driven_schema_configs comes with the
managed schema and basic_configs comes with the classic one?

Also, I haven't really dived into schemaless mode so far. I know
Elasticsearch uses it and it has been kind of a turn-off for me. Can you
provide some guidance around best practices on how to use it?

For example, I currently have all of my configuration files in version
control. If I need to make a change, I upload a new schema to version control,
then the server pulls them down, uploads to zk and reloads collections. This is
almost fully automated and since all configuration is in a single file it
is easy to review and track previous changes. I like this process and it
works well. If I have to start using managed schemas, I would like some
feedback on how to implement it with minimal disruption to this.

If I am sending all schema changes via the API, I would still need to
have some file with the schema configuration; it would just be in a different
format. I would then need to have some code to read it and send specific
items to Solr, right?  When I need to make a change, do I have to then make
this change individually and include that configuration as part of the
config file? Or should I be able to just send the entire schema in again?

Previously, when I tried to upload the entire schema again, I ran into
problems. For example, if there is already field copying from field1 to
field2, resending the config would add another "copy field set", so
copying would occur twice and error out if the field is not multi-valued.
If future changes need to be made atomically and then included back into
this other config, it just introduces more room for error.

Also, with the classic schema, if I wanted to revert a change or delete a
field, I would simply remove it from the schema and re-upload. Now it looks
like I need to add functionality to whatever my new process will be in order
to delete fields, copy fields, etc.
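
To make the concern concrete, here is a minimal Python sketch of the kind of extra code this would need: it compares a desired schema (for example, the file kept in version control) with the current one and plans only the Schema API commands that are actually required, so an existing copy field rule is not added a second time and removed fields get explicit delete commands. The dict layout and function name are assumptions for illustration. The comparison is a plain equality check, so default properties reported by Solr could cause spurious replace-field entries.

```python
def plan_schema_commands(desired, current):
    """Return (command, payload) tuples that turn `current` into `desired`.

    Both arguments are dicts with "fields" (field dicts that have a "name")
    and "copyFields" (dicts with "source" and "dest").
    """
    commands = []
    desired_fields = {f["name"]: f for f in desired.get("fields", [])}
    current_fields = {f["name"]: f for f in current.get("fields", [])}

    def rules(schema):
        return {(c["source"], c["dest"]) for c in schema.get("copyFields", [])}

    desired_rules, current_rules = rules(desired), rules(current)

    # Remove stale copy rules first, so the fields they reference can go.
    for source, dest in sorted(current_rules - desired_rules):
        commands.append(("delete-copy-field", {"source": source, "dest": dest}))
    for name in sorted(current_fields.keys() - desired_fields.keys()):
        commands.append(("delete-field", {"name": name}))
    for name in sorted(desired_fields):
        if name not in current_fields:
            commands.append(("add-field", desired_fields[name]))
        elif desired_fields[name] != current_fields[name]:
            commands.append(("replace-field", desired_fields[name]))
    # Only add copy rules that are missing, so re-running adds no duplicates.
    for source, dest in sorted(desired_rules - current_rules):
        commands.append(("add-copy-field", {"source": source, "dest": dest}))
    return commands
```

Running the plan against an unchanged schema yields an empty list, which is the property the "resend everything" approach lacked.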

I know the point of this is to be able to easily make a UI for these
changes, but UI changes are hard to automate and version control. Please
let me know if I am missing something.


Re: Solr Managed Schema by Default in 5.5

2016-03-11 Thread Shawn Heisey
On 3/11/2016 7:01 AM, Nick Vasilyev wrote:
> Is this now the default behavior for basic_configs? I would really like to
> maintain an option to easily create a collection with classic schema settings
> without jumping through all of these hoops.

Starting in 5.5, all examples now use the managed schema.

https://issues.apache.org/jira/browse/SOLR-8131

The classic schema factory still exists, and probably will exist for all
6.x versions, so you will not need to migrate any existing setup yet.

I don't mind putting more emphasis on the new factory or using it by
default.  I expect that eventually the classic factory will get
deprecated.  When that happens, I would like to see an option to mimic
the classic version, where making changes via API won't work.  One
person has already come into the IRC channel and asked how they can
disable schema editing.

Although I don't have a problem with the managed schema, I still don't
like schemaless mode, which requires the managed schema.  It looks like
the basic_configs and sample_techproducts_configs examples have NOT
enabled that feature.

Thanks,
Shawn



Solr Managed Schema by Default in 5.5

2016-03-11 Thread Nick Vasilyev
Hi,

I started playing around with Solr 5.5 and created a collection using the
following:

./solr create_collection -c test -p 9000 -replicationFactor 2 -d basic_configs -shards 2

The collection was created fine; however, I see that although I specified
basic_configs, it was deployed in managed schema mode.

I was able to follow the instructions here:
https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig

to get it back to classic mode, which required me to modify solrconfig.xml and
remove the managed schema file from ZooKeeper manually.
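
For reference, the solrconfig.xml part of that change could be scripted roughly as in the Python sketch below, which replaces any existing schemaFactory element with the ClassicIndexSchemaFactory one. The function name is made up for illustration, and this is only one way to do it: ElementTree drops XML comments when it parses, so a hand edit may be preferable for a heavily commented config. Renaming or removing the managed-schema file and reloading the collection are not covered here.

```python
import xml.etree.ElementTree as ET


def use_classic_schema_factory(solrconfig_xml):
    """Return solrconfig XML with any schemaFactory replaced by the classic one."""
    root = ET.fromstring(solrconfig_xml)
    # Drop whatever factory is configured (for example a managed one).
    for existing in root.findall("schemaFactory"):
        root.remove(existing)
    classic = ET.Element("schemaFactory", {"class": "ClassicIndexSchemaFactory"})
    root.insert(0, classic)
    return ET.tostring(root, encoding="unicode")
```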

I checked the configuration files for basic_configs in Solr 5.5 and it
looks like it is managed; however, Solr 5.4 still has the classic schema as
the default.

Is this now the default behavior for basic_configs? I would really like to
maintain an option to easily create a collection with classic schema settings
without jumping through all of these hoops.

Thanks