[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285022#comment-14285022 ]

Steve Rowe commented on SOLR-6913:
----------------------------------

I mistakenly removed the {{\*_t}} dynamic field from data_driven_schema_configs's {{managed-schema}}.

Currently data_driven_schema_configs has:

{code:xml}
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>
{code}

I'm going to remove {{multiValued="true"}} from the {{\*_txt}} declaration (since the {{text_general}} field type already has {{multiValued="true"}}), and then make {{\*_t}} the same as it. The result will be:

{code:xml}
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
{code}

Committing shortly.

audit cleanup schema in data_driven_schema_configs
--------------------------------------------------

Key: SOLR-6913
URL: https://issues.apache.org/jira/browse/SOLR-6913
Project: Solr
Issue Type: Task
Reporter: Hoss Man
Assignee: Steve Rowe
Priority: Blocker
Fix For: 5.0, Trunk
Attachments: SOLR-6913-trim-schema.patch, SOLR-6913-trim-schema.patch, SOLR-6913.patch

the data_driven_schema_configs configset has some issues that should be reviewed carefully and cleaned up...

* currently includes a schema.xml file:
** this was previously part of the old example to show the automatic bootstrapping of schema.xml -> managed-schema, but at this point it's just kind of confusing
** we should just rename this to managed-schema in svn - the ref guide explains the bootstrapping
* the effective schema as it currently stands includes a bunch of copyFields and dynamicFields that are taken wholesale from the techproducts example
** some of these might make sense to keep in a general example (ie: \*_txt) but in general they should all be reviewed.
** a bunch of this cruft is actually commented out already, but anything we don't want to keep should be removed to eliminate confusion
* SOLR-6471 added an explicit _text field as the default and made it a copyField catchall (ie: \*)
** the ref guide schema API example responses need to reflect the existence of this field: https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
** we should draw heavy attention to this field+copyField -- both with a /!\ NOTE in the ref guide and by calling it out in the solrconfig.xml and managed-schema file comments, since people who start with these configs may be surprised and wind up with a very bloated index

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
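The {{multiValued}} reasoning in Steve Rowe's comment in this message can be sketched as follows. This is a minimal fragment, not a copy of the configset: the analyzer chain and the {{positionIncrementGap}} value are assumptions chosen for illustration. The point it shows is that a property set on a field type acts as a default for every field or dynamic field of that type, so repeating it on the {{dynamicField}} is redundant.

```xml
<!-- Illustrative sketch only; the analyzer chain and positionIncrementGap are assumptions. -->
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Neither pattern sets multiValued; both pick up multiValued="true" from the field type. -->
<dynamicField name="*_t"   type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
```

A declaration that needs different behavior can still set {{multiValued="false"}} explicitly, which overrides the default inherited from the type.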
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285023#comment-14285023 ]

ASF subversion and git services commented on SOLR-6913:
-------------------------------------------------------

Commit 1653419 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1653419 ]

SOLR-6913: put back mistakenly removed '*_t' dynamic field
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285025#comment-14285025 ]

ASF subversion and git services commented on SOLR-6913:
-------------------------------------------------------

Commit 1653420 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1653420 ]

SOLR-6913: put back mistakenly removed '*_t' dynamic field (merged trunk r1653419)
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285026#comment-14285026 ]

ASF subversion and git services commented on SOLR-6913:
-------------------------------------------------------

Commit 1653421 from [~sar...@syr.edu] in branch 'dev/branches/lucene_solr_5_0'
[ https://svn.apache.org/r1653421 ]

SOLR-6913: put back mistakenly removed '*_t' dynamic field (merged trunk r1653419)
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272182#comment-14272182 ]

ASF subversion and git services commented on SOLR-6913:
-------------------------------------------------------

Commit 1650706 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1650706 ]

SOLR-6913: In data_driven_schema_configs configset, rename schema.xml to managed-schema, remove example-only fieldtypes, add dynamic fields for each fieldtype where they don't exist, and add a warning about the catch-all _text field (merged trunk r1650701)
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272191#comment-14272191 ]

Steve Rowe commented on SOLR-6913:
----------------------------------

I reverted my initial commit, then made changes to {{schema.xml}}: I put back most of the field types and dynamic fields I had removed, added dynamic fields for each field type that didn't have one, and added a warning about the catch-all {{_text}} field to the schema. Then I renamed {{schema.xml}} to {{managed-schema}}.

This keeps the comments-as-documentation intact in the configset, where they won't be overwritten. Also, the schema will be much easier to maintain and to track history for.

I think this is done. (Should have reopened and then resolved again - too late now...)
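The catch-all warning mentioned in the comment above could look something like the fragment below. This is a hedged sketch: the comment wording and the attribute values on the {{_text}} field are assumptions made for illustration, not text taken from the committed {{managed-schema}}. It shows the two declarations that work together (the field and the wildcard {{copyField}}) and why they deserve a warning.

```xml
<!-- WARNING (illustrative wording): the copyField below copies every incoming
     field into _text, so the index can grow considerably. Remove or narrow the
     copyField if a catch-all search field is not needed. -->
<field name="_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="_text"/>
```

Narrowing {{source}} from {{*}} to a specific pattern, such as {{*_txt}}, is one way to keep a default search field while limiting how much data is duplicated.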
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272153#comment-14272153 ]

ASF subversion and git services commented on SOLR-6913:
-------------------------------------------------------

Commit 1650701 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1650701 ]

SOLR-6913: In data_driven_schema_configs configset, rename schema.xml to managed-schema, remove example-only fieldtypes, add dynamic fields for each fieldtype where they don't exist, and add a warning about the catch-all _text field
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272157#comment-14272157 ]

ASF subversion and git services commented on SOLR-6913:
-------------------------------------------------------

Commit 1650702 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1650702 ]

SOLR-6913: revert cleanup schema in data_drive_schema_configs configset (schema modifications will follow)
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272063#comment-14272063 ]

ASF subversion and git services commented on SOLR-6913:
-------------------------------------------------------

Commit 1650696 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1650696 ]

SOLR-6913: revert cleanup schema in data_drive_schema_configs configset (schema modifications will follow)
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271487#comment-14271487 ]

Grant Ingersoll commented on SOLR-6913:
---------------------------------------

bq. My thinking was that the schemaless example should be minimal. In particular, if we don't have a way for field types to be used (via (dynamic)field definitions or field guessing), why include them? If the user can add fields, they can add field types too.

The main issue is that this is the default OOTB, and it thus leaves us pretty underpowered for an OOTB experience. Those field types have been in Solr for a long time and I think they hold up reasonably well, so I would vote for putting them back in.

I think the big difference is that Solr experts come at the situation from "edit schema/config first". New users come at data stores as "let me manipulate my data first and then harden it later".
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271198#comment-14271198 ]

Steve Rowe commented on SOLR-6913:
----------------------------------

bq. What's the reasoning behind removing so many of the field types?

My thinking was that the schemaless example should be minimal. In particular, if we don't have a way for field types to be used (via (dynamic)field definitions or field guessing), why include them? If the user can add fields, they can add field types too.

{quote}
I'd vote for returning:
# geo related
# currency
# Language support
{quote}

In the case of language support, there was no way to use those field types without manually adding fields (there were no dynamic fields defined for them), and as it stands we don't have a way to document the schema so that people can figure out what field types to use (though see my schema annotation proposal: [http://mail-archives.apache.org/mod_mbox/lucene-dev/201308.mbox/%3c7384f7f2-ad35-480b-8523-3db75aa06...@gmail.com%3E]).

There were geo dynamic fields to go with the defined field types, but I removed them because understanding which geo type to use seemed confusing, and Solr spatial is evolving, so it seemed better to let users find the latest advice on how to use it and update the schema themselves.

I removed the currency capabilities because they seemed esoteric and didn't fit with a simple example.
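The language-support point above (a field type that nothing refers to cannot be used until someone adds a field for it) can be illustrated with a small fragment. This is a sketch under assumptions: the {{text_de}} analyzer chain is simplified, and the field name {{description_de}} is hypothetical.

```xml
<!-- A language-specific field type with no dynamicField pointing at it.
     The analyzer chain is a simplified assumption. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical explicit field: without a declaration like this one (or a
     matching dynamicField), no document field is ever analyzed as text_de. -->
<field name="description_de" type="text_de" indexed="true" stored="true"/>
```

Field guessing does not help here, since it maps detected value types to a fixed set of field types rather than to language-specific ones.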
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271499#comment-14271499 ]

Grant Ingersoll commented on SOLR-6913:
---------------------------------------

IOW, it's not about schemaless, it's about schema-later
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
    [ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271547#comment-14271547 ]

Steve Rowe commented on SOLR-6913:
----------------------------------

bq. The main issue is that OOTB, this is the default and it thus leaves us pretty underpowered for an OOTB experience.

Okay, I'll buy it: since {{data_driven_schema_configs}} is the default configset when creating a core or a collection from {{bin/solr}}, broad field type and dynamic field support is called for.

In addition to putting back the geo-related and currency dynamic fields and field types, I'll put back the language-specific field types and add (previously missing) dynamic fields for them.
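The plan in the comment above amounts to pairing each restored field type with a dynamic field pattern. The fragment below is a sketch of that pairing, not the committed schema: the suffixes and the type names {{location}}, {{currency}}, and {{text_de}} are assumptions, and the field types themselves are assumed to be declared elsewhere in the same file.

```xml
<!-- Illustrative suffixes and type names; assumptions, not the committed schema. -->
<dynamicField name="*_p"      type="location" indexed="true" stored="true"/>
<dynamicField name="*_c"      type="currency" indexed="true" stored="true"/>
<dynamicField name="*_txt_de" type="text_de"  indexed="true" stored="true"/>
```

With patterns like these in place, a document field named, for example, {{title_txt_de}} would be analyzed with the German field type without any schema edit. Some geo types may also need supporting declarations for their sub-fields, depending on how the field type is configured.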
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271806#comment-14271806 ]

Grant Ingersoll commented on SOLR-6913:

Awesome, thanks Steve!
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270989#comment-14270989 ]

Grant Ingersoll commented on SOLR-6913:

I'd vote for returning:
# geo related
# currency
# Language support

Indifferent on the rest.
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270978#comment-14270978 ]

Grant Ingersoll commented on SOLR-6913:

What's the reasoning behind removing so many of the field types?
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270981#comment-14270981 ]

Grant Ingersoll commented on SOLR-6913:

I think the regular workflow for exploring a new dataset is to just start throwing it at Solr and then to tweak the data, not tweak the schema. Data first, schema second.

So, for instance, I'm working on this citibike data. My first step is to index it w/ no schema whatsoever. I then iterate by writing a little Python to index some of the columns as spatial. What I don't do is go muck w/ the schema, hence the name data-driven.
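The "tweak the data, not the schema" step described above might look something like the following sketch. It folds two coordinate columns of a CSV row into a single "lat,lon" value under a field name whose suffix is meant to select a spatial dynamic field. The column names, the sample rows and the {{_p}} suffix are assumptions for illustration; the actual citibike columns are not given in this thread.

```python
import csv
import io


def to_solr_doc(row, lat_col="latitude", lon_col="longitude", geo_field="location_p"):
    """Copy a CSV row, replacing the two coordinate columns with one "lat,lon" field."""
    doc = {k: v for k, v in row.items() if k not in (lat_col, lon_col)}
    lat, lon = row.get(lat_col), row.get(lon_col)
    if lat and lon:
        # Validate that both values parse as numbers before emitting them.
        float(lat)
        float(lon)
        doc[geo_field] = f"{lat.strip()},{lon.strip()}"
    return doc


# Made-up sample rows; the second one has no coordinates.
SAMPLE = "station,latitude,longitude\nW 52 St,40.76727216,-73.99392888\nNo GPS,,\n"
docs = [to_solr_doc(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

The resulting dictionaries could then be sent to Solr as JSON documents. The point of the sketch is that the only thing changed between iterations is the data preparation script.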
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269564#comment-14269564 ]

ASF subversion and git services commented on SOLR-6913:

Commit 1650330 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1650330 ]

SOLR-6913: eol-style for managed-schema
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269583#comment-14269583 ]

ASF subversion and git services commented on SOLR-6913:

Commit 1650336 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1650336 ]

SOLR-6913: cleanup schema in data_drive_schema_configs configset (merged trunk r1650329 and r1650330)
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269555#comment-14269555 ]

ASF subversion and git services commented on SOLR-6913:

Commit 1650329 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1650329 ]

SOLR-6913: cleanup schema in data_drive_schema_configs configset
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266054#comment-14266054 ]

Erik Hatcher commented on SOLR-6913:

That patch breaks data_driven_schema_configs because the plural field types are still specified for AddSchemaFieldsUpdateProcessorFactory in solrconfig:

{code}
$ bin/solr create_core -n films -c data_driven_schema_configs
Creating new core 'films' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=films&instanceDir=films

Failed to create core 'films' due to: Error CREATEing SolrCore 'films': Unable to create core [films] Caused by: fieldType 'booleans' not found in the schema
{code}

I thought the plural field types were awkward, but they do allow multi-valued content to come in easily. What happens with these schema changes when multivalued content comes in the first time? Does this require that fields be configured prior to data ingestion?

One test for the changes here is to follow the steps in example/films/README.txt
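The failure above comes from solrconfig naming a field type that the schema no longer defines. A consistency check along the following lines could catch that before a core is created. This is a minimal sketch, not part of Solr: the two XML fragments are trimmed, illustrative stand-ins for the real files, and {{missing_field_types}} is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the update processor section of solrconfig.xml.
SOLRCONFIG_FRAGMENT = """
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>
  </lst>
</processor>
"""

# Illustrative stand-in for a schema where the plural types were removed.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.5">
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="boolean" class="solr.BoolField"/>
</schema>
"""


def missing_field_types(solrconfig_xml, schema_xml):
    """List field types the processor refers to that the schema does not define."""
    defined = {ft.get("name") for ft in ET.fromstring(schema_xml).iter("fieldType")}
    processor = ET.fromstring(solrconfig_xml)
    referenced = [
        e.text
        for e in processor.iter("str")
        if e.get("name") in ("fieldType", "defaultFieldType")
    ]
    return sorted(set(referenced) - defined)


missing = missing_field_types(SOLRCONFIG_FRAGMENT, SCHEMA_FRAGMENT)
```

For these fragments the helper reports both plural names, which mirrors the "fieldType 'booleans' not found in the schema" error.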
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266163#comment-14266163 ]

Alexandre Rafalovitch commented on SOLR-6913:

All fields have to be multiValued, or there needs to be a way to _upgrade_ a field definition to multiValued.

The latter (single->multiValued upgrade) is what ElasticSearch does behind the scenes. Or rather, it is always multivalued, but returns content as a value or as an array depending on how many items there are. Of course, that ambiguity may then break clients that expect a single value and get an array back.

The other option is to introduce a dry-run URP that looks at all the values before creating the type, as per the side-discussion in [SOLR-6016|https://issues.apache.org/jira/browse/SOLR-6016?focusedCommentId=14060934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14060934].
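The dry-run idea can be pictured as a pass over the whole batch that records, per field, whether any document carries more than one value, before any field is created. The sketch below only simulates that logic in plain Python; it is not an update request processor, and the function name and sample documents are made up for illustration.

```python
def scan_multivalued(docs):
    """Return {field_name: bool}, True if any document has several values for it."""
    multi = {}
    for doc in docs:
        for name, value in doc.items():
            is_multi = isinstance(value, (list, tuple)) and len(value) > 1
            multi[name] = multi.get(name, False) or is_multi
    return multi


# Made-up sample: the first document happens to carry a single genre.
films = [
    {"id": "doc-1", "genre": ["Drama"]},
    {"id": "doc-2", "genre": ["Black comedy", "Thriller"]},
]

first_doc_guess = scan_multivalued(films[:1])  # what a per-document guess would see
full_scan = scan_multivalued(films)            # what a dry run over the batch would see
```

Looking only at the first document classifies {{genre}} as single-valued, while the full scan marks it multivalued. A dry run is still limited to the batch it sees, so later data can contradict it.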
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266114#comment-14266114 ]

Erik Hatcher commented on SOLR-6913:

And after changing all those plural field types mentioned in solrconfig to singular, this is what happens on {{bin/post films example/films/films.json}}:

{code}
LucidErikMBP:solr erikhatcher$ bin/post films example/films/films.json
INFO - 2015-01-06 13:45:08.001; org.apache.solr.schema.ManagedIndexSchema; Upgraded to managed schema at /Users/erikhatcher/dev/trunk/solr/server/solr/films/conf/managed-schema
INFO - 2015-01-06 13:45:08.041; org.apache.solr.update.processor.LogUpdateProcessor; [films] webapp=/solr path=/update params={} {add=[/en/001 (1489556637456793600)]} 0 75
ERROR - 2015-01-06 13:45:08.042; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: ERROR: [doc=/en/45_2006] multiple values encountered for non multiValued field genre: [Black comedy, Thriller, Psychological thriller, Indie film, Action Film, Crime Thriller, Crime Fiction, Drama]
{code}

:/ - now what? It seems we need the auto-field guessing to also guess and set multiValued.
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266547#comment-14266547 ]

Erik Hatcher commented on SOLR-6913:

bq. We need the auto-field guessing to also guess and set multiValued, seems like.

I gave this a whirl (patch below as a code comment for posterity) and did not like it. The data I tried always had some field that comes in as a single value in the first document (or even as a single-valued array in, say, JSON, which seems to be indistinguishable from a single value at this update processor level) or, even more confusingly, in a handful of documents that go in successfully before multiple values start coming in. This is a prime example of where guessing this stuff is, more often than not, incorrect or inappropriate somewhere along the way with real data (at least with a sample size of a single field value).

It's easiest, I'll echo, to just assume multivalued on new fields. No worries, this is why it's now been made easy to nudge these things with something as straightforward as this when setting things up:

{code}
curl http://localhost:8983/solr/films/schema/fields -X POST -H 'Content-type:application/json' --data-binary '
[
  {
    "name":"name",
    "type":"text_general",
    "stored":true
  }
]'
{code}

{code}
Index: core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java
===================================================================
--- core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java (revision 1649842)
+++ core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java (working copy)
@@ -39,8 +39,10 @@
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
 
 import static org.apache.solr.common.SolrException.ErrorCode.BAD_REQUEST;
@@ -279,8 +281,16 @@
       FieldNameSelector selector = buildSelector(oldSchema);
       for (final String fieldName : doc.getFieldNames()) {
         if (selector.shouldMutate(fieldName)) { // returns false if the field already exists in the current schema
-          String fieldTypeName = mapValueClassesToFieldType(doc.getField(fieldName));
-          newFields.add(oldSchema.newField(fieldName, fieldTypeName, Collections.<String,Object>emptyMap()));
+          SolrInputField value = doc.getField(fieldName);
+          String fieldTypeName = mapValueClassesToFieldType(value);
+          Map<String,Object> options = new HashMap<>();
+
+          if (value.getValueCount() > 1) {
+            options.put("multiValued", true);
+          }
+
+          SchemaField newField = oldSchema.newField(fieldName, fieldTypeName, options);
+          newFields.add(newField);
         }
       }
       if (newFields.isEmpty()) {
{code}
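Declaring fields up front, rather than relying on a guess, amounts to building a small JSON list of field definitions and POSTing it to the core's {{schema/fields}} endpoint. The sketch below only builds and serializes such a payload and sends nothing over the network. The helper name is hypothetical, and which fields end up single-valued or multivalued here is an assumption for the example.

```python
import json


def field_definition(name, field_type="text_general", multi_valued=False, stored=True):
    """Build one field definition in the shape of a schema/fields JSON entry."""
    return {
        "name": name,
        "type": field_type,
        "stored": stored,
        "multiValued": multi_valued,
    }


# Assumed choice: "name" holds one value per film, "genre" may hold several.
definitions = [
    field_definition("name"),
    field_definition("genre", field_type="string", multi_valued=True),
]

# JSON body that could be passed to curl via --data-binary.
payload = json.dumps(definitions, indent=2)
```

Setting {{multiValued}} explicitly for each field keeps the decision with the person who knows the data, which is the "nudge" this comment argues for.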
[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs
[ https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265600#comment-14265600 ]

Steve Rowe commented on SOLR-6913:

bq. managed-schema file comments

I want to strip all (non-license) comments from the shipped {{managed-schema}} file - I'll start up the example and get it to do the auto-bootstrap thing (removes comments in the process of serialization), then add back the license comment, then {{svn rm schema.xml}} and {{svn add managed-schema}}.