[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-20 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285022#comment-14285022
 ] 

Steve Rowe commented on SOLR-6913:
--

I mistakenly removed the {{\*_t}} dynamic field from 
data_driven_schema_configs's {{managed-schema}}.

Currently data_driven_schema_configs has:

{code:xml}
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>
{code}

I'm going to remove {{multiValued="true"}} from the {{\*_txt}} declaration 
(since the {{text_general}} field type already has {{multiValued="true"}}), and 
then make {{\*_t}} the same as it. The result will be:

{code:xml}
<dynamicField name="*_t"   type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
{code}

Committing shortly.
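
To illustrate how these two dynamic fields would be picked up at index time, here is a minimal sketch of a Solr XML update message. The document, field names and values are hypothetical; the only assumption about the schema is that {{id}} is the unique key:

{code:xml}
<!-- Hypothetical document; field names and values are made up for illustration. -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <!-- "title_t" ends in "_t", so it would match the *_t dynamic field -->
    <field name="title_t">Audit and cleanup</field>
    <!-- "notes_txt" ends in "_txt", so it would match *_txt; since text_general
         is multiValued, repeating the field should be accepted -->
    <field name="notes_txt">first note</field>
    <field name="notes_txt">second note</field>
  </doc>
</add>
{code}

Both fields end up with the same type and options, which is the point of making {{\*_t}} identical to {{\*_txt}}.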

 audit & cleanup schema in data_driven_schema_configs
 --

 Key: SOLR-6913
 URL: https://issues.apache.org/jira/browse/SOLR-6913
 Project: Solr
  Issue Type: Task
Reporter: Hoss Man
Assignee: Steve Rowe
Priority: Blocker
 Fix For: 5.0, Trunk

 Attachments: SOLR-6913-trim-schema.patch, 
 SOLR-6913-trim-schema.patch, SOLR-6913.patch


 the data_driven_schema_configs configset has some issues that should be 
 reviewed carefully & cleaned up...
 * currently includes a schema.xml file:
 ** this was previously part of the old example to show the automatic 
 bootstrapping of schema.xml -> managed-schema, but at this point it's just 
 kind of confusing
 ** we should just rename this to managed-schema in svn - the ref guide 
 explains the bootstrapping
 * the effective schema as it currently stands includes a bunch of copyFields 
 & dynamicFields that are taken wholesale from the techproducts example
 ** some of these might make sense to keep in a general example (ie: \*_txt) 
 but in general they should all be reviewed.
 ** a bunch of this cruft is actually commented out already, but anything we 
 don't want to keep should be removed to eliminate confusion
 * SOLR-6471 added an explicit _text field as the default and made it a 
 copyField catchall (ie: \*)
 ** the ref guide schema API example responses need to reflect the existence 
 of this field: 
 https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
 ** we should draw heavy attention to this field+copyField -- both with a /!\ 
 NOTE in the refguide and call it out in solrconfig.xml & managed-schema 
 file comments since people who start with these configs may be surprised and 
 wind up with a very bloated index



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285023#comment-14285023
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1653419 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1653419 ]

SOLR-6913: put back mistakenly removed '*_t' dynamic field




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285025#comment-14285025
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1653420 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1653420 ]

SOLR-6913: put back mistakenly removed '*_t' dynamic field (merged trunk 
r1653419)




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285026#comment-14285026
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1653421 from [~sar...@syr.edu] in branch 'dev/branches/lucene_solr_5_0'
[ https://svn.apache.org/r1653421 ]

SOLR-6913: put back mistakenly removed '*_t' dynamic field (merged trunk 
r1653419)




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272182#comment-14272182
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1650706 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1650706 ]

SOLR-6913: In data_driven_schema_configs configset, rename schema.xml to 
managed-schema, remove example-only fieldtypes, add dynamic fields for each 
fieldtype where they don't exist, and add a warning about the catch-all _text 
field (merged trunk r1650701)




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272191#comment-14272191
 ] 

Steve Rowe commented on SOLR-6913:
--

I reverted my initial commit, then made changes to {{schema.xml}}: I put back 
most of the field types and dynamic fields I had removed, added dynamic fields 
for each field type that didn't have one, added a warning about the catch-all 
{{_text}} field to the schema, and then renamed {{schema.xml}} to 
{{managed-schema}}. This keeps the comments-as-documentation intact in the 
configset, where they won't be overwritten. Also, the schema will be much 
easier to maintain and to track history for.

I think this is done. (Should have reopened and then resolved again - too late 
now...)
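
For readers unfamiliar with the catch-all, a minimal sketch of the kind of declaration the warning refers to follows. The attribute values ({{stored}}, {{multiValued}}) and the wording of the comment are assumptions for illustration and are not copied from the commit:

{code:xml}
<!-- WARNING (illustrative wording): every field of every document is copied
     into _text, which can make the index considerably larger. Remove the
     copyField below if you do not need a catch-all search field. -->
<field name="_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="_text"/>
{code}

Because the {{source}} pattern is {{\*}}, fields created later by field guessing would be copied as well, which is why the index can grow faster than people expect.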




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272153#comment-14272153
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1650701 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1650701 ]

SOLR-6913: In data_driven_schema_configs configset, rename schema.xml to 
managed-schema, remove example-only fieldtypes, add dynamic fields for each 
fieldtype where they don't exist, and add a warning about the catch-all _text 
field




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272157#comment-14272157
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1650702 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1650702 ]

SOLR-6913: revert cleanup schema in data_drive_schema_configs configset (schema 
modifications will follow)




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272063#comment-14272063
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1650696 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1650696 ]

SOLR-6913: revert cleanup schema in data_drive_schema_configs configset (schema 
modifications will follow)




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271487#comment-14271487
 ] 

Grant Ingersoll commented on SOLR-6913:
---

bq. My thinking was that the schemaless example should be minimal. In 
particular, if we don't have a way for field types to be used (via 
(dynamic)field definitions or field guessing), why include them? If the user 
can add fields, they can add field types too.

The main issue is that OOTB, this is the default and it thus leaves us pretty 
underpowered for an OOTB experience.  Those Field Types have been in Solr for a 
long time and I think they hold up reasonably well, so I would vote for putting 
them back in.

I think the big difference is that Solr experts come at the situation from an 
"edit schema/config first" angle. New users come at data stores from a "let me 
manipulate my data first and then harden it later" angle.




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271198#comment-14271198
 ] 

Steve Rowe commented on SOLR-6913:
--

bq. What's the reasoning behind removing so many of the field types?

My thinking was that the schemaless example should be minimal.  In particular, 
if we don't have a way for field types to be used (via (dynamic)field 
definitions or field guessing), why include them?  If the user can add fields, 
they can add field types too.

{quote}
I'd vote for returning:
# geo related
# currency
# Language support
{quote}

In the case of language support, there was no way to use those field types 
without manually adding fields (there were no dynamic fields defined for them), 
and as it stands we don't have a way to document the schema so that people can 
figure out what field types to use (though see my schema annotation proposal: 
[http://mail-archives.apache.org/mod_mbox/lucene-dev/201308.mbox/%3c7384f7f2-ad35-480b-8523-3db75aa06...@gmail.com%3E]).

There were geo dynamic fields to go with the defined field types, but I removed 
them because understanding which geo type to use seemed confusing, and Solr 
spatial is evolving, so it seemed better to let the user find the latest advice 
for how to use this and update the schema themselves.

I removed the currency capabilities because it seemed esoteric, and didn't fit 
with a simple example.




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271499#comment-14271499
 ] 

Grant Ingersoll commented on SOLR-6913:
---

IOW, it's not about schemaless, it's about schema-later




[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-09 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271547#comment-14271547
 ] 

Steve Rowe commented on SOLR-6913:
--

bq. The main issue is that OOTB, this is the default and it thus leaves us 
pretty underpowered for an OOTB experience. 

Okay, I'll buy it: since {{data_driven_schema_configs}} is the default 
configset when creating a core or a collection from {{bin/solr}}, broad field 
type and dynamic field support is called for.

In addition to putting back the geo-related and currency dynamic fields and 
field types, I'll put back the language-specific field types, and add the 
(previously missing) dynamic fields for them.




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271806#comment-14271806
 ] 

Grant Ingersoll commented on SOLR-6913:
---

Awesome, thanks Steve!




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270989#comment-14270989
 ] 

Grant Ingersoll commented on SOLR-6913:
---

I'd vote for returning:

# geo related
# currency
# Language support

Indifferent on the rest.




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270978#comment-14270978
 ] 

Grant Ingersoll commented on SOLR-6913:
---

What's the reasoning behind removing so many of the field types?  




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270981#comment-14270981
 ] 

Grant Ingersoll commented on SOLR-6913:
---

I think the regular workflow for exploring new datasets is to just start 
throwing it at Solr and then to tweak the data, not tweak the schema.  Data 
first, schema second.  So, for instance, I'm working on this citibike data.  My 
first step is to index it w/ no schema whatsoever.  I then iterate by writing a 
little python to index some of the columns as spatial.  What I don't do is go 
muck w/ the schema, hence the name data-driven.
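
As a rough illustration of that kind of iteration, a small script might fold two coordinate columns into one spatial field before posting. This is only a sketch: the column names (`start_lat`, `start_lon`), the sample rows, and the `_p` field-name suffix are assumptions made for the example, not details of the citibike data or of this configset.

```python
import csv
import io


def add_spatial_field(row, lat_col, lon_col, target):
    """Return a copy of row with the lat/lon columns merged into one "lat,lon" field."""
    doc = dict(row)
    lat = doc.pop(lat_col, "").strip()
    lon = doc.pop(lon_col, "").strip()
    if lat and lon:
        # Fail early if either value is not numeric.
        float(lat)
        float(lon)
        doc[target] = f"{lat},{lon}"
    return doc


# Hypothetical sample standing in for a real CSV export.
SAMPLE = "id,start_lat,start_lon\n1,40.7671,-73.9937\n2,,\n"

docs = [
    add_spatial_field(row, "start_lat", "start_lon", "start_p")
    for row in csv.DictReader(io.StringIO(SAMPLE))
]
```

Rows without coordinates simply lose the two empty columns, so the data is reshaped and the schema is left alone.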




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269564#comment-14269564
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1650330 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1650330 ]

SOLR-6913: eol-style for managed-schema




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269583#comment-14269583
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1650336 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1650336 ]

SOLR-6913: cleanup schema in data_drive_schema_configs configset (merged trunk 
r1650329 and r1650330)




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269555#comment-14269555
 ] 

ASF subversion and git services commented on SOLR-6913:
---

Commit 1650329 from [~sar...@syr.edu] in branch 'dev/trunk'
[ https://svn.apache.org/r1650329 ]

SOLR-6913: cleanup schema in data_drive_schema_configs configset




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-06 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266054#comment-14266054
 ] 

Erik Hatcher commented on SOLR-6913:


That patch breaks the data_driven_schema_configs because the plural field types 
are still specified for AddSchemaFieldsUpdateProcessorFactory in solrconfig:

{code}
$ bin/solr create_core -n films -c data_driven_schema_configs

Creating new core 'films' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=films&instanceDir=films

Failed to create core 'films' due to: Error CREATEing SolrCore 'films': Unable 
to create core [films] Caused by: fieldType 'booleans' not found in the schema
{code}

I thought the plural field types were awkward, but they do allow multi-valued 
content to come in easily.  What happens with these schema changes when 
multivalued content comes in the first time?   Does this require fields be 
configured prior to data ingestion?

One test for the changes here is to follow the steps in example/films/README.txt
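
A mismatch like this could in principle be caught before starting Solr by comparing the field type names the processor refers to against the ones the schema defines. The following is a minimal sketch under stated assumptions: the two XML fragments are heavily trimmed, hypothetical stand-ins for the real `solrconfig.xml` and `managed-schema`, and the helper name `missing_field_types` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragments standing in for the real config files.
SOLRCONFIG = """
<config>
  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
      <str name="defaultFieldType">strings</str>
      <lst name="typeMapping">
        <str name="valueClass">java.lang.Boolean</str>
        <str name="fieldType">booleans</str>
      </lst>
    </processor>
  </updateRequestProcessorChain>
</config>
"""

SCHEMA = """
<schema name="example" version="1.5">
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="boolean" class="solr.BoolField"/>
</schema>
"""


def missing_field_types(solrconfig_xml, schema_xml):
    """Return type names the processor references that the schema does not define."""
    defined = {ft.get("name") for ft in ET.fromstring(schema_xml).iter("fieldType")}
    referenced = set()
    for proc in ET.fromstring(solrconfig_xml).iter("processor"):
        if proc.get("class") != "solr.AddSchemaFieldsUpdateProcessorFactory":
            continue
        for el in proc.iter("str"):
            if el.get("name") in ("fieldType", "defaultFieldType"):
                referenced.add(el.text.strip())
    return sorted(referenced - defined)


missing = missing_field_types(SOLRCONFIG, SCHEMA)
```

With the sample fragments, `missing` lists the plural names that the trimmed schema no longer defines, which is the same condition the core creation error reports.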




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-06 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266163#comment-14266163
 ] 

Alexandre Rafalovitch commented on SOLR-6913:
-

All fields have to be multiValued, or there needs to be a way to _upgrade_ a 
field definition to multiValued. 

The latter (single -> multiValued upgrade) is what ElasticSearch does behind the 
scenes. Or, more precisely, it always treats fields as multivalued but returns 
content as a value or as an array depending on how many items there are. Of 
course, that ambiguity may then break clients that expect a single value and 
get an array back.

The other option is to introduce a dry-run URP that looks at all the values 
before creating the type, as per the side discussion in 
[SOLR-6016|https://issues.apache.org/jira/browse/SOLR-6016?focusedCommentId=14060934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14060934].
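
To make the dry-run idea concrete, here is a minimal sketch of a pre-pass that looks at every document before any field is created. It is plain Python over in-memory dicts, not an actual update request processor, and the function name and the genre values for `/en/001` are assumptions made for illustration.

```python
def scan_multivalued(docs):
    """Return {field_name: is_multi_valued} after looking at every document."""
    multi = {}
    for doc in docs:
        for name, value in doc.items():
            count = len(value) if isinstance(value, (list, tuple)) else 1
            # Once any document carries more than one value, the field stays multi-valued.
            multi[name] = multi.get(name, False) or count > 1
    return multi


docs = [
    {"id": "/en/001", "genre": ["Drama"]},
    {"id": "/en/45_2006", "genre": ["Black comedy", "Thriller"]},
]
plan = scan_multivalued(docs)
```

The decision for `genre` is only made after the second document has been seen, which is what a per-document guess cannot do.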




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-06 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266114#comment-14266114
 ] 

Erik Hatcher commented on SOLR-6913:


And after changing all those plural field types mentioned in solrconfig to 
singular, this is what happens on {{bin/post films example/films/films.json}}:

{code}
LucidErikMBP:solr erikhatcher$ bin/post films example/films/films.json 

INFO  - 2015-01-06 13:45:08.001; org.apache.solr.schema.ManagedIndexSchema; 
Upgraded to managed schema at 
/Users/erikhatcher/dev/trunk/solr/server/solr/films/conf/managed-schema
INFO  - 2015-01-06 13:45:08.041; 
org.apache.solr.update.processor.LogUpdateProcessor; [films] webapp=/solr 
path=/update params={} {add=[/en/001 (1489556637456793600)]} 0 75
ERROR - 2015-01-06 13:45:08.042; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: ERROR: [doc=/en/45_2006] multiple values 
encountered for non multiValued field genre: [Black comedy, Thriller, 
Psychological thriller, Indie film, Action Film, Crime Thriller, Crime Fiction, 
Drama]
{code}

:/ - now what?  We need the auto-field guessing to also guess and set 
multiValued, seems like.




[jira] [Commented] (SOLR-6913) audit & cleanup schema in data_driven_schema_configs

2015-01-06 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266547#comment-14266547
 ] 

Erik Hatcher commented on SOLR-6913:


bq. We need the auto-field guessing to also guess and set multiValued, seems 
like.

I gave this a whirl (patch below as a code comment for posterity) and did not 
like it.  The data I tried always had some field that comes in as a single 
value in the first document (or even as a single-valued array in, say, JSON; 
that seems to be indistinguishable from a single value at this update 
processor level), or, even more confusingly, a handful of documents go in 
successfully and only then do multiple values start coming in.  This is a 
prime example of where guessing this stuff is, more often than not, incorrect 
or inappropriate (at least with a sample size of a single field value) 
somewhere along the way with real data.  It's easiest, I'll echo, to just 
assume multivalued on new fields.  No worries, this is why it's now been made 
easy to nudge these things with something as straightforward as this when 
setting things up:

{code}
curl http://localhost:8983/solr/films/schema/fields -X POST \
  -H 'Content-type:application/json' --data-binary '
[
    {
        "name":"name",
        "type":"text_general",
        "stored":true
    }
]'
{code}
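
For anyone scripting the setup rather than using curl, roughly the same request can be built with the Python standard library. This is a sketch only: the helper name is made up, and the request is constructed but deliberately not sent, so it runs without a Solr instance.

```python
import json
import urllib.request


def build_add_field_request(base_url, collection, fields):
    """Build (but do not send) a POST to the schema fields endpoint used above."""
    return urllib.request.Request(
        url=f"{base_url}/{collection}/schema/fields",
        data=json.dumps(fields).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


req = build_add_field_request(
    "http://localhost:8983/solr",
    "films",
    [{"name": "name", "type": "text_general", "stored": True}],
)
# To actually send it against a running instance: urllib.request.urlopen(req)
```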

{code}
Index: core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java
===================================================================
--- core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java   (revision 1649842)
+++ core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java   (working copy)
@@ -39,8 +39,10 @@
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
 
 import static org.apache.solr.common.SolrException.ErrorCode.BAD_REQUEST;
@@ -279,8 +281,16 @@
 FieldNameSelector selector = buildSelector(oldSchema);
 for (final String fieldName : doc.getFieldNames()) {
   if (selector.shouldMutate(fieldName)) { // returns false if the field already exists in the current schema
-    String fieldTypeName = mapValueClassesToFieldType(doc.getField(fieldName));
-    newFields.add(oldSchema.newField(fieldName, fieldTypeName, Collections.<String,Object>emptyMap()));
+    SolrInputField value = doc.getField(fieldName);
+    String fieldTypeName = mapValueClassesToFieldType(value);
+    Map<String,Object> options = new HashMap<>();
+
+    if (value.getValueCount() > 1) {
+      options.put("multiValued", true);
+    }
+
+    SchemaField newField = oldSchema.newField(fieldName, fieldTypeName, options);
+    newFields.add(newField);
   }
 }
 if (newFields.isEmpty()) {
{code}
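
The failure mode described above can be reproduced with a toy model of the patch's rule. This is a sketch in plain Python, not Solr code: the document ids come from the log earlier in this thread, while the genre values for `/en/001` and the function name are assumptions for illustration.

```python
def guess_from_first_sighting(docs):
    """Mimic the patch: decide multiValued from the first document that has the field."""
    schema = {}
    rejected = []
    for doc in docs:
        for name, values in doc.items():
            if name not in schema:
                # The guess is frozen the first time the field is seen.
                schema[name] = len(values) > 1
        if any(len(v) > 1 and not schema[n] for n, v in doc.items()):
            rejected.append(doc["id"][0])
    return schema, rejected


docs = [
    {"id": ["/en/001"], "genre": ["Drama"]},
    {"id": ["/en/45_2006"], "genre": ["Black comedy", "Thriller"]},
]
schema, rejected = guess_from_first_sighting(docs)
```

Because the first document has a single genre, the field is frozen as single-valued and the second document is rejected, which matches the complaint that a sample size of one is not enough.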

 audit  cleanup schema in data_driven_schema_configs
 --

 Key: SOLR-6913
 URL: https://issues.apache.org/jira/browse/SOLR-6913
 Project: Solr
  Issue Type: Task
Reporter: Hoss Man
Assignee: Steve Rowe
Priority: Blocker
 Fix For: 5.0

 Attachments: SOLR-6913-trim-schema.patch


 the data_driven_schema_configs configset has some issues that should be 
 reviewed carefully & cleaned up...
 * currently includes a schema.xml file:
 ** this was previously part of the old example to show the automatic 
 bootstrapping of schema.xml -> managed-schema, but at this point it's just 
 kind of confusing
 ** we should just rename this to managed-schema in svn - the ref guide 
 explains the bootstrapping
 * the effective schema as it currently stands includes a bunch of copyFields 
 & dynamicFields that are taken wholesale from the techproducts example
 ** some of these might make sense to keep in a general example (ie: \*_txt) 
 but in general they should all be reviewed.
 ** a bunch of this cruft is actually commented out already, but anything we 
 don't want to keep should be removed to eliminate confusion
 * SOLR-6471 added an explicit _text field as the default and made it a 
 copyField catchall (ie: \*)
 ** the ref guide schema API example responses need to reflect the existence 
 of this field: 
 https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
 ** we should draw heavy attention to this field+copyField -- both with a /!\ 
 NOTE in the refguide and call it out in solrconfig.xml & managed-schema 
 file comments since people who start with these configs may be surprised and 
 wind up with a very bloated index



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SOLR-6913) audit cleanup schema in data_driven_schema_configs

2015-01-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265600#comment-14265600
 ] 

Steve Rowe commented on SOLR-6913:
--

bq. managed-schema file comments

I want to strip all (non-license) comments from the shipped {{managed-schema}} 
file - I'll start up the example and get it to do the auto-bootstrap thing 
(removes comments in the process of serialization), then add back the license 
comment, then {{svn rm schema.xml}} and {{svn add managed-schema}}.
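
As a rough illustration of why a load-then-serialize round trip drops comments, here is a small Java analogue. It is not Solr's bootstrap code and makes no claim about how Solr persists {{managed-schema}}: it simply parses XML with the JDK's DOM parser configured to ignore comments and writes the result back out. The class {{CommentStripSketch}} and the method {{roundTrip}} are hypothetical names, and the sample XML is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical analogue only; Solr's own schema serialization works differently.
public class CommentStripSketch {

    // Parses the XML while ignoring comments, then serializes the in-memory
    // tree. Anything that was never loaded cannot be written back.
    static String roundTrip(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up input: one comment and one field declaration.
        String xml = "<schema><!-- explanatory comment --><field name=\"id\"/></schema>";
        // The printed result keeps the elements but not the comment.
        System.out.println(roundTrip(xml));
    }
}
```

The same reasoning explains why the license comment has to be added back by hand after the round trip.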
