[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959067#comment-15959067 ]

Shawn Heisey commented on SOLR-8110:

[~jens.fosh...@acando.no], don't put dollar signs (or any other character that tends to have special meaning) in field names. The reason that we plan to enforce a limited character set (which would definitely exclude the dollar sign) is that we cannot guarantee that all future functionality will continue to support those characters in field names, even if *current* functionality does. General consensus is that the period should be allowed in field names, so your dynamic fields will likely be acceptable.

> Start enforcing field naming recomendations in next X.0 release?
>
> Key: SOLR-8110
> URL: https://issues.apache.org/jira/browse/SOLR-8110
> Project: Solr
> Issue Type: Improvement
> Reporter: Hoss Man
> Attachments: SOLR-8110.patch, SOLR-8110.patch
>
> For a very long time now, Solr has made the following "recommendation" regarding field naming conventions...
> bq. field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed.
> ...
> I'm opening this issue to track discussion about if/how we should start enforcing this as a rule (instead of just a "recommendation") in our next/future X.0 (ie: major) release.
> The goals of doing so being:
> * simplify some existing code/APIs that currently use heuristics to deal with lists of fields and produce strange errors when the heuristic fails (example: ReturnFields.add)
> * reduce confusion/pain for new users who might start out unaware of the recommended conventions, only later encounter a situation where their field names are not supported by some feature, and get frustrated because they have to change their schema, reindex, update index/query client expectations, etc...

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
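For readers who want to test their own schema against the recommendation quoted in the issue, here is a minimal sketch. It is not the attached SOLR-8110.patch. The class name `FieldNameCheck`, the ASCII-only reading of "alphanumeric", and the variant that also accepts a period (based on the "general consensus" mentioned in the comment) are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

/** Sketch only: checks names against the field naming recommendation quoted in SOLR-8110. */
final class FieldNameCheck {
    // Quoted recommendation: alphanumeric or underscore characters, not starting with a digit.
    // Assumption: "alphanumeric" is read as ASCII letters and digits.
    private static final Pattern RECOMMENDED = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Assumption: the same rule, with a period also accepted after the first character.
    private static final Pattern WITH_PERIOD = Pattern.compile("[A-Za-z_][A-Za-z0-9_.]*");

    static boolean isRecommended(String name) {
        return RECOMMENDED.matcher(name).matches();
    }

    static boolean isAcceptedWithPeriod(String name) {
        return WITH_PERIOD.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"title", "fieldname.id", "xxx$fieldname.id", "2nd_author"};
        for (String name : samples) {
            System.out.println(name
                    + " recommended=" + isRecommended(name)
                    + " withPeriod=" + isAcceptedWithPeriod(name));
        }
    }
}
```

Under these assumptions a name such as `fieldname.id` passes only the period-tolerant check, while anything containing a dollar sign fails both.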
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958581#comment-15958581 ]

Jens Foshaug commented on SOLR-8110:

Any new status on this issue? My hope is to be able to have a naming convention like

xxx$fieldname
xxx$fieldname.id
xxx$fieldname.label

or dynamic fields like:

*.id
*.label
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793258#comment-15793258 ]

Erick Erickson commented on SOLR-8110:

[~hossman_luc...@fucit.org] Hoss: WDYT about putting this in trunk?
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224104#comment-15224104 ]

Henrik commented on SOLR-8110:

Thanks! I'll have to wait some more before upgrading, though, because I just stumbled onto SOLR-8940.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216499#comment-15216499 ]

Shawn Heisey commented on SOLR-8110:

bq. Is it possible to enable support for dashes with a JVM arg (or something else)?

No. Only the patch for SOLR-8725 can make it work. I'm going to look into patching the 5.5 branch so it will be in 5.5.1 if that version is ever released.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215678#comment-15215678 ]

Henrik commented on SOLR-8110:

Upon upgrading from 5.4.0 to 5.5.0 I got this:

project-lms_shard5_replica2: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Invalid name: 'project-lms_shard5_replica2' Identifiers must consist entirely of periods, underscores and alphanumerics

Is it possible to enable support for dashes with a JVM arg (or something else)? Or is there a simple way of renaming a whole lot of SolrCloud instances? We have about 20 collections across 30 servers that are affected by this.
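Before an upgrade like the one described here, it may help to list which existing names would trip the check. The following is a rough sketch whose pattern is derived only from the wording of the error message above ("periods, underscores and alphanumerics"); it is not Solr's validation code, and the class name `IdentifierAudit` and the ASCII-only character set are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch only: flags names that contain anything other than periods, underscores and alphanumerics. */
final class IdentifierAudit {
    // Assumption: ASCII letters and digits, plus '.' and '_', as worded in the error message.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9._]+");

    /** Returns the names that would not match the assumed rule, in input order. */
    static List<String> rejected(List<String> names) {
        List<String> bad = new ArrayList<>();
        for (String name : names) {
            if (!ALLOWED.matcher(name).matches()) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // The second name is a made-up example without a dash.
        List<String> cores = Arrays.asList("project-lms_shard5_replica2", "projectlms_shard1_replica1");
        for (String name : rejected(cores)) {
            System.out.println("would be rejected: " + name);
        }
    }
}
```

This only identifies affected names; it does not rename anything.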
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214992#comment-15214992 ]

Yago Riveiro commented on SOLR-8110:

My bad. This issue was pointed out on IRC as the actual place for discussion about name enforcement. This issue is about schema fields; it is not the one that enforces collection names.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214974#comment-15214974 ]

Jason Gerlowski commented on SOLR-8110:

Maybe I'm missing something, but how would a collection-rename API help with enforcing-field-recommendations?
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214325#comment-15214325 ]

Yago Riveiro commented on SOLR-8110:

This enforcement shouldn't happen without an API to rename collections ... and don't forget that there are people with indexes holding terabytes of data who can't do a full re-index.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171096#comment-15171096 ]

Jack Krupansky commented on SOLR-8110:

bq. "safe"... "moderate"... "legacy"

My only real nit is that it would be a shame if we couldn't say simply that people will be safe if they stick to Java identifier rules. That would mean $ and full Unicode. My point is that it makes learning Solr more intuitive, since Java is a more commonly known entity: "Solr field names are Java identifiers", rather than encumbering people with yet another set of rules to learn.

Note that the current Solr code mostly uses isJavaIdentifierStart/isJavaIdentifierPart today, but disallows $, probably due to parameter substitution. IOW, Unicode is there today. See:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/StrParser.java
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/SolrReturnFields.java
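As an illustration of the rule described in this comment (Java identifier characters, minus the dollar sign), a check could look roughly like the sketch below. It is not copied from StrParser or SolrReturnFields; the class name `JavaStyleFieldName` and the decision to reject `$` in every position are assumptions.

```java
/** Sketch only: "Java identifier, but without $" as a field name rule. */
final class JavaStyleFieldName {

    static boolean isValid(String name) {
        if (name.isEmpty()) {
            return false;
        }
        // Work on code points so that letters outside the BMP are handled too.
        int first = name.codePointAt(0);
        if (first == '$' || !Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (cp == '$' || !Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"price_usd", "\u00e9t\u00e9", "xxx$fieldname", "field-name", "field.id"};
        for (String name : samples) {
            System.out.println(name + " valid=" + isValid(name));
        }
    }
}
```

Note that a rule of this shape accepts non-ASCII letters but rejects both the dash and the period, which is part of why the thread treats those two characters as a separate question.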
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171092#comment-15171092 ]

Jack Krupansky commented on SOLR-8110:

bq. lucene expressions

I was going to say that Lucene Expressions are basically JavaScript, but... they are only sort-of based on JS, more conceptually than literally. Here's Lucene's grammar rule for VARIABLE:

{code}
VARIABLE: ID ARRAY* ( [.] ID ARRAY* )*;
fragment ARRAY: [[] ( STRING | INTEGER ) [\]];
fragment ID: [_$a-zA-Z] [_$a-zA-Z0-9]*;
fragment STRING
    : ['] ( '\\\'' | '' | ~[\\'] )*? [']
    | ["] ( '\\"' | '' | ~[\\"] )*? ["]
    ;
{code}

See: https://github.com/apache/lucene-solr/blob/master/lucene/expressions/src/java/org/apache/lucene/expressions/js/Javascript.g4

No Unicode support, no random special characters, just $ and _, but apparently dot as well. An ID is:

{code}
ID: [_$a-zA-Z] [_$a-zA-Z0-9]*
{code}

And any number of IDs can be written with dots between them to represent a single VARIABLE token.

JavaScript identifiers are defined in the ECMAScript spec: https://tc39.github.io/ecma262/#prod-IdentifierName

Letters in Java/ECMAScript are Unicode as defined by the Unicode properties "ID_Start" and "ID_Continue". Java/ECMAScript supports $ and _ in addition to letters. Identifier start and continue character types are defined by the Unicode UAX#31 Identifier spec: http://unicode.org/reports/tr31/
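To see which field names could be written as a plain variable under the grammar quoted in this comment, the ID fragment translates almost directly into a regular expression. This is a sketch under assumptions: the class name `ExpressionVariableName` is made up, and the ARRAY suffixes of the VARIABLE rule are deliberately left out, so it covers only dotted sequences of IDs.

```java
import java.util.regex.Pattern;

/** Sketch only: tests whether a field name matches the dotted-ID part of the VARIABLE rule. */
final class ExpressionVariableName {
    // Mirrors the quoted fragment: ID: [_$a-zA-Z] [_$a-zA-Z0-9]*
    private static final String ID = "[_$a-zA-Z][_$a-zA-Z0-9]*";

    // ID ( '.' ID )* ; the ARRAY suffixes of the real rule are omitted here.
    private static final Pattern DOTTED_IDS = Pattern.compile(ID + "(?:\\." + ID + ")*");

    static boolean isUsable(String fieldName) {
        return DOTTED_IDS.matcher(fieldName).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"popularity", "xxx$fieldname.id", "field-name", "\u00e9t\u00e9", "1st"};
        for (String name : samples) {
            System.out.println(name + " usable=" + isUsable(name));
        }
    }
}
```

The contrast with a Java-identifier rule is visible immediately: `$` and dots pass here, while non-ASCII letters and dashes do not.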
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170746#comment-15170746 ]

Jan Høydahl commented on SOLR-8110:

I buy Yonik's arguments for reserving both dot and dash for future cool stuff. When I argued for allowing these earlier it was as a compromise between chaos (today) and pain (every Solr application needing a rewrite). Perhaps we could define three modes for users to choose between, with pros/cons documented:

"safe" - {{\[a-zA-Z0-9_\]}} only, guaranteed future proof, full SQL, script support
"moderate" - safe + dash, dot, dollar and perhaps national unicode letters
"legacy" - no restrictions - only for easy back compat - will go away in 7.0

In 6.0, "moderate" would be the default, use of non-safe chars will be logged, and users can revert to "legacy" if they wish, but will be aware of what functionality they then sacrifice. In 7.0, "safe" could be made the default, and allow people to revert to "moderate" but take away "legacy". This will allow certain features to "bail out" with an exception if in a mode that is known to cause problems.
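The three modes proposed in this comment can be pictured as a small enum. This is a hypothetical sketch, not an existing Solr setting: the name `FieldNameMode`, the exact patterns, and the omission of the "perhaps national unicode letters" part of "moderate" are assumptions.

```java
import java.util.regex.Pattern;

/** Sketch only: the "safe" / "moderate" / "legacy" modes as character-set checks. */
enum FieldNameMode {
    SAFE("[a-zA-Z0-9_]+"),           // as written in the proposal
    MODERATE("[a-zA-Z0-9_.$-]+"),    // safe + dash, dot, dollar (unicode letters left out)
    LEGACY("(?s).*");                // no restrictions

    private final Pattern allowed;

    FieldNameMode(String regex) {
        this.allowed = Pattern.compile(regex);
    }

    boolean accepts(String fieldName) {
        return allowed.matcher(fieldName).matches();
    }

    /** Returns the most restrictive mode that still accepts the name. */
    static FieldNameMode strictestFor(String fieldName) {
        for (FieldNameMode mode : values()) {
            if (mode.accepts(fieldName)) {
                return mode;
            }
        }
        return LEGACY; // not reached in practice, since LEGACY accepts everything
    }

    public static void main(String[] args) {
        String[] samples = {"title_s", "xxx$fieldname.id", "my field"};
        for (String name : samples) {
            System.out.println(name + " -> " + strictestFor(name));
        }
    }
}
```

A feature that is known to break on certain characters could then compare the configured mode against the one it requires and bail out with an exception, as the comment suggests.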
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170713#comment-15170713 ]

Gus Heck commented on SOLR-8110:

So perhaps a quoting Identifiers JIRA ticket that this is blocked by?
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170700#comment-15170700 ]

Jack Krupansky commented on SOLR-8110:

I can't recall any explicit statement on case sensitivity, although I would imagine that the existing "anything goes" model would default to case-sensitive. Personally, I would prefer case-insensitive. I can't recall a schema in which case-sensitive field names were used, while case mistakes are not uncommon.
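If field names ever became case-insensitive, the schemas at risk would be those holding two names that differ only by case. A minimal sketch of such a check follows; the class name `CaseCollisionCheck` and the use of `Locale.ROOT` lower-casing as the folding rule are assumptions, and nothing here reflects existing Solr behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: finds field names that would collide if case were ignored. */
final class CaseCollisionCheck {

    /** Maps each lower-cased name to the original names sharing it, keeping only real collisions. */
    static Map<String, List<String>> collisions(List<String> fieldNames) {
        Map<String, List<String>> byLower = new TreeMap<>();
        for (String name : fieldNames) {
            byLower.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(name);
        }
        // Groups with a single member are not collisions.
        byLower.values().removeIf(group -> group.size() < 2);
        return byLower;
    }

    public static void main(String[] args) {
        // Made-up field names for illustration.
        List<String> fields = Arrays.asList("Title", "title", "id", "author_s");
        System.out.println(collisions(fields));
    }
}
```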
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170698#comment-15170698 ]

Jack Krupansky commented on SOLR-8110:

The dollar sign is permitted in Java identifiers, including at the start. As per the Java Spec, "The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024)." It goes on to say that "The $ character should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems."

See: https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8

If anything, I had been assuming that we were proposing a superset of Java identifiers (hyphen and dot as part of the name). I'm not positive whether there might be any conflict with parameter substitution for the dollar sign.
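To make the comparison above concrete, here is a minimal Python sketch (not Solr code) that checks a name against the ASCII portion of the Java identifier rule quoted from the spec, and against a hypothetical superset that also accepts hyphen and dot after the first character. Both patterns and both function names are assumptions made for illustration; the sketch ignores Java keywords and non-ASCII "Java letters".

```python
import re

# ASCII subset of the Java identifier rule: letters, digits, underscore and
# dollar sign, with no digit in the first position (keywords are ignored here).
JAVA_ASCII_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Hypothetical superset: additionally allow hyphen and dot after the first character.
JAVA_SUPERSET = re.compile(r"[A-Za-z_$][A-Za-z0-9_$.\-]*")


def is_java_ascii_identifier(name):
    """Return True if the whole name matches the ASCII Java identifier rule."""
    return JAVA_ASCII_IDENTIFIER.fullmatch(name) is not None


def is_superset_name(name):
    """Return True if the whole name matches the hypothetical superset rule."""
    return JAVA_SUPERSET.fullmatch(name) is not None
```

Under these assumptions a name such as `$price` passes both checks, while `my-field` and `address.city` pass only the superset check.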
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170689#comment-15170689 ]

Shawn Heisey commented on SOLR-8110:

bq. I doubt there's a ton of people using dollar-signs in their field names

Somebody came into the IRC channel once, using a Solr plugin for some PHP software whose name I can no longer remember, and when they shared the schema that came with the plugin, almost every field had at least two dollar signs in the name.

I've seen all kinds of weird characters in field names. Sometimes they work, sometimes they don't.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170637#comment-15170637 ]

Jason Gerlowski commented on SOLR-8110:

bq. If we expand the set to include what people may have used in the past (dashes, dots, etc), that sort of takes us full circle back to the state of things today

That's not quite true. Even if we allow all the characters that've been brought up by this JIRA (dashes, dots, underscores), this JIRA would still treat many others as invalid: slashes, spaces, Unicode characters, most of the number-row of your keyboard (!@#$%^&()+), etc.

I doubt there's a ton of people using dollar-signs in their field names, but anyone who does will probably see some quirky behavior. And if we can warn up front about that, even for rarely used characters, that might still be valuable.
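As a rough illustration of the point above, the following Python sketch lists the characters in a name that would still be rejected even under the expanded set (ASCII letters, digits, dashes, dots, underscores). The `ALLOWED` set and the `invalid_characters` helper are hypothetical names for this example, not part of any Solr patch.

```python
import string

# Assumed expanded set: ASCII letters and digits plus dash, dot and underscore.
ALLOWED = set(string.ascii_letters + string.digits + "-._")


def invalid_characters(name):
    """Return each character of name that falls outside ALLOWED, in order."""
    return [ch for ch in name if ch not in ALLOWED]
```

A warning message could then name the exact characters at fault, for example the `$` in `price$usd`, rather than only reporting that the name is invalid.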
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170264#comment-15170264 ]

Shawn Heisey commented on SOLR-8110:

Right now, the first attempts at validation (SOLR-8642) have resulted in SOLR-8725. I believe that the restrictions in SOLR-8642 are the only ones that will be safe for the advanced and possible future features (like infix expressions and separators that have meaning to Solr) that Yonik mentioned.

I've stared at my keyboard for several minutes trying to see whether there are any good choices for a meaningful separator character other than the period, and all of the possibilities are either used for something else, or they're really arcane and likely to cause the kind of problems we're aiming to prevent with this issue. For that reason, I think that periods should be banned for now, and then only allowed in a limited fashion as necessary to support that future separator idea, if it ever becomes a reality.

For my purposes, I'm perfectly OK with breaking backward compatibility in 6.0 and enforcing SOLR-8642 across the board. I will need to change my own core names (which currently use hyphens), but I'm OK with that. Recognizing the pain this could cause, I can get behind an approach where violations cause a warning in 6.0, and default to enforcement later.

Parsing code tends to be extremely complex and fragile in the best conditions. When a character that has special meaning in some contexts is allowed in identifiers, that code is even more fragile. I would rather be more restrictive on this issue than risk parser bugs.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170182#comment-15170182 ]

Yonik Seeley commented on SOLR-8110:

bq. I've accepted the fact that Solr will probably never need to support full infix expressions.

We may already have that via Lucene expressions? Or the new SQL support? Any future integration with scripting languages may also hit the issue.

bq. Note that I am still a proponent of having quoted/escaped names which allow anything in names, ala SQL.

I think I agree, but that doesn't seem compatible with this JIRA (which is about enforcing a specific restricted set). If we expand the set to include what people may have used in the past (dashes, dots, etc), that sort of takes us full circle back to the state of things today?
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170155#comment-15170155 ]

Jack Krupansky commented on SOLR-8110:

I've accepted the fact that Solr will probably never need to support full infix expressions. If somebody wants to seriously propose full infix expressions, fine, but it seems too much to me to worry much about vague possibilities.

Note that I am still a proponent of having quoted/escaped names which allow anything in names, ala SQL.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170111#comment-15170111 ]

Yonik Seeley commented on SOLR-8110:

Dash is also problematic for unequivocal full support. For infix arithmetic expressions, it's natural to expect "a - 1" to be equivalent to "a-1".
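The ambiguity described above can be shown with a toy tokenizer. This is only a sketch in Python, not any parser that exists in Solr or Lucene; the two identifier patterns and the `tokenize` function are assumptions made for the example.

```python
import re

# Identifier rule without dashes, and a hypothetical rule that permits them.
IDENT_NO_DASH = r"[A-Za-z_][A-Za-z0-9_]*"
IDENT_WITH_DASH = r"[A-Za-z_][A-Za-z0-9_-]*"


def tokenize(expr, ident_pattern):
    """Split a tiny infix expression into identifiers, integers and operators."""
    token_re = re.compile(rf"\s*({ident_pattern}|\d+|[-+*/()])")
    expr = expr.rstrip()
    tokens = []
    pos = 0
    while pos < len(expr):
        match = token_re.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens
```

With `IDENT_NO_DASH`, both `a - 1` and `a-1` yield the same three tokens. With `IDENT_WITH_DASH`, `a-1` is consumed as a single identifier while `a - 1` is still a subtraction, so whitespace changes the meaning of the expression.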
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169830#comment-15169830 ]

Jack Krupansky commented on SOLR-8110:

Dot is a tough case. I can see reserving it for future expansion, but I can also see its utility in field names where its value comes from using it as a pseudo-field delimiter, such as in cases where data may in fact have come from an SQL ETL operation that actually did use the dot in a compound field name.

How about... saying that dot is pseudo-reserved for compound field name references. If the decomposed field name has a well-defined meaning in some context, such as where there are contextual named structural entities like table or collection names, then so be it. But if it has no clear meaning in a context, then the full, dotted name will be treated as a raw field name?

So, at the level of the fl parameter a dotted name would get parsed as a compound name and then treated as a simple field name.
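One possible reading of the fallback proposed above, sketched in Python: split a dotted name at the first dot, treat it as a compound reference only when the prefix names a known structural entity, and otherwise fall back to the full dotted string as a raw field name. The function name, the `known_entities` argument and the tuple it returns are all hypothetical; nothing here reflects an actual Solr API.

```python
def resolve_field_reference(name, known_entities):
    """Classify a possibly dotted name as a compound reference or a raw field.

    known_entities is an assumed set of table/collection names that give the
    prefix before the first dot a well-defined meaning in the current context.
    """
    prefix, dot, rest = name.partition(".")
    if dot and rest and prefix in known_entities:
        # The prefix is meaningful here, so decompose the name.
        return ("compound", prefix, rest)
    # No clear meaning for the prefix: keep the full dotted name as a field.
    return ("raw", None, name)
```

So `orders.total` would decompose when `orders` is a known entity, while `address.city` would stay a plain field name in a context that knows nothing called `address`.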
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169435#comment-15169435 ]

Yonik Seeley commented on SOLR-8110:

Allowing '.' in field names also acts to preclude the use of '.' as a field name separator in future features (like better support for nested documents, joins, SQL, etc). The reasons why '.' may be useful to an application are the same reasons why '.' may be useful to Solr.

So if we end up allowing '.', I guess I'd retain the caveat that one "may not have first class support from all components".
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166397#comment-15166397 ]

Jason Gerlowski commented on SOLR-8110:

Noticed something a little odd (to me) while doing some manual tests on the patch I pushed up last night.

With the added checks in the patch, core creation will fail if the core-create request references a configset containing an invalid field declaration. Solr returns BAD_REQUEST (400), and a message containing:

{{Unable to create core [gettingstarted_shard1_replica2] Caused by: Dynamic field name '*_i$#' is invalid. Dynamic field names can only containalphanumerics, underscores, periods, and prefix/suffix asterisks.}}

All looks good. When I test this out by creating a collection though, the result is more mixed. Solr returns an OK (200) status code. The message body, though, contains an error message similar to the one above:

{code}
[~/c/s/l/solr] $ curl -i -l -k -X GET "http://localhost:8983/solr/admin/collections?action=CREATE=asdf=gettingstarted=1;
HTTP/1.1 200 OK
...
02841org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://127.0.1.1:7574/solr: Error CREATEing SolrCore 'asdf_shard1_replica1': Unable to create core [asdf_shard1_replica1] Caused by: Dynamic field name '*_i$#' is invalid. Dynamic field names can only containalphanumerics, underscores, periods, and prefix/suffix asterisks.
{code}

Cluster-status and core-status requests confirm that the cores failed to initialize correctly, and were never fully created.

My interpretation of this is that the collection-create command fires off the core-create requests, but doesn't correctly interpret their success/failure.

Is this expected/correct behavior, or is this a bug in the Collections API? Seems like a bug to me, but I'm not super familiar with the expected behavior of the Collections API, so just thought I'd mention it here before creating a JIRA out of it.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159900#comment-15159900 ]

Jan Høydahl commented on SOLR-8110:

bq. A dedicated config option might be better than luceneMatchVersion, but I'm OK either way.

Both. We introduce a {{-Dsolr.enforceStrictIdentifiers}} and let luceneMatchVersion decide the default value if not explicitly set.
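The "both" approach above could be resolved roughly as follows. This Python sketch is an assumption-laden illustration, not Solr's implementation: the function names are hypothetical, the 6.0 cut-over is assumed from the discussion, and only dotted numeric luceneMatchVersion strings are handled.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '4.7' or '6.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def strict_identifiers_enabled(explicit_setting, lucene_match_version,
                               strict_since=(6, 0)):
    """Decide whether strict identifier checks apply.

    explicit_setting is True/False when the (proposed) system property was
    set, or None when it was not. strict_since is an assumed cut-over version.
    """
    if explicit_setting is not None:
        # An explicit setting always wins over the version-derived default.
        return explicit_setting
    # Compare major and minor only, so '6.0' and '6.0.0' behave the same.
    return parse_version(lucene_match_version)[:2] >= strict_since
```

A user pinned to an older luceneMatchVersion for unrelated reasons could still opt in by setting the property explicitly, and a user on a newer version could opt out the same way.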
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159441#comment-15159441 ]

Shawn Heisey commented on SOLR-8110:

The core characters we need are letters, numbers, and the underscore. Consensus seems to be that we allow dots (periods). If dashes (hyphens) are allowed, then they cannot be the first character, or that will cause confusion with negated query clauses and possibly cause other problems.

Underscores must be allowed as the first character, so \_version\_ and other special fields used internally will work. My personal opinion is that the first character must be a letter or an underscore, so we don't have to worry about fixing bugs related to identifiers that start with a number.

One unanswered question is whether to only allow ASCII, or if "letters and numbers" should include all matching characters in Unicode. My bias, which I admit is completely provincial and might be far too restrictive, is ASCII.

A dedicated config option might be better than luceneMatchVersion, but I'm OK either way. There are users who must use an old version number even with newer versions of Solr. Changes to WordDelimiterFilter in 4.8 have a number of people using 4.7 in luceneMatchVersion.
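The rules outlined above can be written down as a small validator. The Python sketch below assumes dots and hyphens are both accepted after the first character (the comment leaves hyphens conditional) and exposes the open ASCII-versus-Unicode question as a flag. `is_valid_name` and its `ascii_only` parameter are hypothetical names, and the Unicode branch relies on Python's own `str.isalpha` and `str.isalnum` classification, which may not match what Solr would choose.

```python
import re

# First character: ASCII letter or underscore. Remaining characters: ASCII
# letters, digits, underscore, dot, hyphen (hyphen support is an assumption).
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_name(name, ascii_only=True):
    """Check a field name against the sketched rules."""
    if ascii_only:
        return ASCII_NAME.fullmatch(name) is not None
    if not name:
        return False
    first, rest = name[0], name[1:]
    # Unicode variant: any alphabetic character or underscore may start a name.
    if not (first.isalpha() or first == "_"):
        return False
    return all(ch.isalnum() or ch in "_.-" for ch in rest)
```

With `ascii_only=True`, `_version_` and `price.usd` are accepted while `1st`, `-neg` and `größe` are rejected; switching the flag off accepts `größe` as well.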
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159191#comment-15159191 ]

Anshum Gupta commented on SOLR-8110:

I personally would prefer the more invasive approach here.

bq. Provide an enforcement option in solrconfig.xml to fail core startup

Though that sounds reasonable, I strongly feel it'll come back to bite the users when they are unable to use certain features they want to use and the only way out at times would be re-indexing plus changing the client code.

bq. I would prefer to not have that option once enforcement is default, but users will likely want it.

If we give it to them, they would want it. I really think that enforcing restrictions on identifiers isn't a game changer for the end users, especially considering that we're supporting most use cases in the set of allowed chars.

In all, here are the questions it boils down to:
# What is the set of allowed chars, and the rules? - I think we are fine with alphanumeric, dash, dot, and periods for this.
# Enforcement details
* Is this mandatory?
* Do we enable it by default? If not, how can users enable it? If yes, how do they disable it?

I suggest we use luceneMatchVersion for this. It might restrict users from using more new features just to disable naming restrictions but that's a call we'd have to make.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159160#comment-15159160 ]

Shawn Heisey commented on SOLR-8110:

Assuming what I outlined above is the actual approach used: Although the default value for the enforcement option would be false, I think it should be enabled in all 6.x example configs.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159139#comment-15159139 ]

Shawn Heisey commented on SOLR-8110:

bq. Since the concept of enforcement of naming conventions is new, I would suggest making it optional in 6.x, preferably opt-out

I'm in favor of more aggressive measures, particularly if we can get it complete before the 6.0 release ... but I won't argue against a less invasive plan.

Here's a more complete idea for a less invasive approach:

* In 6.0 (or 6.1, etc):
** Default behavior: Check all identifiers on startup or when an API call is made that adds a new identifier. If something fails validation, log/return a warning, but don't fail.
** Provide an enforcement option in solrconfig.xml to fail core startup and API calls when the restrictions are violated. I'm not sure whether that should be a single option for everything, or separate options for field names, core/collection names, etc.
* In 7.0, or perhaps a later 6.x release, turn enforcement on by default.

One question can be decided at that later date: Do we keep the option to turn off enforcement? I would prefer to not have that option once enforcement is default, but users will likely want it.
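Illustration only, not taken from any patch on this issue: a minimal Java sketch of the warn-by-default, fail-when-enforced behavior described above. The class name `IdentifierPolicy`, the `enforce` flag (standing in for the proposed solrconfig.xml option), and the pattern (the long-standing recommendation: alphanumeric or underscore, not starting with a digit) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch; none of these names exist in Solr. */
final class IdentifierPolicy {
    // Assumed rule: alphanumeric or underscore, not starting with a digit.
    private static final Pattern RECOMMENDED = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private final boolean enforce; // stand-in for the proposed enforcement option
    private final List<String> warnings = new ArrayList<>();

    IdentifierPolicy(boolean enforce) {
        this.enforce = enforce;
    }

    /** Validates one identifier: warns in lenient mode, throws in enforcing mode. */
    void check(String kind, String name) {
        if (RECOMMENDED.matcher(name).matches()) {
            return;
        }
        String msg = "Invalid " + kind + " name: '" + name + "'";
        if (enforce) {
            throw new IllegalArgumentException(msg);
        }
        warnings.add(msg); // a real implementation would log instead of collecting
    }

    List<String> warnings() {
        return warnings;
    }

    public static void main(String[] args) {
        IdentifierPolicy lenient = new IdentifierPolicy(false);
        lenient.check("field", "price$usd");
        System.out.println(lenient.warnings());
    }
}
```

Whether one flag should cover every identifier type or each type should get its own is the open question from the comment; the `kind` argument only labels the message and does not select a rule.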
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159127#comment-15159127 ]

Anshum Gupta commented on SOLR-8110:

bq. But UNDERSCORE, DASH and DOT should be supported, i.e. [A-Za-z0-9_\.-]

Sure, let's also change the collection/shard/alias/core name restrictions to this in that case for the sake of consistency.

bq. Since the concept of enforcement of naming conventions is new, I would suggest making it optional in 6.x

That's the thing, it has been optional forever. Optional = not enforced. I wanted this to be how Hoss suggested

bq. make all of this logic conditional on either the schema "version" or the "luceneMatchVersion" and backport to the stable branch as well so it's optional prior to the next X.0 release

but considering we may not have another 5x release, we should stop this with 6.0.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159113#comment-15159113 ]

Jack Krupansky commented on SOLR-8110:

1. Since the concept of enforcement of naming conventions is new, I would suggest making it optional in 6.x, preferably opt-out - most people can probably live with it without problem. Whether it would just be a schema version trigger or a separate config/schema option can be debated.

2. Consider the concept of delimited identifiers as in SQL - enclose non-regular names in quotes. It is worth noting that highly-irregular names are not supported in queries even today (most special characters will terminate the field name in most query parsers.)
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158606#comment-15158606 ]

Jan Høydahl commented on SOLR-8110:

Yep, we don't need to support all kinds of strange Unicode chars. But UNDERSCORE, DASH and DOT should be supported, i.e. {{\[A-Za-z0-9_\.-\]}}, allowing a very large percentage of existing Solr applications to continue working without modifications.
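Illustration only: a short Java sketch of what the proposed character class would accept, using the standard `java.util.regex.Pattern`. The class name `LooseFieldName` is made up for the example. Applying the class to the whole name with `+` is an assumption, since the comment gives only the character set; the sketch deliberately does not add the "must not start with a digit" rule from the existing recommendation.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Solr. */
final class LooseFieldName {
    // Letters, digits, underscore, dot and dash. Inside a character class the
    // dot is literal, and a trailing '-' is literal as well.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_.-]+");

    static boolean isAllowed(String name) {
        return ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"parent.child-name_1", "price$usd", "first name"}) {
            System.out.println(name + " -> " + isAllowed(name));
        }
    }
}
```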
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157475#comment-15157475 ]

Anshum Gupta commented on SOLR-8110:

I am sure there would be reasons for people to use field names that are outliers but we shouldn't really make our rule book for that. It would be good to

* start with a rather restricted set, that is universal across Solr in terms of identifier restrictions
* relax them if there are valid use cases that require us to support more in future releases.
* make sure we maintain back-compat

It's easier to have restrictions to begin with and people could write client code that maps field names rather than work around in Solr for handling outliers.
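Illustration only: one way the client-side field name mapping mentioned above might look in Java. The class `FieldNameMapper` and its replacement policy (every disallowed character becomes an underscore, and a leading digit gets an underscore prefix) are assumptions for the example, not anything proposed in this issue.

```java
/** Hypothetical client-side helper; not part of Solr or SolrJ. */
final class FieldNameMapper {
    /** Maps an arbitrary source name onto [A-Za-z0-9_], never starting with a digit. */
    static String toSafeName(String raw) {
        // Replace every character outside the restricted set with an underscore.
        String safe = raw.replaceAll("[^A-Za-z0-9_]", "_");
        // Only ASCII characters remain, so isDigit matches '0'..'9' only here.
        if (safe.isEmpty() || Character.isDigit(safe.charAt(0))) {
            safe = "_" + safe;
        }
        return safe;
    }

    public static void main(String[] args) {
        System.out.println(toSafeName("price$usd"));
        System.out.println(toSafeName("2016-02-17.count"));
    }
}
```

A mapping like this is lossy: `a.b` and `a_b` both become `a_b`, so a real client would need to detect collisions or keep a reverse lookup table.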
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156259#comment-15156259 ]

Jack Krupansky commented on SOLR-8110:

There is the issue of simple ASCII letters vs. Unicode letters. Java Identifiers support arbitrary Unicode letters which "allows programmers to use identifiers in their programs that are written in their native languages." See Character.isJavaIdentifierStart and isJavaIdentifierPart.
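Illustration only: a small Java sketch of what a check built on the two `Character` methods named above would accept. The class `JavaIdentifierCheck` is hypothetical. It tests characters only and does not exclude Java keywords such as `if` or `null`.

```java
/** Hypothetical helper showing the JDK's identifier character rules. */
final class JavaIdentifierCheck {
    static boolean isJavaStyleIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        // Work on code points so characters outside the BMP are handled correctly.
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // "gr\u00f6\u00dfe" is the German word for "size", written with escapes.
        for (String name : new String[] {"gr\u00f6\u00dfe", "price$usd", "parent.child", "1st"}) {
            System.out.println(name + " -> " + isJavaStyleIdentifier(name));
        }
    }
}
```

Under these rules non-ASCII letters and the dollar sign pass, while the dot and a leading digit do not.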
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156250#comment-15156250 ]

Jason Gerlowski commented on SOLR-8110:

1.) Sounds like (aside from a few comments about the particular characters that the enforcement allows/denies) no one's got a problem with the idea of enforcement in principle. So I'll start in on the changes Hoss suggested in his initial comment.

2.) As for the characters allowed/denied by enforcement: I lean towards re-using the character set already enforced for collection/shard/core creation: [a-zA-Z0-9_.]. I really think keeping consistency among the identifier rules is important, as people only need to learn one set of rules.

We _could_ lean on a set of identifier rules that people already know (i.e. Java identifiers), but I'd argue that (1) the special characters allowed by the current recommendations (_, .) are more generally useful than those allowed by Java identifiers (_, $), and (2) Java identifiers have a few edge cases that'd be a pain to deal with (reserved keywords, can't start with $, can't start with number, etc.). Admittedly, these are rare cases, and I doubt they would be hit frequently, but they do seem like unnecessary complications to me.

But that's just my opinion. Is there some value I'm missing to using the same identifier pattern as Java? Very possible I'm just overlooking something.

In the meantime, I'll aim to use the current "recommendations", but I'll structure the patch in such a way that it'll be easy to change the allowed-char-set with a line or two. That way this discussion doesn't have to block my work on this.

Thanks everyone for the input so far.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151121#comment-15151121 ]

Jack Krupansky commented on SOLR-8110:

It would be nice to say that a "Solr identifier" had the same rules as a Java identifier, but Java allows dollar signs and excludes keywords and reserved terms like if, for, true, false, null. Hmmm... I don't know if many people would complain if Solr didn't allow those keywords as field names.

The main three exceptions to the current soft-rule that I have run across are:

1. Dot for compound names.
2. Hyphen feels a little more natural than underscore unless you're truly thinking about Java code and imagining that you could write a minus sign for a subtraction operation.
3. An ISO date/time value for dynamic fields which want to be time stamped. An optional text keyword prefix and hyphen are common for these timestamped columns as well.
4. Spaces, but I think sensible people can accept those as not permitted in names.

The main difficulty I am aware of in Solr is parsing of function queries, including (or especially) in the field list of the fl parameter.
Re: [jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
IMO, simply having a standard at least gives us a standard and a reason to fix the edge cases. As it stands now, there's no basis to even say something should be fixed. So periods are fine as far as I'm concerned.

On Feb 17, 2016 10:49, "Shawn Heisey (JIRA)" wrote:

> With the caveat that I haven't actually tried it and haven't looked at code, I can't immediately think of any reason periods would cause any problems, at least not with the top three query parsers -- lucene, dismax, and edismax.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149605#comment-15149605 ]

Shawn Heisey commented on SOLR-8110:

With the caveat that I haven't actually tried it and haven't looked at code, I can't immediately think of any reason periods would cause any problems, at least not with the top three query parsers -- lucene, dismax, and edismax.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148954#comment-15148954 ]

Jason Gerlowski commented on SOLR-8110:

Personally, I'd be fine with allowing field names to contain periods.

There's been some recent work to change the COLLECTION APIs to enforce the identifier "recommendations" (see SOLR-8642, SOLR-8308, SOLR-8677). This recent work allows periods in identifiers, because of the existing use-case of the ".system" collection. Allowing periods here would keep all identifier recommendation/enforcement consistent, which would be nice.

Whether any off-the-beaten-path features break when field names contain periods though, I couldn't really speak to. But something to look into at least.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148921#comment-15148921 ]

Gus Heck commented on SOLR-8110:

Disallowing the '.' character in field names will be difficult for one project I work on. They are indexing fields of subordinate objects from an outside system, using '.' to separate and distinguish the fields that have been denormalized onto the parent. This decision dates back over 3 years, to before I started working with them, and much has been built on it.

The field names from the system being indexed already contain '_'. Yes, one could use a double '__', except that double underscores actually occur in the source data too, so it would be on to triple '___' ... which gets hard to tell apart from '__', and of course multi-character separators are more work for parsing.

I'd like to suggest that at least one more non-alphanumeric character be allowed (of course '.' is my preference). Having only one non-alphanumeric character available would be painful.
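Illustration only: a Java sketch of the kind of denormalization described above, where fields of subordinate objects are flattened onto the parent document with '.' as the separator. The class `DottedFlattener` and the sample field names are invented for the example; the actual project's code is not shown anywhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of Solr. */
final class DottedFlattener {
    /** Flattens nested maps into one level, joining keys with '.'. */
    static Map<String, Object> flatten(Map<String, ?> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", doc, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<?, ?> node, Map<String, Object> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String name = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                flattenInto(name, (Map<?, ?>) value, out); // subordinate object
            } else {
                out.put(name, value); // leaf field on the parent document
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> owner = new LinkedHashMap<>();
        owner.put("first_name", "Ada");
        owner.put("home__city", "London");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("owner", owner);
        System.out.println(flatten(doc).keySet());
    }
}
```

Because the source keys here already contain both '_' and '__', the dot is the only separator in the sketch that can be split on unambiguously, which is the point the comment makes.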
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148692#comment-15148692 ]

Jason Gerlowski commented on SOLR-8110:

Anyone object to my starting on this? Thinking of putting together a patch, as I'd like to see this happen. Just want to make sure there's not any huge objections before I start in.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132455#comment-15132455 ]

Shawn Heisey commented on SOLR-8110:

I think we should enforce restrictions on all identifiers, not just field names. Problems can also happen with odd collection names, core names, and possibly even other identifiers.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939513#comment-14939513 ] Alan Woodward commented on SOLR-8110: +1 We should log a warning for each dodgy fieldname in 'lenient' mode, so that users are aware that they're running the risk of some functionality not working.
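The 'lenient' mode suggested above could look roughly like the following sketch. It is only an illustration, not Solr code: the class `FieldNameCheck`, the `Mode` enum, and the warning text are hypothetical, and the pattern assumes that "alphanumeric" in the issue's recommendation means ASCII letters and digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Solr. */
final class FieldNameCheck {
    private static final Logger LOG = Logger.getLogger(FieldNameCheck.class.getName());

    // Recommendation from the issue description: alphanumeric or underscore
    // characters only, not starting with a digit. ASCII-only is an assumption.
    private static final Pattern RECOMMENDED = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    enum Mode { LENIENT, STRICT }

    static boolean isRecommended(String name) {
        return name != null && RECOMMENDED.matcher(name).matches();
    }

    /**
     * LENIENT: logs one warning per offending name and returns the warnings.
     * STRICT: throws on the first offending name.
     */
    static List<String> check(List<String> fieldNames, Mode mode) {
        List<String> warnings = new ArrayList<>();
        for (String name : fieldNames) {
            if (isRecommended(name)) {
                continue;
            }
            String msg = "Field name '" + name
                    + "' does not follow the recommended naming convention;"
                    + " some functionality may not work with it";
            if (mode == Mode.STRICT) {
                throw new IllegalArgumentException(msg);
            }
            LOG.warning(msg);
            warnings.add(msg);
        }
        return warnings;
    }
}
```

In lenient mode a caller would get one warning per offending name (for example a name containing a dollar sign or starting with a digit) while the schema still loads; in strict mode the same input is rejected outright.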
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938973#comment-14938973 ] Hoss Man commented on SOLR-8110: As far as the "how" to implement this, I think it would be pretty straightforward...
* update the schema parsing and schema API code to enforce the naming convention on "new" SchemaField instances
** use the same rules on prefix-based dynamicFields
* update DocumentBuilder to enforce these rules when iterating over the SolrInputFields to build up the underlying "Document" object passed to the IndexWriter
** this will help catch any problematic fields that might satisfy a suffix-based dynamicField
* make all of this logic conditional on either the schema "version" or the "luceneMatchVersion", and backport it to the stable branch as well so it's optional prior to the next X.0 release
** I'd lean towards making it depend on luceneMatchVersion, since that concept is designed around the idea that support for "older" values is automatically dropped in future versions, whereas the "schema version" attribute has (so far) only ever been used to change the default behavior of various schema features, not to enforce anything.
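The plan in the comment above (check explicit fields and prefix-based dynamicFields at schema time, check concrete field names at document-build time, and gate everything on a version setting) can be sketched as below. This is a minimal illustration under stated assumptions, not the patch attached to the issue: `FieldNamePolicy` and its methods are hypothetical names, the version is assumed to be already parsed to a major number, and the ASCII-only pattern is one reading of the quoted recommendation.

```java
import java.util.regex.Pattern;

/** Hypothetical policy object for illustration only; not part of Solr. */
final class FieldNamePolicy {
    // Alphanumeric or underscore only, not starting with a digit (ASCII assumed).
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private final boolean enforce;

    /**
     * @param configuredMajor  major version taken from the configured match
     *                         version (assumed to be parsed by the caller)
     * @param enforceFromMajor hypothetical first major version that enforces the rule
     */
    FieldNamePolicy(int configuredMajor, int enforceFromMajor) {
        this.enforce = configuredMajor >= enforceFromMajor;
    }

    /** Schema-time check for explicit field names and dynamicField patterns. */
    void checkSchemaName(String nameOrPattern) {
        if (nameOrPattern.startsWith("*")) {
            // Suffix-based dynamicField: nothing to validate up front; concrete
            // names that match it are checked when documents are built.
            return;
        }
        String literal = nameOrPattern.endsWith("*")
                ? nameOrPattern.substring(0, nameOrPattern.length() - 1) // prefix-based
                : nameOrPattern;                                         // explicit field
        require(literal);
    }

    /** Document-build-time check for each concrete field name in an input document. */
    void checkDocumentFieldName(String name) {
        require(name);
    }

    private void require(String name) {
        if (enforce && !VALID.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid field name: '" + name + "'");
        }
    }
}
```

With enforcement on, a document field such as `price$usd_s` would be rejected even though it matches a `*_s` dynamicField; with an older configured version the same name still passes, which is the opt-in behaviour the comment describes for the stable branch.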