[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300831#comment-15300831 ]

Jan Høydahl commented on SOLR-9150:

bq. The type could directly be a suffix in the specified field name, with some separator, e.g.: "cost::int", where "cost" is the field name

I'm not a fan of overloading the field name with special meaning to convey metadata. It's convenient since it will pass through existing code without change, but I'd like it to be more explicit.

As I see it, this issue is about improving Solr's field type guessing by giving Solr a hint about the type along with the document, while still using plain field names such as "cost". A few ideas come to mind:

* Support this only on JSON input, infer the type from the JSON type, and also select multiValued if an array is used, etc.
* Provide types in a custom sidecar field {code}"_types_" : {"cost":"int", "desc":"string"}{code}
* Let the type be encoded in the value {code:xml}{!type=int}300{code}
* When Solr encounters a new field name, let it temporarily add it to the schema as "string", tagging the docs with a special field {{"\_reguessfields\_":\["cost"\]}}. When enough docs have accumulated in the index with that new field, perform a new guess and change the type if necessary, reindexing all affected docs. This may require stored=true for all fields.

> Add configuration option to strip type postfix from dynamic field name on
> document indexing
> ---
>
> Key: SOLR-9150
> URL: https://issues.apache.org/jira/browse/SOLR-9150
> Project: Solr
> Issue Type: New Feature
> Components: Server
> Affects Versions: 6.0
> Reporter: Peter Horvath
>
> In some cases, incorporating field type indication to the name of a dynamic
> field is not desirable.
> It would be great if there was a configuration option (global, instance level
> or collection-level), which instructed Solr to create dynamic fields with the
> type postfix stripped.
> For example, suppose the schema contained a dynamic field with a name of
> "*_i". If the user attempts to index a document with a "cost_i" field, but no
> explicit "cost_i" field is defined in the schema, then a "cost" field
> (without "_i" postfix) would be created with the field type and analysis
> defined for "*_i". As a result queries could be executed against the dynamic
> field being referred to without the type indicator postfix: "cost:10"
> To retain backward compatibility, this feature should have to be enabled
> explicitly.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
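The first two ideas in Jan's list (inferring the type from the JSON value, and overriding it with a sidecar field) can be sketched roughly as follows. This is only an illustration of the idea, not Solr code: the function names and the type names "int", "double", "boolean" and "string" are assumptions, and the "_types_" key is the hypothetical sidecar field from the comment.

```python
import json


def _kind(value):
    """Map one scalar JSON value to an illustrative type name (assumed names)."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"


def guess_field(value):
    """Return (type_name, multi_valued) for one JSON value."""
    multi = isinstance(value, list)
    samples = value if multi else [value]
    kinds = {_kind(v) for v in samples}
    # Mixed or empty arrays fall back to "string" in this sketch.
    type_name = kinds.pop() if len(kinds) == 1 else "string"
    return type_name, multi


def guess_types(doc_json):
    """Guess a type per field; explicit hints in "_types_" win over the guess."""
    doc = json.loads(doc_json)
    hints = doc.pop("_types_", {})  # hypothetical sidecar field from the comment
    guessed = {}
    for name, value in doc.items():
        type_name, multi = guess_field(value)
        guessed[name] = (hints.get(name, type_name), multi)
    return guessed
```

With this approach the document keeps plain field names such as "cost", and the hint travels next to the data rather than inside the field name.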
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300629#comment-15300629 ]

Steve Rowe commented on SOLR-9150:

bq. Am I right assuming that Solr does not simply rely on the field naming to know the type of a dynamic field? Or does it?

Solr *does* simply rely on the field name to know the field type for a dynamic field. Dynamic field names are of the form {{\*suffix}} or {{prefix\*}} - i.e. a glob that matches field names that include a fixed suffix or prefix\[1]. There is a single fieldtype associated with each dynamic field, so once a match is made, the fieldtype is also known.

bq. This does not necessarily has to be implemented in the core engine: I would be happy with any solution, that allowed me to create fields without having to query the current schema of a collection and then issue massive number of schema change requests.

Note that you can send a single request that contains any number of changes, though if any one fails, none are applied.

\[1] There can also be a {{\*}} dynamic field, which matches all field names.
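The name-based lookup Steve describes can be illustrated with a small sketch. It is not Solr's implementation: the function names are hypothetical, the schema is a plain dict of pattern to fieldtype name, and preferring the longest matching pattern when several match is an assumption of this sketch.

```python
def matches(pattern, name):
    """True if a dynamic field glob ('*suffix', 'prefix*' or '*') matches name."""
    if pattern == "*":
        return True
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return pattern == name


def resolve_type(dynamic_fields, name):
    """Return the fieldtype of the matching dynamic field, or None.

    dynamic_fields maps a glob pattern to a fieldtype name. When several
    patterns match, this sketch picks the longest one (an assumption).
    """
    candidates = [(p, t) for p, t in dynamic_fields.items() if matches(p, name)]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: len(pair[0]))[1]
```

Because each pattern carries exactly one fieldtype, a successful match on the name is all that is needed to know the type.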
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300474#comment-15300474 ]

Shawn Heisey commented on SOLR-9150:

I would simply write a custom schema for each customer, mapping "foo" and "bar" to the proper types for that specific customer. I would also expect each customer to want custom analysis chains for various text field types, which is further reason for a highly customized schema for each customer.

Dynamic fields are a useful feature, but heavy dynamic field usage only makes sense if there is a corresponding consistency in the field names of the source data that match up with the dynamic field definitions.
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300292#comment-15300292 ]

Peter Horvath commented on SOLR-9150:

Here is our use-case: imagine a system where Solr is used as the backing store of a hosted service, where a number of external customers regularly load their data (bringing their own field names like "foo" and "bar") and build some UIs for 3rd party users with the tools you provide them. In such an environment you do not know the fields stored in Solr (except at load time, when you can look at the values), and do not want to expose the implementation detail that Solr is used for the backend.

Since you want to hide the fact that fields "foo" and "bar" are actually stored internally as e.g. "foo_i" and "bar_s", you will have to implement some mapping logic in the application, translating back and forth between the user view ("foo" and "bar" fields) and the actual backend names "foo_i" and "bar_s" -- this is something I would desperately like to avoid.

I am not familiar with the internal workings of Solr, so I might be wrong, but I thought achieving something like this would be relatively easy: in Lucene it is, where you can always add a new field when a document is inserted.

I think the operation should simply fail in case an attempt is made to index a document field with a different data type. E.g. if someone created "foo" by indexing "foo_i", then indexing a document with "foo_s" should simply be rejected with an exception.

Am I right assuming that Solr does not simply rely on the field naming to know the type of a dynamic field?

This does not necessarily have to be implemented in the core engine: I would be happy with any solution that allowed me to create fields without having to query the current schema of a collection and then issue a massive number of schema change requests. (Adjusting the non-dynamic schema is plausible, but still difficult for us: a user might change his/her mind and load a completely different data structure: we would have to purge dangling fields after that...). An optional hook, extension etc. would be perfectly fine for us.

Or if you have any better idea how to deal with such requirements, I would be much obliged to hear your input.
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300024#comment-15300024 ]

Steve Rowe commented on SOLR-9150:

Rather than using the existing dynamic field capabilities, I think it would be better to come up with a new mechanism. The type could directly be a suffix in the specified field name, with some separator, e.g.: "cost::int", where "cost" is the field name, "::" is the separator and "int" is the fieldtype. That way dynamic fields aren't involved at all.

Then Hoss's UpdateProcessor flow could be used, but the field would be created using the "int" fieldtype rather than copying attributes from a dynamicfield.
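Splitting a name such as "cost::int" into its field name and fieldtype could look roughly like the sketch below. The helper name and its fallback behaviour (returning the raw name with no type when the separator is missing or one side is empty) are assumptions for illustration only.

```python
def split_typed_name(raw, separator="::"):
    """Split 'cost::int' into ('cost', 'int').

    Returns (raw, None) when no usable type hint is present, so plain
    field names pass through unchanged.
    """
    # rpartition splits on the last separator, so 'a::b::int' -> ('a::b', 'int').
    name, sep, type_name = raw.rpartition(separator)
    if not sep or not name or not type_name:
        return raw, None
    return name, type_name
```

The explicit type name would then be used to create the field directly, without consulting any dynamic field definition.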
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299518#comment-15299518 ]

Hoss Man commented on SOLR-9150:

I don't particularly think this is a good idea (nor do i think general purpose field aliases -- as a high level configuration option -- would really solve any of the underlying ambiguity problems) but if someone wanted to pursue this objective i would suggest implementing it as an UpdateProcessor similar to how the current AddSchemaFieldsUpdateProcessorFactory works using the underlying ManagedSchema APIs to add fields -- but instead of saying "i see a 'cost' field in this doc, but no 'cost' field in the schema, so i will add it using a configured/default type mapping" the logic could say "I see a 'cost_i' field in this doc, which matches a '\*_i' dynamic field, using a prefix of 'cost'; since 'cost' does not already exist in the schema, i will copy the attributes from '\*_i' into a new 'cost' field and rename the 'cost_i' field in this document 'cost' before adding it"
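The decision logic Hoss outlines can be sketched independently of Solr as a function over plain dicts. Everything here is hypothetical: it is not an UpdateProcessor, the schema is modelled as two dicts, only '*suffix' patterns are considered, and raising an error when the stripped name already exists with a different type is an assumption borrowed from the rejection behaviour discussed elsewhere in the thread. It also does not handle a document that carries both "cost" and "cost_i".

```python
class TypeConflict(Exception):
    """Raised when a stripped field name already exists with another type."""


def strip_suffixes(doc, fields, dynamic_fields):
    """Rename dynamic-field names in doc to their prefix, updating fields.

    doc:            field name -> value for one incoming document
    fields:         explicit field name -> type name (mutated in place)
    dynamic_fields: '*suffix' pattern -> type name
    """
    out = {}
    for name, value in doc.items():
        if name in fields:
            out[name] = value  # already an explicit field, leave untouched
            continue
        match = None
        for pattern, type_name in dynamic_fields.items():
            suffix = pattern[1:]
            if (pattern.startswith("*") and suffix
                    and name.endswith(suffix) and len(name) > len(suffix)):
                match = (suffix, type_name)
                break
        if match is None:
            out[name] = value  # no dynamic field matches, pass through
            continue
        suffix, type_name = match
        base = name[: -len(suffix)]
        existing = fields.get(base)
        if existing is None:
            fields[base] = type_name  # "copy the attributes" into a new field
        elif existing != type_name:
            raise TypeConflict(f"{name!r} conflicts with {base!r} of type {existing!r}")
        out[base] = value
    return out
```

Indexing "cost_i" first registers "cost" as an int field in this sketch; a later "cost_s" then fails instead of silently changing the type.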
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299357#comment-15299357 ]

Erick Erickson commented on SOLR-9150:

Shawn actually brings up a couple of points that, IMO, shoot down the idea. I don't see a reasonable way to support two _different_ types that have the suffix stripped later in the process: 'foo_s' and 'foo_i' would both map to 'foo', and from there it's all a mess. All the code in Solr that tries to resolve field names would have to be visited unless, as Shawn mentions, managed schemas are updated.

This would get messy really quickly, and there still hasn't been a clear case made for why this would be worth the complexity.
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297226#comment-15297226 ]

Kevin Risden commented on SOLR-9150:

I haven't put too much thought into this, but how about an alternative idea that may solve the same problem: field name aliases, i.e. foo would point to foo_i when querying. I have no idea what kind of impact resolving this would have on querying. This would keep dynamic fields in use but allow a defined alias mapping to be added later.

I can see the case where querying and needing to know the _type would be a pain, but at indexing time the _type would be known.
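To make the alias idea concrete, here is a deliberately naive sketch that rewrites field prefixes in a query string using an alias map. It is an illustration only, not a Solr feature or API: the function name is hypothetical, and the regular expression does not understand full query syntax (for example, it would also rewrite a `name:` that appears inside a quoted phrase).

```python
import re

# A field reference is an identifier followed by ':' that does not start
# in the middle of a word. This is a simplification of real query syntax.
_FIELD = re.compile(r"(?<!\w)([A-Za-z_]\w*):")


def apply_aliases(query, aliases):
    """Replace aliased field names ('foo:') with their targets ('foo_i:')."""
    def rewrite(match):
        name = match.group(1)
        return aliases.get(name, name) + ":"
    return _FIELD.sub(rewrite, query)
```

For example, with the alias map `{"cost": "cost_i"}` the query `cost:10` becomes `cost_i:10`, while fields without an alias are left unchanged.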
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297154#comment-15297154 ]

Shawn Heisey commented on SOLR-9150:

Let's imagine an index that does not have a field named "foo", but does have "*_i" and "*_s" dynamicField entries. An indexing request comes in with a number in a field named "foo_i". With this feature, this would put that data into a Lucene field named "foo" ... but at that point, how is Solr supposed to know that a query on the "foo" field should be treated as a number?

The only way I can imagine this working without problems is if this action results in a managed_schema update that *adds* the field named "foo" to the schema with the same definition as "*_i".

As a further thought experiment, what exactly should happen if a subsequent indexing request contains a field named "foo_s" that holds a non-numeric string? If the first request containing foo_i results in foo being added to a managed schema, then a subsequent request with foo_s would fail, because the incoming data would not be compatible with an integer field.
[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing
[ https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296658#comment-15296658 ]

Erick Erickson commented on SOLR-9150:

I'm not a big fan of this idea at first blush; it seems like unnecessary complexity in the _engine_ to support... I'm not sure what. "Not desirable" because of what? Convenience at the app layer? Some UI that has a pick list?

It'd be useful to have a statement of what problem this capability is attempting to solve before jumping in and making changes; there might be other approaches already in place. There's already "field aliasing", for instance, which allows a field to be displayed under a different name than the one it actually has.

I'm not totally against the idea, I just don't see a clear problem statement here.