[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300831#comment-15300831
 ] 

Jan Høydahl commented on SOLR-9150:
---

bq. The type could directly be a suffix in the specified field name, with some 
separator, e.g.: "cost::int", where "cost" is the field name

I'm not a fan of overloading the field name with special meaning to convey 
metadata. It's convenient since it will pass through existing code without 
change, but I'd like it to be more explicit.

As I see it, this issue is about improving Solr's field type guessing by giving 
Solr a hint about the type along with the document, while still using plain 
field names such as "cost". A few ideas come to mind:
* Support this only on JSON input, infer the type from the JSON type, and also 
select multiValued if an array is used, etc.
* Provide types in a custom sidecar field {code}"_types_" : {"cost":"int", 
"desc":"string"}{code}
* Let type be encoded in value {code:xml}{!type=int}300{code}
* When Solr encounters a new field name, let it temporarily add it to schema as 
"string", tagging the docs with a special field 
{{"\_reguessfields\_":\["cost"\]}}. When enough docs have accumulated in the 
index with that new field, perform a new guess, and change type if necessary, 
reindexing all affected docs. This may require stored=true for all fields.

> Add configuration option to strip type postfix from dynamic field name on 
> document indexing
> ---
>
> Key: SOLR-9150
> URL: https://issues.apache.org/jira/browse/SOLR-9150
> Project: Solr
> Issue Type: New Feature
> Components: Server
> Affects Versions: 6.0
> Reporter: Peter Horvath
>
> In some cases, incorporating a field type indication into the name of a 
> dynamic field is not desirable. 
> It would be great if there was a configuration option (global, instance-level 
> or collection-level) that instructed Solr to create dynamic fields with the 
> type postfix stripped. 
> For example, suppose the schema contained a dynamic field with a name of 
> "*_i". If the user attempts to index a document with a "cost_i" field, but no 
> explicit "cost_i" field is defined in the schema, then a "cost" field 
> (without the "_i" postfix) would be created with the field type and analysis 
> defined for "*_i". As a result, queries could be executed against the dynamic 
> field by referring to it without the type indicator postfix: "cost:10"
> To retain backward compatibility, this feature should have to be enabled 
> explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-25 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300629#comment-15300629
 ] 

Steve Rowe commented on SOLR-9150:
--

bq. Am I right assuming that Solr does not simply rely on the field naming to 
know the type of a dynamic field? Or does it?

Solr *does* simply rely on the field name to know the field type for a dynamic 
field.  Dynamic field names are of the form {{\*suffix}} or {{prefix\*}} - i.e. 
a glob that matches field names that include a fixed suffix or prefix\[1].  
There is a single fieldtype associated with each dynamic field, so once a match 
is made, the fieldtype is also known.
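
To make the matching rule concrete, here is a small Python sketch of the lookup. It is an illustration only and not Solr's implementation: the `DYNAMIC_FIELDS` table and the function name are hypothetical, and the tie-break (prefer the longest pattern when several match) is an assumption of this example.

```python
# Hypothetical dynamic field declarations: glob pattern -> fieldtype name.
DYNAMIC_FIELDS = {"*_i": "int", "*_s": "string", "attr_*": "text"}


def match_dynamic_field(field_name, dynamic_fields=DYNAMIC_FIELDS):
    """Return (pattern, fieldtype) for the matching glob, or None.

    Only the two supported shapes are handled: "*suffix" and "prefix*".
    A bare "*" is treated as a suffix glob with an empty suffix, so it
    matches every field name.
    """
    best = None
    for pattern, field_type in dynamic_fields.items():
        if pattern.startswith("*"):
            matched = field_name.endswith(pattern[1:])
        elif pattern.endswith("*"):
            matched = field_name.startswith(pattern[:-1])
        else:
            matched = False
        # Assumption for this sketch: the longest matching pattern wins.
        if matched and (best is None or len(pattern) > len(best[0])):
            best = (pattern, field_type)
    return best
```

With these declarations, `match_dynamic_field("cost_i")` resolves to the "*_i" entry and its fieldtype, while a plain "cost" matches nothing unless a "*" dynamic field is also declared.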

bq. This does not necessarily have to be implemented in the core engine: I would 
be happy with any solution that allowed me to create fields without having to 
query the current schema of a collection and then issue a massive number of 
schema change requests.

Note that you can send a single request that contains any number of changes, 
though if any one fails, none are applied.
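
As a rough sketch of such a batched request, the Python below builds one JSON body with several "add-field" commands. The helper name, the field names and types, and the choice of `stored: true` are assumptions for illustration; the body is only constructed here, not sent.

```python
import json


def build_add_fields_request(fields):
    """Build one Schema API request body that adds every field in `fields`.

    `fields` maps field name -> fieldtype name (both hypothetical here).
    """
    commands = [
        {"name": name, "type": field_type, "stored": True}
        for name, field_type in sorted(fields.items())
    ]
    return json.dumps({"add-field": commands})


body = build_add_fields_request({"cost": "int", "desc": "string"})
# The body would be POSTed as application/json to the collection's
# /schema endpoint; host, port and collection name are deployment-specific.
```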

\[1] There can also be a {{\*}} dynamic field, which matches all field names. 




[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-25 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300474#comment-15300474
 ] 

Shawn Heisey commented on SOLR-9150:


I would simply write a custom schema for each customer, mapping "foo" and "bar" 
to the proper types for that specific customer.  I would also expect each 
customer to want custom analysis chains for various text field types, which is 
further reason for a highly customized schema for each customer.

Dynamic fields are a useful feature, but heavy dynamic field usage only makes 
sense if there is a corresponding consistency in the field names of the source 
data that match up with the dynamic field definitions.




[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-25 Thread Peter Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300292#comment-15300292
 ] 

Peter Horvath commented on SOLR-9150:
-

Here is our use case: imagine a system where Solr is used as the backing store 
of a hosted service, where a number of external customers regularly load their 
data (bringing their own field names like "foo" and "bar") and build UIs for 
3rd party users with the tools you provide. In such an environment you do not 
know the fields stored in Solr (except at load time, when you can look at the 
values), and you do not want to expose the implementation detail that Solr is 
used for the backend. Since you want to hide the fact that fields "foo" and 
"bar" are actually stored internally as e.g. "foo_i" and "bar_s", you will have 
to implement some mapping logic in the application, translating back and forth 
between the user view ("foo" and "bar" fields) and the actual backend names 
"foo_i" and "bar_s" -- this is something I would desperately like to avoid. 

I am not familiar with the internal workings of Solr, so I might be wrong, but 
I thought achieving something like this would be relatively easy: in Lucene it 
is, since you can always add a new field when a document is inserted.

I think the operation should simply fail if an attempt is made to index a 
document field with a different data type. E.g. if someone created "foo" by 
indexing "foo_i", then indexing a document with "foo_s" should simply be 
rejected with an exception. 
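
The rejection rule described above could look roughly like this. It is a Python sketch under stated assumptions, not a proposal for Solr internals: `FieldRegistry`, `FieldTypeConflict` and the suffix-to-type table are hypothetical names, and the registry lives in memory only.

```python
class FieldTypeConflict(Exception):
    """Raised when a stripped field name is reused with a different type."""


class FieldRegistry:
    def __init__(self, suffix_types):
        self.suffix_types = suffix_types  # e.g. {"_i": "int", "_s": "string"}
        self.fields = {}                  # stripped name -> type first seen

    def register(self, raw_name):
        """Strip a known type suffix and record the type of the stripped name.

        Returns the stripped name, or raw_name unchanged if no suffix matches.
        """
        for suffix, field_type in self.suffix_types.items():
            if raw_name.endswith(suffix) and len(raw_name) > len(suffix):
                name = raw_name[: -len(suffix)]
                # setdefault keeps the first type seen for this name.
                existing = self.fields.setdefault(name, field_type)
                if existing != field_type:
                    raise FieldTypeConflict(
                        f"{name!r} is already {existing}, cannot index as {field_type}"
                    )
                return name
        return raw_name


registry = FieldRegistry({"_i": "int", "_s": "string"})
registry.register("foo_i")    # creates "foo" as int
# registry.register("foo_s")  # would raise FieldTypeConflict
```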

Am I right assuming that Solr does not simply rely on the field naming to know 
the type of a dynamic field?

This does not necessarily have to be implemented in the core engine: I would be 
happy with any solution that allowed me to create fields without having to 
query the current schema of a collection and then issue a massive number of 
schema change requests. (Adjusting the non-dynamic schema is plausible, but 
still difficult for us: a user might change his/her mind and load a completely 
different data structure, and we would have to purge dangling fields after 
that...). An optional hook, extension etc. would be perfectly fine for us. Or if 
you have any better idea of how to deal with such requirements, I would be much 
obliged to hear your input.





[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-25 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300024#comment-15300024
 ] 

Steve Rowe commented on SOLR-9150:
--

Rather than using the existing dynamic field capabilities, I think it would be 
better to come up with a new mechanism.

The type could directly be a suffix in the specified field name, with some 
separator, e.g.: "cost::int", where "cost" is the field name, "::" is the 
separator and "int" is the fieldtype.  That way dynamic fields aren't involved 
at all.  Then Hoss's UpdateProcessor flow could be used, but the field would be 
created using the "int" fieldtype rather than copying attributes from a 
dynamicfield. 
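
Splitting such a name is straightforward; a minimal Python sketch follows. The function name is hypothetical, and how malformed names (missing name or missing type) should be treated is an assumption of this example: they are passed through untouched.

```python
SEPARATOR = "::"


def split_typed_name(raw_name, separator=SEPARATOR):
    """Split "cost::int" into ("cost", "int").

    Names without a separator, or with an empty name or type part,
    are returned unchanged with a type of None.
    """
    # rpartition splits on the last separator, so the type is the final part.
    name, sep, field_type = raw_name.rpartition(separator)
    if not sep or not name or not field_type:
        return raw_name, None
    return name, field_type
```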




[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299518#comment-15299518
 ] 

Hoss Man commented on SOLR-9150:


I don't particularly think this is a good idea (nor do I think general purpose 
field aliases -- as a high level configuration option -- would really solve any 
of the underlying ambiguity problems) but if someone wanted to pursue this 
objective I would suggest implementing it as an UpdateProcessor similar to how 
the current AddSchemaFieldsUpdateProcessorFactory works, using the 
underlying ManagedSchema APIs to add fields -- but instead of saying "I see a 
'cost' field in this doc, but no 'cost' field in the schema, so I will add it 
using a configured/default type mapping" the logic could say "I see a 'cost_i' 
field in this doc, which matches a '\*_i' dynamic field, using a prefix of 
'cost'; since 'cost' does not already exist in the schema, I will copy the 
attributes from '\*_i' into a new 'cost' field and rename the 'cost_i' field in 
this document 'cost' before adding it"
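
The logic in that last quote can be sketched outside of Solr as follows. This is plain Python over dictionaries, not an UpdateProcessor: the function name and data shapes are hypothetical, only "*suffix" globs are handled, and the behaviour when the stripped name already exists (rename only if the attributes are identical, otherwise leave the field alone) is an assumption the comment above does not specify.

```python
def strip_dynamic_suffix(doc, schema_fields, dynamic_fields):
    """Rename fields such as "cost_i" to "cost", adding "cost" to the schema.

    doc:            field name -> value
    schema_fields:  explicit field name -> attribute dict (mutated in place)
    dynamic_fields: "*suffix" pattern -> attribute dict
    Collisions between two incoming names mapping to one target are not handled.
    """
    renamed = {}
    for raw_name, value in doc.items():
        target = raw_name
        if raw_name not in schema_fields:
            for pattern, attrs in dynamic_fields.items():
                suffix = pattern[1:]
                if (pattern.startswith("*") and suffix
                        and raw_name.endswith(suffix)
                        and len(raw_name) > len(suffix)):
                    base = raw_name[: -len(suffix)]
                    existing = schema_fields.get(base)
                    if existing is None:
                        # Copy the dynamic field's attributes into a new field.
                        schema_fields[base] = dict(attrs)
                        target = base
                    elif existing == attrs:
                        # Assumption: same definition already present, so rename.
                        target = base
                    break
        renamed[target] = value
    return renamed


schema = {"id": {"type": "string"}}
dynamic = {"*_i": {"type": "int", "indexed": True}}
out = strip_dynamic_suffix({"id": "1", "cost_i": 10}, schema, dynamic)
```

After the call, `out` carries the value under "cost" and `schema` has gained a "cost" entry copied from the "*_i" definition.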




[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-24 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299357#comment-15299357
 ] 

Erick Erickson commented on SOLR-9150:
--

Shawn actually brings up a couple of points that, IMO, shoot down the idea. I 
don't see a reasonable way to support two _different_ types that have the 
suffix stripped later in the process: 'foo_s' and 'foo_i' would both map to 
'foo', and from there it's all a mess.

All the code in Solr that tries to resolve field names would have to be visited 
unless, as Shawn mentions, the managed schema is updated.

This would get messy really quickly and there still hasn't been a clear case 
made for why this would be worth the complexity.




[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-23 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297226#comment-15297226
 ] 

Kevin Risden commented on SOLR-9150:


I haven't put too much thought into this, but how about an alternative idea 
that may solve the same problem: field name aliases, i.e. "foo" would point to 
"foo_i" when querying. I have no idea what impact resolving the alias would 
have on querying. This would keep dynamic fields in place, with a defined 
alias mapping added later.

I can see the case where needing to know the _type when querying would be a 
pain, but at indexing time the _type would be known.




[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297154#comment-15297154
 ] 

Shawn Heisey commented on SOLR-9150:


Let's imagine an index that does not have a field named "foo", but does have 
"*_i" and "*_s" dynamicField entries.

An indexing request comes in with a number in a field named "foo_i".  With this 
feature, this would put that data into a Lucene field named "foo" ... but at 
that point, how is Solr supposed to know that a query on the "foo" field should 
be treated as a number?  The only way I can imagine this working without 
problems is if this action results in a managed_schema update that *adds* the 
field named "foo" to the schema with the same definition as "*_i".

As a further thought experiment, what exactly should happen if a subsequent 
indexing request contains a field named "foo_s" that holds a non-numeric 
string?  If the first request containing foo_i results in foo being added to a 
managed schema, then a subsequent request with foo_s would fail, because the 
incoming data would not be compatible with an integer field.





[jira] [Commented] (SOLR-9150) Add configuration option to strip type postfix from dynamic field name on document indexing

2016-05-23 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296658#comment-15296658
 ] 

Erick Erickson commented on SOLR-9150:
--

I'm not a big fan of this idea at first blush, it seems like unnecessary 
complexity in the _engine_ to support... I'm not sure what.

"Not desirable" because of what? Convenience at the app layer? Some UI that has 
a pick list? It'd be useful to have a statement of what problem this capability 
is attempting to solve before jumping in and making changes, there might be 
other approaches already in place. There's already "field aliasing" that lets a 
field be displayed under a different name than its actual one, for instance.
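
For reference, a sketch of how such display aliases might be requested, assuming the `alias:field` form of the `fl` parameter. The helper name and the field names are hypothetical. Note that this only renames fields in the response; the query itself still has to use the real name ("foo_i").

```python
from urllib.parse import urlencode


def build_fl(aliases, extra_fields=("id",)):
    """Build an fl value that returns each real field under its alias.

    `aliases` maps display name -> actual field name.
    """
    parts = list(extra_fields) + [f"{alias}:{real}" for alias, real in aliases.items()]
    return ",".join(parts)


fl = build_fl({"foo": "foo_i", "bar": "bar_s"})
# URL-encoded query string for a select request; q still uses the real name.
params = urlencode({"q": "foo_i:10", "fl": fl})
```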

I'm not totally against the idea, I just don't see a clear problem statement 
here.
