[jira] [Commented] (SOLR-3250) Dynamic Field capabilities based on value not name

2013-06-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673197#comment-13673197
 ] 

Steve Rowe commented on SOLR-3250:
--

Now that we have the ability to dynamically add schema fields (SOLR-3251), I 
want to push forward on this issue.

Value-based dynamic field capabilities for document updates - which I'll 
sometimes refer to as schemaless mode - will a) determine the type for field 
names that don’t match explicit or dynamic fields in the schema; b) add these 
field names to the schema with their determined types; and c) complete the 
document update request as normal.  This process should apply equally to new 
doc additions, atomic updates, and regular updates.

In a conversation with [~hossman_luc...@fucit.org] about this feature, he 
suggested that configuration for parsing/converting {{String}}-typed field 
values into the appropriate Java objects could be separated from configuration 
of mappings from Java object types to schema field types.  In this way, 
components built for schemaless mode could be reused for other purposes.

JSON and Javabin content streams already carry some type information for their 
field values.  The {{ContentStreamLoader}}-s corresponding to these, 
{{JsonLoader}} and {{JavabinLoader}}, should set field value object types in 
the {{SolrInputDocument}} according to the content stream's data types.  
(Currently {{JavabinLoader}} does this correctly, but {{JsonLoader}} stores 
everything as {{String}}-s; this will need to be fixed.)  As a result, for the 
Java object types supported by these content streams and their loaders (as well 
as other update processors, etc. that set field values' Java object types), 
{{String}} parsing/conversion won't be required, and only the mappings from Java 
object type to schema field type will be necessary to determine the schema field 
type for new fields.

[On 
SOLR-2802|https://issues.apache.org/jira/browse/SOLR-2802?focusedCommentId=13117911&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13117911],
 Hoss wrote that {{FieldMutatingUpdateProcessor}}-s that parsed dates, numbers 
and booleans would be generally useful.  I plan on going that route to 
implement {{String}}-typed field value parsing.  These field value parsing 
update processors should operate on {{String}}-valued fields that either a) are 
not in the schema, or b) have a schema field type with an appropriate 
{{typeClass}}.

After the new parsing update processors detect and convert field values to the 
appropriate Java object types, an update processor that adds fields to the 
schema as needed can be configured with a mapping from Java object type to 
schema field type.
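
A minimal sketch of the lookup from Java object type to schema field type described above, written in plain Java with no Solr classes. The class name {{TypeMappingSketch}}, the method {{fieldTypeFor}}, and the field type names on the right-hand side are all hypothetical; in the proposal the mapping would come from the update processor's configuration.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup from a field value's Java class to a schema field type name. */
class TypeMappingSketch {

    // Illustrative mappings only; the type names are assumptions, not taken from the issue.
    private static final Map<Class<?>, String> MAPPINGS = new LinkedHashMap<>();
    static {
        MAPPINGS.put(Boolean.class, "boolean");
        MAPPINGS.put(Long.class, "long");
        MAPPINGS.put(Double.class, "double");
        MAPPINGS.put(Date.class, "date");
    }

    // Fallback for values (such as unparsed Strings) that match no mapping.
    private static final String DEFAULT_TYPE = "text";

    /** Returns the field type name for the first mapping whose class matches the value. */
    public static String fieldTypeFor(Object value) {
        for (Map.Entry<Class<?>, String> entry : MAPPINGS.entrySet()) {
            if (entry.getKey().isInstance(value)) {
                return entry.getValue();
            }
        }
        return DEFAULT_TYPE;
    }

    public static void main(String[] args) {
        System.out.println(fieldTypeFor(42L));
        System.out.println(fieldTypeFor(3.5));
        System.out.println(fieldTypeFor("plain text"));
    }
}
```

Because the lookup only sees Java object types, it works the same whether the typed value came from a content stream loader or from one of the parsing update processors.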

Here is the list of things I think need to happen - I plan on making JIRA 
issues for each of these:

# Fix {{JsonLoader}} to create field values using the JSON-supplied type, 
rather than making everything a {{String}}.
# Add a new field update processor selector that will configure the processor 
to select fields that match any schema field, or that match no schema field, 
depending on its boolean parameter: 
{{<bool name="fieldNameMatchesSchemaField">}}
# Add new {{FieldMutatingUpdateProcessorFactory}} subclasses 
{{ParseFooUpdateProcessorFactory}}, where {{Foo}} includes {{Date}}, 
{{Double}}, {{Long}}, and {{Boolean}}. If they see a field value that is not 
{{String}}-valued, or can't parse the value, they will ignore it and leave it 
as is.  For multi-valued fields, they should be all-or-nothing.
# Add a new {{AddSchemaFieldsUpdateProcessorFactory}}, with configurable 
mappings from Java object type to schema field type, that will dynamically add 
fields to the schema, as needed.
# Add a new example config set for schemaless mode.
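
The all-or-nothing rule for multi-valued fields in item 3 can be sketched as follows. This is plain Java rather than a real {{FieldMutatingUpdateProcessorFactory}} subclass; the class name {{ParseLongSketch}} and the method {{parseAllOrNothing}} are hypothetical, and only the {{Long}} case is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of all-or-nothing parsing for a multi-valued field. */
class ParseLongSketch {

    /**
     * Returns a new list with every value converted to Long, or the original
     * list untouched if any value is not a String or cannot be parsed.
     */
    public static List<Object> parseAllOrNothing(List<Object> values) {
        List<Object> parsed = new ArrayList<>(values.size());
        for (Object value : values) {
            if (!(value instanceof String)) {
                return values; // not String-valued: leave the field as is
            }
            try {
                parsed.add(Long.valueOf((String) value));
            } catch (NumberFormatException e) {
                return values; // one bad value means nothing is converted
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(parseAllOrNothing(Arrays.<Object>asList("1", "2", "3")));
        System.out.println(parseAllOrNothing(Arrays.<Object>asList("1", "two", "3")));
    }
}
```

Returning the original list on any failure keeps a field from ending up with a mix of {{Long}} and {{String}} values, which would make the later type mapping ambiguous.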


 Dynamic Field capabilities based on value not name
 --

 Key: SOLR-3250
 URL: https://issues.apache.org/jira/browse/SOLR-3250
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll

 In some situations, one already knows the schema of their content, so having 
 to declare a schema in Solr becomes cumbersome.  For 
 instance, if you have all your content in JSON (or can easily generate it) or 
 other typed serializations, then you already have a schema defined.  It would 
 be nice if we could have support for dynamic fields that used whatever name 
 was passed in, but then picked the appropriate FieldType for that field based 
 on the value of the content.  So, for instance, if the input is a number, it 
 would select the appropriate numeric type.  If it is a plain text string, it 
 would pick the appropriate text field (you could even add in language 
 detection here).  If it is comma separated, it would treat them as keywords, 
 etc.  Also, we could likely send in a hint as to the type too.

[jira] [Commented] (SOLR-3250) Dynamic Field capabilities based on value not name

2012-03-15 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230231#comment-13230231
 ] 

Grant Ingersoll commented on SOLR-3250:
---

Note, a core reload is not something I would want to do.

 Dynamic Field capabilities based on value not name
 --

 Key: SOLR-3250
 URL: https://issues.apache.org/jira/browse/SOLR-3250
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll

 In some situations, one already knows the schema of their content, so having 
 to declare a schema in Solr becomes cumbersome.  For 
 instance, if you have all your content in JSON (or can easily generate it) or 
 other typed serializations, then you already have a schema defined.  It would 
 be nice if we could have support for dynamic fields that used whatever name 
 was passed in, but then picked the appropriate FieldType for that field based 
 on the value of the content.  So, for instance, if the input is a number, it 
 would select the appropriate numeric type.  If it is a plain text string, it 
 would pick the appropriate text field (you could even add in language 
 detection here).  If it is comma separated, it would treat them as keywords, 
 etc.  Also, we could likely send in a hint as to the type too.
 With this approach, you of course have a first-in-wins situation, but 
 assuming you have this schema defined elsewhere, it is likely fine.
 Supporting such cases would allow us to be schemaless when appropriate, while 
 offering the benefits of schemas when appropriate.  Naturally, one could mix 
 and match these too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3250) Dynamic Field capabilities based on value not name

2012-03-15 Thread Yonik Seeley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230243#comment-13230243
 ] 

Yonik Seeley commented on SOLR-3250:


Of course hopefully everyone knows schemaless is mostly marketing b.s. - when 
people do this, there is still a schema, but it's guessed on first use (and 
hence generally a horrible idea for production systems).

It would be easy enough on a single node... but how does one handle a cluster?
Say you index price=0 on nodeA, and price=100.0 on nodeB?

A quick thought on how it might work:
 - have a separate file auto_fields.json that keeps track of the mappings that 
would be the same for all cores using that schema
 - when we run across a field we haven't seen before, we must guess a type for 
it, then grab a lock and update auto_fields.json
 - we can update our in-memory schema with any new fields we find in 
auto_fields.json
 - works the same in ZK mode... it's just the auto_fields.json is in ZK, and we 
would use something like optimistic locking to update it
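
A rough sketch of the optimistic-locking idea above, assuming the shared auto_fields.json can be modeled as a versioned map with a compare-and-set write. Everything here is hypothetical: {{AutoFieldsSketch}}, {{SharedStore}}, and {{registerField}} are illustrative names, the store is an in-memory stand-in rather than a file or ZK, and the type names are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical first-in-wins field registration using versioned writes. */
class AutoFieldsSketch {

    /** Copy of the shared mappings together with the version they were read at. */
    public static final class Snapshot {
        public final int version;
        public final Map<String, String> fields;

        Snapshot(int version, Map<String, String> fields) {
            this.version = version;
            this.fields = fields;
        }
    }

    /** In-memory stand-in for auto_fields.json, whether on disk or in ZK. */
    public static final class SharedStore {
        private int version = 0;
        private Map<String, String> fields = new HashMap<>();

        public synchronized Snapshot read() {
            return new Snapshot(version, new HashMap<>(fields));
        }

        /** Writes only if nobody else has written since expectedVersion was read. */
        public synchronized boolean writeIfVersion(int expectedVersion, Map<String, String> updated) {
            if (expectedVersion != version) {
                return false;
            }
            fields = new HashMap<>(updated);
            version++;
            return true;
        }
    }

    /** Returns the type the cluster agrees on, which may differ from this node's guess. */
    public static String registerField(SharedStore store, String name, String guessedType) {
        while (true) {
            Snapshot snap = store.read();
            String existing = snap.fields.get(name);
            if (existing != null) {
                return existing; // another node registered the field first
            }
            Map<String, String> updated = new HashMap<>(snap.fields);
            updated.put(name, guessedType);
            if (store.writeIfVersion(snap.version, updated)) {
                return guessedType;
            }
            // Lost the race: re-read the mappings and try again.
        }
    }

    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        System.out.println(registerField(store, "price", "long"));   // nodeA saw price=0
        System.out.println(registerField(store, "price", "double")); // nodeB saw price=100.0
    }
}
```

In the price example, both nodes end up with the type that was registered first. The sketch only makes the nodes agree on one type; it does not address what nodeB should do with a value that no longer fits the agreed type.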



