Mike Thomsen created SOLR-9525:
--
Summary: split() function for streaming
Key: SOLR-9525
URL: https://issues.apache.org/jira/browse/SOLR-9525
Project: Solr
Issue Type: Wish
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Mike Thomsen
This is the original description I posted on solr-user:
I read this article and thought it could be an interesting way to do ingestion:
https://dzone.com/articles/solr-streaming-expressions-for-collection-auto-upd-1
Example from the article:
daemon(id="12345",
       runInterval="6",
       update(users,
              batchSize=10,
              jdbc(connection="jdbc:mysql://localhost/users?user=root&password=solr",
                   sql="SELECT id, name FROM users",
                   sort="id asc",
                   driver="com.mysql.jdbc.Driver")
       )
)
What's the best way to handle a multivalued field using this API? Is there a way
to tokenize something returned in a database field?
Joel Bernstein responded with this:
Unfortunately there currently isn't a way to split a field. But this would
be nice functionality to add.
The approach would be to add a split operation that would be used by the
select() function. It would look like this:
select(jdbc(...), split(fieldA, delim=","), ...)
This would make a good jira issue.
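A minimal sketch of what the proposed split(fieldA, delim=",") operation might do to each tuple, in plain Java since Solr is a Java project. The class name SplitOperation, its operate(...) method, and modeling a tuple as a Map<String, Object> are assumptions for illustration only; this is not Solr's actual streaming API. Trimming whitespace and dropping empty tokens are also design choices the proposal does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed split(fieldA, delim=",") operation.
// A "tuple" is modeled as a plain Map<String, Object>; this is NOT the
// real Solr streaming Tuple/StreamOperation API.
public class SplitOperation {
    private final String field;
    private final String delim;

    public SplitOperation(String field, String delim) {
        this.field = field;
        this.delim = delim;
    }

    // Replaces the field's String value with a List<String> of trimmed,
    // non-empty tokens so it could be indexed into a multivalued field.
    public void operate(Map<String, Object> tuple) {
        Object value = tuple.get(field);
        if (!(value instanceof String)) {
            return; // missing or non-string value: leave the tuple unchanged
        }
        List<String> tokens = new ArrayList<>();
        // Pattern.quote makes the delimiter literal, so "|" or "." behave as expected.
        for (String part : ((String) value).split(Pattern.quote(delim))) {
            String token = part.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        tuple.put(field, tokens);
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("id", "1");
        tuple.put("fieldA", "red, green,,blue");
        new SplitOperation("fieldA", ",").operate(tuple);
        System.out.println(tuple.get("fieldA")); // prints [red, green, blue]
    }
}
```

Emitting a List rather than a joined string is presumably what would let update() send each token as a separate value of a multivalued field; whether the real operation should trim or keep empty tokens is left open by the issue.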
So the TL;DR version is that I need the ability to specify, in such a streaming
operation, certain fields to tokenize into multivalued fields. In one schema I
may have to support, there are probably half a dozen such fields.
Perhaps I am missing a feature here, but it looks like this new capability
cannot handle multivalued fields until something like this is in place.
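Because one schema may need half a dozen such fields, here is a hedged sketch of applying a per-field delimiter to every configured column of a row in a single pass, roughly what a select(...) carrying several split(...) calls might amount to. The class MultiFieldSplitter, the column names tags and phones, and the sample values are hypothetical; rows are plain maps standing in for JDBC result tuples, not Solr's real Tuple class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: tokenize several delimited columns of each row before
// indexing. Rows are plain maps; field names and delimiters are made up.
public class MultiFieldSplitter {
    // field name -> literal delimiter used in that column
    private final Map<String, String> delimiters;

    public MultiFieldSplitter(Map<String, String> delimiters) {
        this.delimiters = delimiters;
    }

    // Returns a copy of the row (input is not modified) with every configured
    // String field replaced by a list of trimmed, non-empty tokens.
    public Map<String, Object> apply(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        for (Map.Entry<String, String> e : delimiters.entrySet()) {
            Object value = out.get(e.getKey());
            if (value instanceof String) {
                List<String> tokens = new ArrayList<>();
                for (String part : ((String) value).split(Pattern.quote(e.getValue()))) {
                    String t = part.trim();
                    if (!t.isEmpty()) {
                        tokens.add(t);
                    }
                }
                out.put(e.getKey(), tokens);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> delims = new HashMap<>();
        delims.put("tags", ",");
        delims.put("phones", ";");
        MultiFieldSplitter splitter = new MultiFieldSplitter(delims);

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", "42");
        row.put("tags", "solr, streaming");
        row.put("phones", "555-0100;555-0199");
        System.out.println(splitter.apply(row));
        // prints {id=42, tags=[solr, streaming], phones=[555-0100, 555-0199]}
    }
}
```

Keeping the field-to-delimiter mapping in one place mirrors the request to "specify certain fields to tokenize"; columns not listed pass through untouched.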
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)