[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-11 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904596#comment-16904596
 ] 

Tomoko Uchida commented on SOLR-13593:
--

Here is the final patch: [^SOLR-13593.patch]

I opened follow-up tasks: [SOLR-13690] (migrate default schemas) and 
[SOLR-13691] (add the examples in Ref Guide).

This breaks two ICU factories, so I cannot backport it to the 8x branch as is :-/
We might be able to backport it with some fixes to keep backwards 
compatibility, but that could introduce other concerns and confusion. I think it 
would be better to leave the 8x branch unchanged - opinions or ideas?

> Allow to specify analyzer components by their SPI names in schema definition
> 
>
> Key: SOLR-13593
> URL: https://issues.apache.org/jira/browse/SOLR-13593
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Tomoko Uchida
>Priority: Major
> Attachments: SOLR-13593.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Each analysis factory now has an explicitly documented SPI name, which is stored 
> in the static "NAME" field (LUCENE-8778).
> Solr uses the factories' simple class names in the schema definition (like 
> class="solr.WhitespaceTokenizerFactory"), but we should also be able to use the 
> more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> 
>   
> 
>  />
> 
>   
> 
> {code}
> would be
> {code:xml}
> 
>   
> 
> 
> 
>   
> 
> {code}
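
As a rough illustration of the idea in the description, the sketch below indexes factory classes by the value of their static "NAME" field so that a short SPI name can be resolved to a class. The stub classes and SpiNameIndex are hypothetical stand-ins, not Lucene's or Solr's actual loader, and the "lowercase" name is assumed for the example.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for analysis factories; each declares its SPI name
// in a public static NAME field.
class WhitespaceTokenizerFactoryStub {
    public static final String NAME = "whitespace";
}

class LowerCaseFilterFactoryStub {
    public static final String NAME = "lowercase";
}

// Illustrative index (not Lucene's loader): maps SPI names to classes.
class SpiNameIndex {
    // Reads each class's static NAME field reflectively and uses it as the key.
    static Map<String, Class<?>> index(Class<?>... factoryClasses) {
        Map<String, Class<?>> byName = new HashMap<>();
        for (Class<?> factoryClass : factoryClasses) {
            try {
                Field field = factoryClass.getField("NAME");
                byName.put((String) field.get(null), factoryClass);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(
                    factoryClass.getName() + " has no accessible NAME field", e);
            }
        }
        return byName;
    }
}
```

With such an index, name="whitespace" in a schema element could be resolved without spelling out the factory class name.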



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-11 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904591#comment-16904591
 ] 

ASF subversion and git services commented on SOLR-13593:


Commit 9b986d268f3618d2137bbc8bd068a3db0d772049 in lucene-solr's branch 
refs/heads/master from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9b986d2 ]

SOLR-13593: Allow to look up analyzer components by their SPI names in field 
type configuration.





[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-11 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904584#comment-16904584
 ] 

Tomoko Uchida commented on SOLR-13593:
--

The ICU factories' "name" argument was renamed to "form" on the master branch, so the 
factories can now be looked up by name (with a "form" attribute specifying the 
normalization form) like this:
{code:xml}

  


  



  


  

{code}
Corresponding field types using "class" are:
{code:xml}

  


  



  


  

{code}
This works for me, and the branch passed the entire test suite. I will merge all 
the changes into the master branch soon.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-10 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904475#comment-16904475
 ] 

Tomoko Uchida commented on SOLR-13593:
--

When I grepped the entire Lucene code base, I found only two (ICU) factories that 
have a "name" attribute. I opened LUCENE-8948 as a blocker for this issue.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-10 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904461#comment-16904461
 ] 

Uwe Schindler commented on SOLR-13593:
--

"form" looks fine. Thanks, makes more sense from my standpoint.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-10 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904460#comment-16904460
 ] 

Tomoko Uchida commented on SOLR-13593:
--

FYI, the "name" presumably comes from the ICU4J library's method signature: 
{code}
public static Normalizer2 getInstance(InputStream data, String name, 
Normalizer2.Mode mode)
{code}

Anyway, I would also like to change the factory's argument name ("form" - named after 
"Unicode normalization form" - might be suitable). I will open an issue for this.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-10 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904452#comment-16904452
 ] 

Uwe Schindler commented on SOLR-13593:
--

I lean towards breaking the ICU factory. The name attribute makes no sense to me there 
anyway: I was already annoyed by it in Elasticsearch. Name for "what", the hell?




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-10 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904430#comment-16904430
 ] 

Tomoko Uchida commented on SOLR-13593:
--

When running the entire test suite, I encountered a TokenFilterFactory that has a 
"name" argument: 
[https://lucene.apache.org/core/8_2_0/analyzers-icu/org/apache/lucene/analysis/icu/ICUNormalizer2FilterFactory.html]

So the field type definition including this filter is like this:
{code:xml}
  

  
  

  
{code}
It's incompatible with the changes here of course...

There may be some options.

1. Allow "class" and "name" to be used together as is (only when the "name" is not an 
SPI name) and use "class" to look up the factory in that case.
2. Forbid the "name" argument in factories and rename existing "name" arguments to 
something else.
3. Rethink the attribute name used to look up factories, because "name" is already 
reserved.

I don't like option 1 - it seems too confusing and makes it impossible to 
discard the "class" attribute in future releases. Also, I don't think we should take 
option 3 just because of a few anomalous classes.
Option 2 makes sense to me; can we fix the "name" args in existing factories 
(maybe another LUCENE issue is needed) before proceeding? We may also need to 
delay exposing this feature until Solr 9.0 because it breaks backwards 
compatibility.

[~thetaphi]: Do you have any ideas about that?
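
To make the conflict concrete, here is a minimal sketch that assumes a loader which reserves both "class" and "name" for factory lookup and passes the remaining attributes to the factory. AttributeSplit is a hypothetical helper, and the attribute values in the usage note are assumed for illustration, not taken from the test schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: separates lookup attributes from factory arguments.
class AttributeSplit {
    // Returns a copy of the element's attributes without the two attributes
    // assumed to be reserved for locating the factory.
    static Map<String, String> factoryArgs(Map<String, String> attrs) {
        Map<String, String> args = new LinkedHashMap<>(attrs);
        args.remove("class");
        args.remove("name");
        return args;
    }
}
```

Under this assumption, an element carrying class="solr.ICUNormalizer2FilterFactory", name="nfkc_cf" and mode="compose" would hand only "mode" to the factory, so the normalization form would be lost. Renaming the factory argument (option 2) avoids the collision.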




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-07 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901888#comment-16901888
 ] 

Uwe Schindler commented on SOLR-13593:
--

Looks good to me! +1




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-06 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901689#comment-16901689
 ] 

Tomoko Uchida commented on SOLR-13593:
--

I updated the pull request. If both "name" and "class" appear on the same 
element, a SolrException is thrown and error logs are emitted.

I've also tested this manually: (1) started a local Solr core with a manually 
modified managed-schema containing field types that use the "name" property, and (2) 
added types using "name" via the REST API as well. This works for me and does 
not affect existing field types (those using "class"). The core can also be 
restarted without any problems after adding the types that use "name", so the 
regenerated and saved managed-schema works fine.

I also created the service provider file for Solr's custom filters (there was none 
so far) so that they can be looked up by name.

// META-INF/services/org.apache.lucene.analysis.util.TokenFilterFactory
{code:java}
org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory
org.apache.solr.rest.schema.analysis.ManagedSynonymFilterFactory
org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory
{code}
Let me know if there are any other things that would block this issue - I'd 
like to wait until this weekend and merge the changes into the ASF repo, if 
there are no objections.
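
For readers unfamiliar with the service provider file shown above: it follows the java.util.ServiceLoader provider-configuration format, with one fully qualified class name per line and '#' starting a comment. The sketch below only parses such content; ProviderFileParser is a hypothetical helper, not part of Solr or Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that extracts class names from provider-file content.
class ProviderFileParser {
    // Strips '#' comments and surrounding whitespace, skipping empty lines.
    static List<String> parse(String content) {
        List<String> classNames = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                classNames.add(line);
            }
        }
        return classNames;
    }
}
```

Each class listed this way becomes discoverable by the SPI machinery, which is what allows the managed filters to be referenced by name.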




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-05 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900067#comment-16900067
 ] 

Steve Rowe commented on SOLR-13593:
---

bq. Both name & class seems error prone; can't we simply disallow this (throw 
error) from the start?
 
+1




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-05 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900066#comment-16900066
 ] 

Tomoko Uchida commented on SOLR-13593:
--

Thanks [~dsmiley], I agree with you.

I will update the PR so that the property check works as follows:
 * when neither "name" nor "class" is passed: an exception is thrown.
 * when both "name" and "class" are passed: an exception is thrown.
 * when only "name" is passed: it's accepted.
 * when only "class" is passed: it's accepted for backwards compatibility. 
(maybe we should deprecate this and emit some warnings after the default schema is 
changed.)
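
The four cases above can be sketched as a small check. This is only an illustration: FactoryLookupKey and its return format are hypothetical, and IllegalArgumentException stands in for whatever exception the actual patch throws.

```java
import java.util.Map;

// Hypothetical validator for the "name"/"class" attributes of one element.
class FactoryLookupKey {
    // Returns a tagged lookup key, or throws when the attributes are
    // missing or ambiguous.
    static String resolve(Map<String, String> attrs) {
        String name = attrs.get("name");
        String className = attrs.get("class");
        if (name == null && className == null) {
            throw new IllegalArgumentException(
                "Neither \"name\" nor \"class\" is specified");
        }
        if (name != null && className != null) {
            throw new IllegalArgumentException(
                "Both \"name\" and \"class\" are specified");
        }
        // Exactly one of the two is present at this point.
        return name != null ? "name=" + name : "class=" + className;
    }
}
```

Rejecting the ambiguous case outright means the loader never has to decide which attribute takes priority.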




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-05 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899969#comment-16899969
 ] 

David Smiley commented on SOLR-13593:
-

Both name & class seems error prone; can't we simply disallow this (throw 
error) from the start?




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-04 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899602#comment-16899602
 ] 

Tomoko Uchida commented on SOLR-13593:
--

I have updated the pull request.

1. If both "name" and "class" are specified, the redundancy does not cause 
an error, but warnings are emitted when loading the schema. In this case "name" 
is given priority over "class". (In a future release "class" could be 
deprecated, so this behaviour makes sense to me. Any comments?)
2. Added unit tests for loading field types from schema.xml and for creating 
them via the REST API.

LUCENE-8778 was backported with proper backwards compatibility (LUCENE-8911), 
so I think we can expose this feature in 8.x minor releases. After the pull 
request gets reviewed, I'd like to commit the changes to the master and 8x 
branches, then migrate the default schema file(s) and the examples in the Ref Guide.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-11 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883151#comment-16883151
 ] 

Tomoko Uchida commented on SOLR-13593:
--

Just as a reminder: I reverted LUCENE-8778, so I will not be able to apply the 
patch here to 8.x until LUCENE-8911 is properly resolved.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-10 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881892#comment-16881892
 ] 

Tomoko Uchida commented on SOLR-13593:
--

I opened a blocker issue: LUCENE-8907.

[~thetaphi]: Currently I am not sure about how to properly deal with the 
backwards compatibility. Can you please give some directions?




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-10 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881872#comment-16881872
 ] 

Jan Høydahl commented on SOLR-13593:


+1 for a back-compat blocker issue for 8.2




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-10 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881868#comment-16881868
 ] 

Uwe Schindler commented on SOLR-13593:
--

Hi, I am not fully sure how to handle that in 8.x. I just know that custom 
analyzers and token filters are among the most commonly implemented third-party 
extensions, so we should at least keep some backwards compatibility.

Sorry for being so late; I only just noticed the backport and then slept on it 
for a few nights. Maybe let's open a separate LUCENE issue about this.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-10 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881863#comment-16881863
 ] 

Tomoko Uchida commented on SOLR-13593:
--

Hi [~thetaphi],

Thanks for the comment. I've already backported the changes in the Lucene SPI 
loader to branch_8x (LUCENE-8778). I added the "NAME" field to all of Solr's 
custom factories, but 3rd party jars will be affected. Should I revert it before 
the feature freeze for 8.2?




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-10 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881845#comment-16881845
 ] 

Uwe Schindler commented on SOLR-13593:
--

While thinking about the next 8.x release: because of the Lucene change, we 
already have a backwards-incompatible change in Lucene. If you implemented your 
own factory classes, you have to add the "NAME" field; otherwise that factory 
and (I think) all other factories won't initialize! This means that somebody 
with an old JAR file of 3rd party analyzers on the classpath may find that the 
whole analyzer factory framework fails to load, making Solr unusable.

Maybe in 8.x we need some "transparent" backwards compatibility, so that 
"incomplete" factories won't break loading of the SPI framework, or we just 
emulate the old name. Should I maybe open another issue about this? Currently I 
am a bit afraid of releasing the new factory interface without a migration path.
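
To make the "emulate the old name" idea concrete, here is a minimal sketch of what such a fallback could look like. It is not the Lucene SPI loader code: {{SpiNameResolver}} and the two demo factory classes are hypothetical stand-ins, and the legacy rule used here (strip the factory suffix from the simple class name and lowercase the rest) is an assumption about the old naming convention.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Locale;

/** Hypothetical helper: prefer the static NAME field, else emulate a legacy name. */
final class SpiNameResolver {
  private SpiNameResolver() {}

  /** Returns the value of a public static String NAME field, or a derived legacy name. */
  static String resolve(Class<?> factoryClass, String suffix) {
    try {
      Field field = factoryClass.getField("NAME");
      if (Modifier.isStatic(field.getModifiers()) && field.getType() == String.class) {
        Object value = field.get(null);
        if (value != null) {
          return (String) value;
        }
      }
    } catch (NoSuchFieldException | IllegalAccessException e) {
      // No usable NAME field ("incomplete" factory): fall through to the legacy name.
    }
    return legacyName(factoryClass, suffix);
  }

  /** Assumed legacy rule: simple class name minus the suffix, lowercased. */
  static String legacyName(Class<?> factoryClass, String suffix) {
    String simpleName = factoryClass.getSimpleName();
    if (!simpleName.endsWith(suffix) || simpleName.length() == suffix.length()) {
      throw new IllegalArgumentException(
          "Cannot derive a name for " + factoryClass.getName() + " with suffix " + suffix);
    }
    return simpleName
        .substring(0, simpleName.length() - suffix.length())
        .toLowerCase(Locale.ROOT);
  }
}

/** Hypothetical factory that already declares NAME. */
final class NamedDemoTokenizerFactory {
  public static final String NAME = "demo";
}

/** Hypothetical old 3rd party factory without a NAME field. */
final class LegacyDemoTokenizerFactory {}
```

With this, {{SpiNameResolver.resolve(LegacyDemoTokenizerFactory.class, "TokenizerFactory")}} yields a derived name instead of failing, so one incomplete factory would not have to break the others.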

For 9.0 all is fine.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-10 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881841#comment-16881841
 ] 

Uwe Schindler commented on SOLR-13593:
--

Changes look good. I think we can commit this separately and change the 
example/default schemas later.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-10 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881764#comment-16881764
 ] 

Tomoko Uchida commented on SOLR-13593:
--

Hi [~dsmiley],
 thanks for your suggestion. I will open separate issues for 1. updating the 
default managed-schema and 2. updating the Ref Guide (let me know if we should 
change other parts) after this issue is resolved.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881500#comment-16881500
 ] 

David Smiley commented on SOLR-13593:
-

I really like this change!  Love "name".  I think you can make these changes 
everywhere in the next release without waiting for 9.0.  I do strongly suggest 
separating the commit with the essence of the change (what's in your PR now) 
from the commits that change this pattern all over the place.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-09 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881010#comment-16881010
 ] 

Tomoko Uchida commented on SOLR-13593:
--

Hi,

The changes seem trivial, so I will be able to add the necessary tests and commit 
them in a few weeks, but before proceeding I would like to settle the migration 
roadmap for this change.

If the proposal gains consensus, I think we should also start to consider 
encouraging users to use "name=..." instead of "class=..." (so that we could 
remove "class=..." in a future release to reduce the code complexity).

The first step would be changing the default managed-schema file and the 
Solr Ref Guide to use "name=...", but I'm not sure when we should do so. 
Would it be better to include the change in an 8.x release, or should we wait 
until 9.0? Are there recommended approaches for such a migration?




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-04 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878792#comment-16878792
 ] 

Tomoko Uchida commented on SOLR-13593:
--

I updated the pull request.
{quote}I am not so happy about the "spi" name, I'd prefer "name". What's 
exactly the problem with using "name"? The Solr plugin stuff should not be 
affected by this.
{quote}
+1 Now the PR uses "name" to specify SPI names (just as my first proposal):
{code:xml}


  







  

{code}
{code:java}
# REST API
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type": {
    "name": "myNewTxtField",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "analyzer": {
      "charFilters": [{ "name": "htmlStrip" }],
      "tokenizer": { "name": "whitespace" },
      "filters": [{ "name": "lowercase" }]
    }
  }
}' http://localhost:8983/solr/techproducts/schema
{code}

--
{quote}Another suggestion, not sure if it's already implemented: When 
persisting a managed schema after modification, it should use the provider 
names only and no longer persist class names.
{quote}
I had not noticed that.
 It seems that Solr persists the factory's original properties as-is, including its 
class name ("class"). So I changed the property handling logic in 
{{o.a.s.schema.FieldType}} to discard the "class" property when the SPI name is 
passed, and instead preserve "name" in the original properties to keep 
the managed-schema consistent.
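
As a rough illustration of that rule (not the actual code in {{o.a.s.schema.FieldType}}), the property handling can be sketched as below. {{FactoryArgs}} and {{normalize}} are hypothetical names, and the sketch assumes the component's properties arrive as a plain string map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper illustrating which properties get persisted. */
final class FactoryArgs {
  private FactoryArgs() {}

  /**
   * Returns a copy of the original properties. When an SPI name ("name") is
   * present, the "class" property is dropped so only "name" is persisted;
   * otherwise the properties are kept unchanged.
   */
  static Map<String, String> normalize(Map<String, String> original) {
    Map<String, String> persisted = new LinkedHashMap<>(original);
    if (persisted.containsKey("name")) {
      persisted.remove("class");
    }
    return persisted;
  }
}
```

A component declared only with "class" keeps its "class" entry, so existing schemas would round-trip unchanged under this sketch.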




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-04 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878426#comment-16878426
 ] 

Uwe Schindler commented on SOLR-13593:
--

Hi,
I am not so happy about the "spi" name, I'd prefer "name". What's exactly the 
problem with using "name"? The Solr plugin stuff should not be affected by this.

Another suggestion, not sure if it's already implemented: When persisting a 
managed schema after modification, it should use the provider names only and no 
longer persist class names.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-04 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878369#comment-16878369
 ] 

Tomoko Uchida commented on SOLR-13593:
--

I've opened a draft pull request: 
[https://github.com/apache/lucene-solr/pull/761]. (Not yet tested.) I'm new to 
Solr schema handling, so please feel free to add comments if I missed something.

This accepts SPI names when loading the bundled managed-schema and when calling the REST API.

managed-schema example:
{code:xml}











{code}
REST API example:
{code:java}
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type": {
    "name": "myNewTxtField",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "analyzer": {
      "charFilters": [{ "spi": "htmlStrip" }],
      "tokenizer": { "spi": "whitespace" },
      "filters": [{ "spi": "lowercase" }]
    }
  }
}' http://localhost:8983/solr/techproducts/schema
{code}




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-03 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878294#comment-16878294
 ] 

Tomoko Uchida commented on SOLR-13593:
--

I started playing with {{FieldTypePluginLoader}} and noticed that the "name" 
attribute is reserved for Solr's plugin name. Instead of messing up the code, I 
would like to introduce a new attribute, "spi" (is there a more appropriate 
one?).
{code:xml}

  



  

{code}
Other than that, it seems straightforward to load the factories via their SPI 
names: add a new method to {{AbstractPluginLoader}} to create the new plugin 
instance by SPI name, and override it in 
{{FieldTypePluginLoader#readAnalyzer(Node)}}.
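
A minimal sketch of the "create the plugin instance by SPI name" step, under stated assumptions: {{SpiAwareLoader}} is a hypothetical stand-in and not Solr's {{AbstractPluginLoader}}, factories are registered by hand rather than discovered through the Java service loader, and lookups are assumed to be case-insensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical loader that creates plugin instances from a registered SPI name. */
final class SpiAwareLoader<T> {
  private final Map<String, Supplier<? extends T>> bySpiName = new HashMap<>();

  /** Registers a constructor under an SPI name (stored lowercased; an assumption). */
  void register(String spiName, Supplier<? extends T> constructor) {
    bySpiName.put(spiName.toLowerCase(Locale.ROOT), constructor);
  }

  /** Creates a new instance for the given SPI name, or fails for unknown names. */
  T createBySpiName(String spiName) {
    Supplier<? extends T> constructor = bySpiName.get(spiName.toLowerCase(Locale.ROOT));
    if (constructor == null) {
      throw new IllegalArgumentException("Unknown SPI name: " + spiName);
    }
    return constructor.get();
  }
}
```

A caller would use this path when the "spi" attribute is present and fall back to the existing class-based loading otherwise.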




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-01 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876372#comment-16876372
 ] 

Varun Thacker commented on SOLR-13593:
--

{quote}I agree with this idea! Meanwhile I think the field type resolution 
should be treated in another issue.
{quote}
Makes sense!




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-01 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876247#comment-16876247
 ] 

Tomoko Uchida commented on SOLR-13593:
--

This issue does not intend to replace the current "class=...", of course, but to 
provide users with an alternative way to describe the analyzer. (The 
replacement could possibly happen in a future version.)




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-01 Thread Tomoko Uchida (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876230#comment-16876230
 ] 

Tomoko Uchida commented on SOLR-13593:
--

[~varunthacker]: sorry, I made mistakes in my description. I changed the 
attribute from "class" to "name" in my example. I would prefer "name" for "SPI 
names" instead of "type".

bq. Also should {{solr.TextField}} be just {{text}} ?

I agree with this idea! Meanwhile I think the field type resolution should be 
treated in another issue.




[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-01 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876223#comment-16876223
 ] 

Varun Thacker commented on SOLR-13593:
--

+1 for having the ability to say "whitespace" etc

Is the attribute called name or class? In your example the tokenizer has 
{{name=whitespace}} but the filter has {{class=keywordMarker}}. Maybe {{type}} 
sounds better?

Also should {{solr.TextField}} be just {{text}} ?
