Re: Running an analyzer chain in an update request processor

2018-04-23 Thread Steve Rowe
Hi Walter,

I haven’t seen this before, but it looks like 
https://bugs.java.com/view_bug.do?bug_id=8071775
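
A possible workaround, offered as a sketch and not verified against the script in this thread: Nashorn hands Java types to scripts as StaticClass wrappers, and the wrapper's `.class` property yields the underlying java.lang.Class that getAttribute() expects. The helper name charTermAttributeOf is made up for illustration.

```javascript
// Sketch only: intended to run inside Nashorn (for example in a Solr script
// update processor). charTermAttributeOf is a hypothetical helper name,
// not part of Solr or Lucene.
function charTermAttributeOf(tokenStream) {
  // Java.type returns Nashorn's StaticClass wrapper for the Java type.
  var CharTermAttribute = Java.type(
    'org.apache.lucene.analysis.tokenattributes.CharTermAttribute');
  // getAttribute expects a java.lang.Class, so pass the wrapper's `.class`
  // property instead of the wrapper itself.
  return tokenStream.getAttribute(CharTermAttribute.class);
}
```

With this helper, the last line of the quoted script would become var term_att = charTermAttributeOf(token_stream);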

--
Steve
www.lucidworks.com

> On Apr 20, 2018, at 7:54 PM, Walter Underwood  wrote:
> 
> I’m back.
> 
> I think I’m following the steps in Erik Hatcher’s slides: 
> https://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks
> 
> With a few minor changes, like using getIndexAnalyzer() because getAnalyzer() 
> is gone. And I’ve pulled the subroutine code into the main processAdd 
> function.
> 
> Any ideas about the cause of this error?
> 
> java.lang.ClassCastException: Cannot cast jdk.internal.dynalink.beans.StaticClass to java.lang.Class
>   at java.lang.invoke.MethodHandleImpl.newClassCastException(MethodHandleImpl.java:361)
>   at java.lang.invoke.MethodHandleImpl.castReference(MethodHandleImpl.java:356)
>   at jdk.nashorn.internal.scripts.Script$Recompilation$37$104A$\^eval\_.processAdd(:15)
> 
> This is the code up through line 15:
> 
> // Generate minhashes using the "minhash" analyzer chain
> var analyzer = req.getCore().getLatestSchema().getFieldTypeByName('minhash').getIndexAnalyzer();
> var hashes = [];
> var token_stream = analyzer.tokenStream(null, new java.io.StringReader(question));
> var term_att = token_stream.getAttribute(Packages.org.apache.lucene.analysis.tokenattributes.CharTermAttribute);
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Apr 7, 2018, at 9:50 AM, Walter Underwood  wrote:
>> 
>> As I think more about this, we should have a signature processor that uses 
>> minhash. The MD5 signature processor was really easy to use.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org 
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Apr 7, 2018, at 4:55 AM, Emir Arnautović wrote:
>>> 
>>> Hi Walter,
>>> I did this sample processor for the purpose of having doc values on 
>>> analysed field: https://github.com/od-bits/solr-multivaluefield-processor
>>> 
>>> (+ related blog: 
>>> http://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html)
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ 
>>> 
>>>> On 6 Apr 2018, at 23:46, Walter Underwood wrote:
>>>> 
>>>> Is there an easy way to define an analyzer chain in schema.xml then run it 
>>>> in an update request processor?
>>>> 
>>>> I want to run a chain ending in the minhash token filter, then take those 
>>>> minhashes, convert them to hex, and put them in a string field. I’d like 
>>>> the values stored.
>>>> 
>>>> It seems like this could all work in an update request processor. Grab the 
>>>> text from one field, run it through the chain, format the output tokens 
>>>> and add them to the field for hashes.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org 
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>> 
>> 
> 



Re: Running an analyzer chain in an update request processor

2018-04-20 Thread Walter Underwood
I’m back.

I think I’m following the steps in Erik Hatcher’s slides: 
https://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks

With a few minor changes, like using getIndexAnalyzer() because getAnalyzer() 
is gone. And I’ve pulled the subroutine code into the main processAdd function.

Any ideas about the cause of this error?

java.lang.ClassCastException: Cannot cast jdk.internal.dynalink.beans.StaticClass to java.lang.Class
  at java.lang.invoke.MethodHandleImpl.newClassCastException(MethodHandleImpl.java:361)
  at java.lang.invoke.MethodHandleImpl.castReference(MethodHandleImpl.java:356)
  at jdk.nashorn.internal.scripts.Script$Recompilation$37$104A$\^eval\_.processAdd(:15)

This is the code up through line 15:

// Generate minhashes using the "minhash" analyzer chain
var analyzer = req.getCore().getLatestSchema().getFieldTypeByName('minhash').getIndexAnalyzer();
var hashes = [];
var token_stream = analyzer.tokenStream(null, new java.io.StringReader(question));
var term_att = token_stream.getAttribute(Packages.org.apache.lucene.analysis.tokenattributes.CharTermAttribute);

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 7, 2018, at 9:50 AM, Walter Underwood  wrote:
> 
> As I think more about this, we should have a signature processor that uses 
> minhash. The MD5 signature processor was really easy to use.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/  (my blog)
> 
>> On Apr 7, 2018, at 4:55 AM, Emir Arnautović wrote:
>> 
>> Hi Walter,
>> I did this sample processor for the purpose of having doc values on analysed 
>> field: https://github.com/od-bits/solr-multivaluefield-processor
>> 
>> (+ related blog: 
>> http://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html)
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ 
>> 
>>> On 6 Apr 2018, at 23:46, Walter Underwood wrote:
>>> 
>>> Is there an easy way to define an analyzer chain in schema.xml then run it 
>>> in an update request processor?
>>> 
>>> I want to run a chain ending in the minhash token filter, then take those 
>>> minhashes, convert them to hex, and put them in a string field. I’d like 
>>> the values stored.
>>> 
>>> It seems like this could all work in an update request processor. Grab the 
>>> text from one field, run it through the chain, format the output tokens and 
>>> add them to the field for hashes.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org 
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>> 
> 



Re: Running an analyzer chain in an update request processor

2018-04-07 Thread Walter Underwood
As I think more about this, we should have a signature processor that uses 
minhash. The MD5 signature processor was really easy to use.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 7, 2018, at 4:55 AM, Emir Arnautović wrote:
> 
> Hi Walter,
> I did this sample processor for the purpose of having doc values on analysed 
> field: https://github.com/od-bits/solr-multivaluefield-processor
> 
> (+ related blog: 
> http://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html)
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 6 Apr 2018, at 23:46, Walter Underwood  wrote:
>> 
>> Is there an easy way to define an analyzer chain in schema.xml then run it 
>> in an update request processor?
>> 
>> I want to run a chain ending in the minhash token filter, then take those 
>> minhashes, convert them to hex, and put them in a string field. I’d like the 
>> values stored.
>> 
>> It seems like this could all work in an update request processor. Grab the 
>> text from one field, run it through the chain, format the output tokens and 
>> add them to the field for hashes.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
> 



Re: Running an analyzer chain in an update request processor

2018-04-07 Thread Emir Arnautović
Hi Walter,
I did this sample processor for the purpose of having doc values on analysed 
field: https://github.com/od-bits/solr-multivaluefield-processor

(+ related blog: 
http://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html)

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 6 Apr 2018, at 23:46, Walter Underwood  wrote:
> 
> Is there an easy way to define an analyzer chain in schema.xml then run it in 
> an update request processor?
> 
> I want to run a chain ending in the minhash token filter, then take those 
> minhashes, convert them to hex, and put them in a string field. I’d like the 
> values stored.
> 
> It seems like this could all work in an update request processor. Grab the 
> text from one field, run it through the chain, format the output tokens and 
> add them to the field for hashes.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 



Re: Running an analyzer chain in an update request processor

2018-04-06 Thread Walter Underwood
Thanks, I should have mentioned that I’m doing this in a script URP.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 6, 2018, at 3:06 PM, Steve Rowe  wrote:
> 
> Hi Walter,
> 
> I’ve seen Erik Hatcher recommend using the StatelessScriptUpdateProcessor for 
> this purpose, e.g. on slides 10-11 of 
> https://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks .
> 
> More info at https://wiki.apache.org/solr/ScriptUpdateProcessor and 
> https://lucene.apache.org/solr/7_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
>  
> 
> --
> Steve
> www.lucidworks.com
> 
>> On Apr 6, 2018, at 5:46 PM, Walter Underwood  wrote:
>> 
>> Is there an easy way to define an analyzer chain in schema.xml then run it 
>> in an update request processor?
>> 
>> I want to run a chain ending in the minhash token filter, then take those 
>> minhashes, convert them to hex, and put them in a string field. I’d like the 
>> values stored.
>> 
>> It seems like this could all work in an update request processor. Grab the 
>> text from one field, run it through the chain, format the output tokens and 
>> add them to the field for hashes.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
> 



Re: Running an analyzer chain in an update request processor

2018-04-06 Thread Steve Rowe
Hi Walter,

I’ve seen Erik Hatcher recommend using the StatelessScriptUpdateProcessor for 
this purpose, e.g. on slides 10-11 of 
https://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks .

More info at https://wiki.apache.org/solr/ScriptUpdateProcessor and 
https://lucene.apache.org/solr/7_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
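
For orientation, a script used with StatelessScriptUpdateProcessorFactory defines one function per update event. The following is a minimal skeleton, not taken from the slides: the field names 'question' and 'minhashes' are placeholders, and hashesFor is a hypothetical stub standing in for the analysis step.

```javascript
// Skeleton of an update script for StatelessScriptUpdateProcessorFactory.
// Field names and hashesFor are illustrative assumptions, not Solr APIs.

// Hypothetical stub: a real script would run the analyzer chain here.
function hashesFor(text) {
  return [];
}

function processAdd(cmd) {
  var doc = cmd.solrDoc; // the SolrInputDocument being indexed
  var text = doc.getFieldValue('question');
  if (text === null || text === undefined) {
    return; // no source text, so nothing to add
  }
  var hashes = hashesFor(String(text));
  for (var i = 0; i < hashes.length; i++) {
    doc.addField('minhashes', hashes[i]); // one value per hash
  }
}

// The remaining hooks are no-ops in this sketch.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```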
 

--
Steve
www.lucidworks.com

> On Apr 6, 2018, at 5:46 PM, Walter Underwood  wrote:
> 
> Is there an easy way to define an analyzer chain in schema.xml then run it in 
> an update request processor?
> 
> I want to run a chain ending in the minhash token filter, then take those 
> minhashes, convert them to hex, and put them in a string field. I’d like the 
> values stored.
> 
> It seems like this could all work in an update request processor. Grab the 
> text from one field, run it through the chain, format the output tokens and 
> add them to the field for hashes.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 



Running an analyzer chain in an update request processor

2018-04-06 Thread Walter Underwood
Is there an easy way to define an analyzer chain in schema.xml then run it in 
an update request processor?

I want to run a chain ending in the minhash token filter, then take those 
minhashes, convert them to hex, and put them in a string field. I’d like the 
values stored.
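
The hex conversion step might look like the sketch below. It assumes each token arrives as a string whose UTF-16 code units carry 16 bits of the hash each, which should be checked against the actual output of the minhash filter; termToHex is a made-up name.

```javascript
// Convert one token to hex, four hex digits per UTF-16 code unit.
// Assumption: each character of the token carries 16 bits of the hash.
// termToHex is a hypothetical helper name.
function termToHex(term) {
  var s = String(term);
  var hex = '';
  for (var i = 0; i < s.length; i++) {
    var unit = s.charCodeAt(i).toString(16);
    hex += ('0000' + unit).slice(-4); // left-pad each unit to four digits
  }
  return hex;
}
```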

It seems like this could all work in an update request processor. Grab the text 
from one field, run it through the chain, format the output tokens and add them 
to the field for hashes.
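
The "run it through the chain" step could be sketched as follows, using the usual Lucene TokenStream sequence of reset(), incrementToken(), end() and close(). The function name collectTerms is made up; tokenStream and termAtt are assumed to come from the field type's index analyzer and its CharTermAttribute.

```javascript
// Drain a Lucene TokenStream and return its terms as JavaScript strings.
// collectTerms is a hypothetical helper name. tokenStream is assumed to be
// the result of analyzer.tokenStream(...), termAtt its CharTermAttribute.
function collectTerms(tokenStream, termAtt) {
  var terms = [];
  try {
    tokenStream.reset(); // must be called before the first incrementToken()
    while (tokenStream.incrementToken()) {
      // termAtt is updated in place on each step, so copy its value out
      terms.push(String(termAtt.toString()));
    }
    tokenStream.end(); // end-of-stream bookkeeping
  } finally {
    tokenStream.close(); // release the stream so the analyzer can be reused
  }
  return terms;
}
```

Each returned term can then be formatted and added to the target field of the document.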

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)