Re: Custom token filter in SolrCloud mode using Blob store
I get it: schema components do not yet support the Blob Store.

thanks

On Friday, 3 February 2017 10:28:27 CET Michal Hlavac wrote:
> Hi,
>
> is it possible to use the Blob Store & Config API with enable.runtime.lib=true
> to add custom token filters? I tried, but it doesn't work.
>
> [...]
Custom token filter in SolrCloud mode using Blob store
Hi,

is it possible to use the Blob Store & Config API with enable.runtime.lib=true to add custom token filters? I tried, but it doesn't work.

1. Uploaded the jar file lucene-analyzers-morfologik-6.4.0.jar to the .system blob store under the name lucene-analyzers-morfologik-6.4.0

2. Added the runtime library:

   {"add-runtimelib": {"name":"lucene-analyzers-morfologik-6.4.0", "version":1}}

3. Created a custom field type:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type": {
    "name": "txt_sk_lemma",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "analyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [
        { "class": "solr.SynonymFilterFactory", "synonyms": "synonyms.txt", "ignoreCase": true, "expand": false },
        { "class": "solr.StopFilterFactory", "ignoreCase": true, "words": "lang/stopwords_sk.txt" },
        { "class": "solr.LowerCaseFilterFactory" },
        { "class": "solr.KeywordMarkerFilterFactory", "protected": "protwords.txt" },
        { "runtimeLib": true, "class": "org.apache.lucene.analysis.morfologik.MorfologikFilterFactory", "dictionary": "morfologik/stemming/sk/sk.dict" }
      ]
    }
  }
}' http://localhost:8983/solr/default/schema

I get the error:

"errorMessages":["Plugin init failure for [schema.xml] fieldType\nPlugin init failure for [schema.xml] analyzer/filter: Error loading class 'org.apache.lucene.analysis.morfologik.MorfologikFilterFactory'\nError loading class 'org.apache.lucene.analysis.morfologik.MorfologikFilterFactory'\norg.apache.lucene.analysis.morfologik.MorfologikFilterFactory\n"

thanks, miso
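As an editorial aside: step 1 (uploading the jar to the .system blob store) is a plain HTTP POST of the raw jar bytes. A minimal sketch, assuming the default host/port and a local jar path passed as an argument:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class BlobUpload {
    public static void main(String[] args) throws IOException, InterruptedException {
        // The .system blob store takes the raw jar bytes as a POST to
        // /solr/.system/blob/{blobName}; Solr assigns version 1 on first upload.
        URI target = URI.create(
                "http://localhost:8983/solr/.system/blob/lucene-analyzers-morfologik-6.4.0");
        System.out.println("POST " + target);
        if (args.length > 0) { // only send when a jar path is given and Solr is running
            HttpRequest req = HttpRequest.newBuilder(target)
                    .header("Content-Type", "application/octet-stream")
                    .POST(HttpRequest.BodyPublishers.ofFile(Path.of(args[0])))
                    .build();
            HttpResponse<String> rsp = HttpClient.newHttpClient()
                    .send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(rsp.statusCode() + " " + rsp.body());
        }
    }
}
```

Without an argument it only prints the target URL; the equivalent curl would be `curl -X POST --data-binary @the.jar -H 'Content-Type: application/octet-stream' <url>`.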
Re: search request audit logging
Hi,

I've noticed that in SOLR-7484 the Solr-specific part of HTTP request handling was moved to HttpSolrCall, so there is no way to handle SolrQueryRequest and SolrQueryResponse in SolrDispatchFilter. Internal request logging is done in SolrCore.execute(SolrRequestHandler, SolrQueryRequest, SolrQueryResponse).

Is there a way to intercept the Solr request/response to produce a custom log in a SolrCloud environment?

thank you, m.

> Hi,
>
> I would like to ask how to implement search audit logging. I've implemented
> an approach, but I would like to ask if there is a better one.
>
> [...]
search request audit logging
Hi,

I would like to ask how to implement search audit logging. I've implemented an approach, but I would like to ask if there is a better one. The requirement is to log the username, search time, all request parameters (q, fq, etc.), response data (count, etc.) and, importantly, all errors.

As I need it only for search requests, I implemented a custom SearchHandler along these lines:

public class AuditSearchHandler extends SearchHandler {

    @Override
    public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
        try {
            super.handleRequest(req, rsp);
        } finally {
            doAuditLog(req, rsp);
        }
    }
}

A custom SearchComponent is not an option, because it can't handle all errors. I also read http://lucene.472066.n3.nabble.com/Solr-request-response-lifecycle-and-logging-full-response-time-td4006044.html[1], where a custom servlet filter is mentioned, but I didn't find an example of how to add a servlet filter to Solr in a proper way, if editing web.xml is acceptable at all.

thanks for suggestions, m.

[1] http://lucene.472066.n3.nabble.com/Solr-request-response-lifecycle-and-logging-full-response-time-td4006044.html
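As an editorial aside: the doAuditLog call above would mostly be string assembly. A sketch of such a formatter, with plain values standing in for what the handler would pull from SolrQueryRequest#getParams(), the response header and any caught exception (all field names here are assumptions):

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class AuditLogFormat {
    // Builds one audit log line: timestamp, user, request params, hit count,
    // query time and error status.
    static String format(String user, Map<String, String> params, long hits,
                         long qtimeMs, Throwable error) {
        StringBuilder sb = new StringBuilder();
        sb.append(Instant.now()).append(" user=").append(user);
        params.forEach((k, v) -> sb.append(' ').append(k).append('=').append(v));
        sb.append(" hits=").append(hits).append(" qtime=").append(qtimeMs);
        sb.append(" status=").append(error == null ? "OK" : "ERROR:" + error.getMessage());
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "title:solr");
        params.put("fq", "lang:sk");
        System.out.println(format("miso", params, 42, 13, null));
    }
}
```

Because the handler's finally block runs even when super.handleRequest throws, the error path is covered, which is exactly why the SearchHandler approach beats a SearchComponent here.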
Re: Search Analytics Help
Hi,

you have plenty of options. Without any special effort there is ELK: parse the Solr logs with Logstash, feed the data into Elasticsearch, then analyze it in Kibana.

Another option is to send every relevant search request to Kafka; then you can do more sophisticated data analytics using the Kafka Streams API, and use the Logstash Kafka input plugin to feed Elasticsearch. For this scenario you need to do some programming. I've already created such a component, but I haven't had time to publish it.

Another option is to use only Logstash to feed e.g. a Graphite database and show the results with Grafana, or to combine all of these options. You can also monitor Solr instances via the Logstash JMX input plugin.

I don't really understand what you mean by saying that there is nothing satisfactory.

m.

On Thursday, 26 April 2018 22:23:30 CEST Doug Turnbull wrote:
> Honestly I haven't seen anything satisfactory (yet). It's a huge need in
> the open source community
>
> On Thu, Apr 26, 2018 at 3:38 PM Ennio Bozzetti wrote:
>
> > Hello,
> >
> > I'm setting up Solr on an internal website for my company and I would like
> > to know if anyone can recommend an analytics tool so that I can see what
> > the users are searching for? Does the log in Solr give me that information?
> >
> > Thank you,
> > Ennio Bozzetti
>
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
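As an editorial aside: whichever pipeline you pick, the parsing step boils down to extracting the query parameters, hit count and QTime from each solr.log request line. A minimal Java extractor as a sketch (the sample line mimics the usual solr.log request format, but is made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestLogParse {
    // Pulls the params block, hit count and QTime out of a Solr request log line.
    static final Pattern LINE = Pattern.compile(
            "params=\\{(.*?)\\} hits=(\\d+) status=(\\d+) QTime=(\\d+)");

    public static void main(String[] args) {
        String line = "2018-04-26 22:23:30.123 INFO  (qtp1-20) [c:products] "
                + "o.a.s.c.S.Request [products]  webapp=/solr path=/select "
                + "params={q=laptop&fq=inStock:true&rows=10} hits=12 status=0 QTime=5";
        Matcher m = LINE.matcher(line);
        if (m.find()) {
            System.out.println("params=" + m.group(1));
            System.out.println("hits=" + m.group(2));
            System.out.println("qtime=" + m.group(4));
        }
    }
}
```

A Logstash grok filter would express the same pattern declaratively; the fields extracted are the ones you would chart in Kibana or Grafana.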
Re: Reading data from Oracle
Did you try to use ConcurrentUpdateSolrClient instead of HttpSolrClient?

m.

On Thursday, 15 February 2018 8:34:06 CET LOPEZ-CORTES Mariano-ext wrote:
> Hello
>
> We have to delete our Solr collection and feed it periodically from an Oracle
> database (up to 40M rows).
>
> We've done the following test: from a Java program, we read chunks of data
> from Oracle and inject them into Solr (via SolrJ).
>
> The problem: it is really, really slow (1.5 nights).
>
> Is there a faster method to do that?
>
> Thanks in advance.
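As an editorial aside: ConcurrentUpdateSolrClient helps because it buffers documents in an internal queue and streams them to Solr from several background threads, instead of one blocking HTTP round trip per chunk. A rough stdlib-only sketch of that producer/consumer pattern (sendBatch is a stand-in for the real HTTP update; all sizes are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchedIndexer {
    static final int QUEUE_SIZE = 10_000, THREADS = 4, BATCH = 500;
    static final AtomicInteger sent = new AtomicInteger();

    // Stand-in for the HTTP update request ConcurrentUpdateSolrClient would send.
    static void sendBatch(List<String> docs) { sent.addAndGet(docs.size()); }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(QUEUE_SIZE);
        ExecutorService workers = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            workers.submit(() -> {
                List<String> batch = new ArrayList<>(BATCH);
                try {
                    String doc;
                    while (!"EOF".equals(doc = queue.take())) { // stop at poison pill
                        batch.add(doc);
                        if (batch.size() == BATCH) { sendBatch(batch); batch.clear(); }
                    }
                } catch (InterruptedException ignored) { }
                if (!batch.isEmpty()) sendBatch(batch); // flush the remainder
            });
        }
        for (int i = 0; i < 40_000; i++) queue.put("doc-" + i); // rows read from Oracle
        for (int t = 0; t < THREADS; t++) queue.put("EOF");     // one pill per worker
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("indexed " + sent.get() + " docs");
    }
}
```

With SolrJ the equivalent is roughly `new ConcurrentUpdateSolrClient.Builder(url).withQueueSize(10000).withThreadCount(4).build()`; the producer thread reading from Oracle then just calls add() while the client handles the batching and parallel HTTP itself.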
deduplicated suggester
Hi,

I wrote a suggester based on AnalyzingInfixSuggester that deduplicates data on a defined key pattern. The source code is on GitHub: https://github.com/hlavki/solr-unique-suggester[1]

m.

[1] https://github.com/hlavki/solr-unique-suggester
Re: deduplication of suggester results are not enough
Hi Roland,

I wrote an AnalyzingInfixSuggester that deduplicates data on several levels at index time. I will publish it on GitHub in a few days and write to this thread when it's done.

m.

On Thursday, 26 March 2020 16:01:57 CET Szűcs Roland wrote:
> Hi All,
>
> I have followed the suggester-related discussions for quite a while.
> Everybody agrees that it is not the expected behaviour of a suggester,
> where the terms are the entities and not the documents, to return the
> same string representation several times.
>
> One suggestion was to do the deduplication on the client side of Solr. This
> is very easy in most client solutions, as any set-based data structure
> solves it.
>
> *But one important problem of deduplication is not solved: suggest.count.*
>
> If I have 15 matches from the suggester and suggest.count=10 and the first
> 9 matches are the same, I will get back only 2 after deduplication, and the
> remaining 5 unique terms will never be shown.
>
> What is the solution for this?
>
> Cheers,
> Roland
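As an editorial aside: the suggest.count problem Roland describes is easy to see with naive client-side dedup. A tiny sketch using his numbers (10 results returned, the first 9 identical):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class SuggestDedup {
    // Client-side dedup keeps order but can return far fewer than suggest.count,
    // which is exactly the problem described above.
    static List<String> dedup(List<String> suggestions) {
        return new ArrayList<>(new LinkedHashSet<>(suggestions));
    }

    public static void main(String[] args) {
        // 10 results returned for suggest.count=10; the first 9 are the same term.
        List<String> top10 = List.of("solr", "solr", "solr", "solr", "solr",
                "solr", "solr", "solr", "solr", "solrcloud");
        System.out.println(dedup(top10));
    }
}
```

Only 2 unique terms survive, while unique terms ranked 11-15 were never fetched; deduplicating at index time, as the suggester above does, is what makes count behave as users expect.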
unified highlighter performance in solr 8.5.1
Hi,

I have a field and a highlighter configuration (the XML elements were stripped by the list archive; only the values survive):

  true
  unified
  true
  content_txt_sk_highlight
  2
  true

Doing a query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which is really slow. The same query with hl.bs.type=WORD takes 8-45 ms.

Is this normal behaviour, or should I create an issue?

thanks, m.
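As an editorial aside: the two measurements differ only in hl.bs.type. A small sketch that builds the two request URLs for comparison (the collection name and query string are made up; the mapping of the stripped config values to parameter names is an assumption):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class HlQuery {
    static String url(String bsType) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "content_txt_sk_highlight:slovensko"); // assumed query
        p.put("hl", "true");
        p.put("hl.method", "unified");
        p.put("hl.fl", "content_txt_sk_highlight");
        p.put("hl.snippets", "2");
        p.put("hl.bs.type", bsType); // SENTENCE vs WORD is the variable under test
        return "http://localhost:8983/solr/docs/select?" + p.entrySet().stream()
                .map(e -> e.getKey() + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(url("SENTENCE"));
        System.out.println(url("WORD"));
    }
}
```

Timing the two URLs against the same index isolates the break-iterator cost from everything else in the request.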
Re: unified highlighter performance in solr 8.5.1
I did the same test on Solr 8.4.1 and the response times are the same for both hl.bs.type=SENTENCE and hl.bs.type=WORD.

m.

On Monday, 25 May 2020 15:28:24 CEST Michal Hlavac wrote:
> Hi,
>
> Doing a query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which is
> really slow. The same query with hl.bs.type=WORD takes 8-45 ms.
>
> Is this normal behaviour, or should I create an issue?
>
> [...]
Re: unified highlighter performance in solr 8.5.1
Yes, I have no problems in 8.4.1, only in 8.5.1. Also yes, those are multi-page PDF files.

m.

On Monday, 25 May 2020 19:11:31 CEST David Smiley wrote:
> Wow that's terrible!
> So this problem is for SENTENCE in particular, and it's a regression in
> 8.5? I'll see if I can reproduce this with the Lucene benchmark module.
>
> I figure you have some meaty text, like "page" size or longer?
>
> ~ David
>
> [...]
Re: unified highlighter performance in solr 8.5.1
Fine, I'll try to write a simple test. Thanks.

On Tuesday, 26 May 2020 17:44:52 CEST David Smiley wrote:
> Please create an issue. I haven't reproduced it yet, but it seems unlikely
> to be user error.
>
> ~ David
>
> [...]
Re: unified highlighter performance in solr 8.5.1
Hi David,

sorry for my late answer. I created simple test scenarios on GitHub: https://github.com/hlavki/solr-unified-highlighter-test[1]

There are 2 documents, both on the bigger side. The test method is here: https://github.com/hlavki/solr-unified-highlighter-test/blob/master/src/test/java/com/example/HighlightTest.java#L60[2]

The result is that with hl.fragsizeIsMinimum=true and hl.fragsize=0, response times are similar to Solr 8.4.1. I didn't expect that the default configuration values would change response time that drastically.

m.

On Wednesday, 27 May 2020 9:14:37 CEST David Smiley wrote:
> try setting hl.fragsizeIsMinimum=true
> I did some benchmarking and found that this helps quite a bit
>
> BTW I used the highlights.alg benchmark file, with some changes to make it
> more reflective of your scenario -- offsets in postings, and used "enwiki"
> (English Wikipedia) docs which are larger than the Reuters ones (so it
> appears, anyway). I had to do a bit of hacking to use the
> LengthGoalBreakIterator, which wasn't previously used by this framework.
>
> ~ David
>
> [...]

[1] https://github.com/hlavki/solr-unified-highlighter-test
[2] https://github.com/hlavki/solr-unified-highlighter-test/blob/master/src/test/java/com/example/HighlightTest.java#L60