Re: unified highlighter performance in solr 8.5.1

2020-06-08 Thread Michal Hlavac
Hi David,

sorry for my late answer. I created a simple test scenario on github 
https://github.com/hlavki/solr-unified-highlighter-test[1] 
It contains 2 documents, both fairly large.
Test method: 
https://github.com/hlavki/solr-unified-highlighter-test/blob/master/src/test/java/com/example/HighlightTest.java#L60[2]
 

The result is that with hl.fragsizeIsMinimum=true and hl.fragAlignRatio=0 response 
times are similar to solr 8.4.1.
I didn't expect the default configuration values to change response times 
that drastically.
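For reference, the parameters in question can also be passed per request; a sketch of such a request follows (the collection name and query are placeholders, not from this thread; hl.fragAlignRatio=0 restores the pre-8.5 fragment alignment behaviour):

```
/solr/<collection>/select?q=<query>
    &hl=true
    &hl.method=unified
    &hl.fl=content_txt_sk_highlight
    &hl.fragsizeIsMinimum=true
    &hl.fragAlignRatio=0
```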

m.

On streda 27. mája 2020 9:14:37 CEST David Smiley wrote:


try setting hl.fragsizeIsMinimum=true
I did some benchmarking and found that this helps quite a bit




BTW I used the highlights.alg benchmark file, with some changes to make it more 
reflective of your scenario -- offsets in postings, and used "enwiki" (English 
Wikipedia) docs which are larger than the Reuters ones (so it appears, anyway).  
I had to do a bit of hacking to use the "LengthGoalBreakIterator", which 
wasn't previously used by this framework.


~ David



On Tue, May 26, 2020 at 4:42 PM Michal Hlavac  wrote:


fine, I'll try to write a simple test, thanks
 
On utorok 26. mája 2020 17:44:52 CEST David Smiley wrote:
> Please create an issue.  I haven't reproduced it yet but it seems unlikely
> to be user-error.
> 
> ~ David
> 
> 
> On Mon, May 25, 2020 at 9:28 AM Michal Hlavac <_miso@hlavki.eu_> wrote:
> 
> > Hi,
> >
> > I have the field:
> > <field name="content_txt_sk_highlight" type="..." stored="true" indexed="false" storeOffsetsWithPositions="true"/>
> >
> > and configuration:
> > hl=true
> > hl.method=unified
> > ...=true
> > hl.fl=content_txt_sk_highlight
> > hl.snippets=2
> > ...=true
> >
> > Doing a query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which
> > is really slow.
> > The same query with hl.bs.type=WORD takes 8-45 ms.
> >
> > Is this normal behaviour, or should I create an issue?
> >
> > thanks, m.
> >
> 





[1] https://github.com/hlavki/solr-unified-highlighter-test
[2] 
https://github.com/hlavki/solr-unified-highlighter-test/blob/master/src/test/java/com/example/HighlightTest.java#L60


Re: unified highlighter performance in solr 8.5.1

2020-05-26 Thread Michal Hlavac
fine, I'll try to write a simple test, thanks

On utorok 26. mája 2020 17:44:52 CEST David Smiley wrote:
> Please create an issue.  I haven't reproduced it yet but it seems unlikely
> to be user-error.
> 
> ~ David
> 
> 
> On Mon, May 25, 2020 at 9:28 AM Michal Hlavac  wrote:
> 
> > Hi,
> >
> > I have the field:
> > <field name="content_txt_sk_highlight" type="..." stored="true" indexed="false" storeOffsetsWithPositions="true"/>
> >
> > and configuration:
> > hl=true
> > hl.method=unified
> > ...=true
> > hl.fl=content_txt_sk_highlight
> > hl.snippets=2
> > ...=true
> >
> > Doing a query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which
> > is really slow.
> > The same query with hl.bs.type=WORD takes 8-45 ms.
> >
> > Is this normal behaviour, or should I create an issue?
> >
> > thanks, m.
> >
> 


Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
Yes, I have no problems in 8.4.1, only in 8.5.1.
Also yes, those are multi-page PDF files.

m.

On pondelok 25. mája 2020 19:11:31 CEST David Smiley wrote:
> Wow that's terrible!
> So this problem is for SENTENCE in particular, and it's a regression in
> 8.5?  I'll see if I can reproduce this with the Lucene benchmark module.
> 
> I figure you have some meaty text, like "page" size or longer?
> 
> ~ David
> 
> 
> On Mon, May 25, 2020 at 10:38 AM Michal Hlavac  wrote:
> 
> > I did same test on solr 8.4.1 and response times are same for both
> > hl.bs.type=SENTENCE and hl.bs.type=WORD
> >
> > m.
> >
> > On pondelok 25. mája 2020 15:28:24 CEST Michal Hlavac wrote:
> >
> >
> > Hi,
> >
> > I have the field:
> > <field name="content_txt_sk_highlight" type="..." stored="true" indexed="false" storeOffsetsWithPositions="true"/>
> >
> > and configuration:
> > hl=true
> > hl.method=unified
> > ...=true
> > hl.fl=content_txt_sk_highlight
> > hl.snippets=2
> > ...=true
> >
> > Doing a query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which
> > is really slow.
> > The same query with hl.bs.type=WORD takes 8-45 ms.
> >
> > Is this normal behaviour, or should I create an issue?
> >
> > thanks, m.
> >
> >
> >
> 


Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
I did same test on solr 8.4.1 and response times are same for both 
hl.bs.type=SENTENCE and hl.bs.type=WORD

m.

On pondelok 25. mája 2020 15:28:24 CEST Michal Hlavac wrote:


Hi,
 
I have the field:
<field name="content_txt_sk_highlight" type="..." stored="true" indexed="false" storeOffsetsWithPositions="true"/>
 
and configuration:
hl=true
hl.method=unified
...=true
hl.fl=content_txt_sk_highlight
hl.snippets=2
...=true
 
Doing a query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which is 
really slow.
The same query with hl.bs.type=WORD takes 8-45 ms.
 
Is this normal behaviour, or should I create an issue?
 
thanks, m. 




unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
Hi,

I have the field:
<field name="content_txt_sk_highlight" type="..." stored="true" indexed="false" storeOffsetsWithPositions="true"/>

and configuration:
hl=true
hl.method=unified
...=true
hl.fl=content_txt_sk_highlight
hl.snippets=2
...=true

Doing a query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which is 
really slow.
The same query with hl.bs.type=WORD takes 8-45 ms.

Is this normal behaviour, or should I create an issue?

thanks, m.
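hl.bs.type selects which java.text.BreakIterator variant the unified highlighter scans with. As a stand-alone illustration of the difference between the two iterators (plain JDK, an English sample sentence, nothing Solr-specific):

```java
import java.text.BreakIterator;
import java.util.Locale;

// Counts the boundaries each BreakIterator type finds in the same text.
public class BreakIteratorDemo {

    static int countBoundaries(BreakIterator bi, String text) {
        bi.setText(text);
        int count = 0;
        for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Solr is fast. Highlighting splits text into snippets.";
        int sentences = countBoundaries(
                BreakIterator.getSentenceInstance(Locale.ENGLISH), text);
        int words = countBoundaries(
                BreakIterator.getWordInstance(Locale.ENGLISH), text);
        System.out.println("SENTENCE boundaries: " + sentences); // 3
        System.out.println("WORD boundaries: " + words);
    }
}
```

Sentence detection is locale-aware and considerably more work per boundary than simple word breaking, which is part of why SENTENCE is the more expensive setting.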


deduplicated suggester

2020-04-09 Thread Michal Hlavac
Hi,

I wrote a suggester based on AnalyzingInfixSuggester that deduplicates data on a 
defined key pattern.
Source code is on github: https://github.com/hlavki/solr-unique-suggester[1] 

m.


[1] https://github.com/hlavki/solr-unique-suggester


Re: deduplication of suggester results are not enough

2020-03-26 Thread Michal Hlavac
Hi Roland,

I wrote an AnalyzingInfixSuggester variant that deduplicates data on several 
levels at index time.
I will publish it on github in a few days and write to this thread when done.

m.

On štvrtok 26. marca 2020 16:01:57 CET Szűcs Roland wrote:
> Hi All,
> 
> I have been following the suggester-related discussions for quite a while.
> Everybody agrees that a Suggester, where the terms are the entities and not
> the documents, should not return the same string representation several
> times.
> 
> One suggestion was to deduplicate on the client side of Solr. That is very
> easy in most client solutions, since any set-based data structure solves
> it.
> 
> *But deduplication leaves one important problem unsolved: suggest.count.*
> 
> If I have 15 matches from the suggester and suggest.count=10, and the first
> 9 matches are the same, I will get back only 2 after deduplication, and the
> remaining 5 unique terms will never be shown.
> 
> What is the solution for this?
> 
> Cheers,
> Roland
> 
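The interaction Roland describes, truncation to suggest.count happening before client-side dedup, can be reproduced with a few lines of plain Java (illustrative data, no Solr involved):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// 15 matches, suggest.count=10: the server truncates first,
// the client deduplicates second, so unique terms are lost.
public class SuggestCountDemo {

    public static void main(String[] args) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < 9; i++) matches.add("apple");    // 9 identical suggestions
        for (int i = 0; i < 6; i++) matches.add("term" + i); // 6 distinct suggestions
        List<String> page = matches.subList(0, 10);          // server-side suggest.count=10
        LinkedHashSet<String> deduped = new LinkedHashSet<>(page); // client-side dedup
        System.out.println(deduped); // only 2 of the 7 unique terms survive
    }
}
```

Index-time deduplication (as in the suggester announced above) avoids this, because the count is then applied to already-unique entries.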


Re: Search Analytics Help

2018-04-27 Thread Michal Hlavac
Hi,

you have plenty of options. Without any special effort there is ELK: parse Solr 
logs with logstash, feed elasticsearch with the data, then analyze it in kibana.

Another option is to send every relevant search request to kafka; then you can 
do more sophisticated data analytics using the kafka-streams API, and use ELK to 
feed elasticsearch via the logstash kafka input plugin. For this scenario you 
need to do some programming. I've already created such a component, but I 
haven't had time to publish it.

Another option is to use only logstash to feed e.g. a graphite database and 
show the results with grafana, or to combine all of these options.

You can also monitor Solr instances via the JMX logstash input plugin.
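For the first option, a minimal logstash pipeline might look like the sketch below (the file path and grok pattern are illustrative and must be adapted to your actual Solr log format):

```
input  { file { path => "/var/solr/logs/solr.log" } }
filter {
  grok {
    # illustrative pattern: timestamp, level, and the rest of the line
    match => { "message" => "%{TIMESTAMP_ISO8601:time} %{LOGLEVEL:level} %{GREEDYDATA:body}" }
  }
}
output { elasticsearch { hosts => ["localhost:9200"] } }
```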

I really don't understand what you mean by saying that there is nothing 
satisfactory.

m.

On štvrtok, 26. apríla 2018 22:23:30 CEST Doug Turnbull wrote:
> Honestly I haven’t seen anything satisfactory (yet). It’s a huge need in
> the open source community
> 
> On Thu, Apr 26, 2018 at 3:38 PM Ennio Bozzetti 
> wrote:
> 
> > Hello,
> >
> > I'm setting up SOLR on an internal website for my company and I would like
> > to know if anyone can recommend an analytics that I can see what the users
> > are searching for? Does the log in SOLR give me that information?
> >
> > Thank you,
> > Ennio Bozzetti
> >
> > --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug




Re: Reading data from Oracle

2018-02-15 Thread Michal Hlavac
Did you try to use ConcurrentUpdateSolrClient instead of HttpSolrClient?

m.

On štvrtok, 15. februára 2018 8:34:06 CET LOPEZ-CORTES Mariano-ext wrote:
> Hello
> 
> We have to delete our Solr collection and feed it periodically from an Oracle 
> database (up to 40M rows).
> 
> We've done the following test: from a Java program, we read chunks of data
> from Oracle and inject them into Solr (via SolrJ).
> 
> The problem: it is really, really slow (1.5 nights).
> 
> Is there a faster method to do that?
> 
> Thanks in advance.
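A sketch of what the suggestion looks like, in SolrJ-style pseudocode (the URL, queue size and thread count are illustrative, not tuned values):

```
// pseudocode (SolrJ-style): documents are buffered in a queue and sent
// to Solr by background threads instead of one blocking HTTP call per batch
client = ConcurrentUpdateSolrClient.Builder(solrUrl)
             .withQueueSize(100000)
             .withThreadCount(8)
             .build()
for each chunk of rows read from Oracle:
    docs = map chunk to SolrInputDocuments
    client.add(docs)          // returns quickly; sending happens concurrently
client.blockUntilFinished()   // wait for the queue to drain
client.commit()
```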


Re: search request audit logging

2017-10-06 Thread Michal Hlavac
Hi,
 
I've noticed that in SOLR-7484 the Solr part of the http request handling was 
moved to HttpSolrCall, so there is no way to handle SolrQueryRequest and 
SolrQueryResponse in SolrDispatchFilter.
 
Internal request logging happens in SolrCore.execute(SolrRequestHandler, 
SolrQueryRequest, SolrQueryResponse).
 
Is there a way to handle the Solr request/response to make a custom log in a 
SolrCloud environment?
 
thank you, m.


Hi,

I would like to ask how to implement search audit logging. I've implemented one 
idea, but I would like to ask if there is a better approach.

The requirement is to log username, search time, all request parameters (q, fq, 
etc.), response data (count, etc.), and, importantly, all errors.

As I need it only for search requests, I implemented a custom SearchHandler 
with something like:

public class AuditSearchHandler extends SearchHandler {

    @Override
    public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
        try {
            super.handleRequest(req, rsp);
        } finally {
            // runs for successful and failing requests alike
            doAuditLog(req, rsp);
        }
    }
}

A custom SearchComponent is not an option, because it can't handle all errors.

I also read 
http://lucene.472066.n3.nabble.com/Solr-request-response-lifecycle-and-logging-full-response-time-td4006044.html[1]
where a custom Servlet Filter is mentioned, but I didn't find an example of how 
to add a Servlet Filter to Solr properly. Is it OK to edit web.xml?

thanks for suggestions, m.





[1] 
http://lucene.472066.n3.nabble.com/Solr-request-response-lifecycle-and-logging-full-response-time-td4006044.html


search request audit logging

2017-10-02 Thread Michal Hlavac
Hi,

I would like to ask how to implement search audit logging. I've implemented one 
idea, but I would like to ask if there is a better approach.

The requirement is to log username, search time, all request parameters (q, fq, 
etc.), response data (count, etc.), and, importantly, all errors.

As I need it only for search requests, I implemented a custom SearchHandler 
with something like:

public class AuditSearchHandler extends SearchHandler {

    @Override
    public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
        try {
            super.handleRequest(req, rsp);
        } finally {
            // runs for successful and failing requests alike
            doAuditLog(req, rsp);
        }
    }
}
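Why the try/finally wrapper also covers the error cases can be shown with a stand-alone sketch (hypothetical names, no Solr dependencies):

```java
import java.util.ArrayList;
import java.util.List;

// The finally block runs whether the wrapped call returns or throws,
// so an audit entry is recorded for successful and failing requests alike.
public class AuditFinallyDemo {

    static final List<String> auditLog = new ArrayList<>();

    static void handleRequest(boolean fail) {
        try {
            if (fail) throw new RuntimeException("query error");
        } finally {
            auditLog.add(fail ? "ERROR" : "OK");
        }
    }

    public static void main(String[] args) {
        handleRequest(false);
        try {
            handleRequest(true);
        } catch (RuntimeException expected) {
            // the error still propagates to the caller after being audited
        }
        System.out.println(auditLog); // [OK, ERROR]
    }
}
```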

A custom SearchComponent is not an option, because it can't handle all errors.

I also read 
http://lucene.472066.n3.nabble.com/Solr-request-response-lifecycle-and-logging-full-response-time-td4006044.html[1]
where a custom Servlet Filter is mentioned, but I didn't find an example of how 
to add a Servlet Filter to Solr properly. Is it OK to edit web.xml?

thanks for suggestions, m.



[1] 
http://lucene.472066.n3.nabble.com/Solr-request-response-lifecycle-and-logging-full-response-time-td4006044.html


Re: Custom token filter in SolrCloud mode using Blob store

2017-02-03 Thread Michal Hlavac
I get it: Schema components do not yet support the Blob Store.

thanks

On piatok, 3. februára 2017 10:28:27 CET Michal Hlavac wrote:
> Hi,
> 
> Is it possible to use the Blob Store & Config API with
> enabled.runtime.lib=true to add custom token filters?
> I tried, but it doesn't work.
> 
> 1. Uploaded jar lucene-analyzers-morfologik-6.4.0.jar file to blob store 
> .system with name lucene-analyzers-morfologik-6.4.0
> 
> 2. Add runtime library {"add-runtimelib": { 
> "name":"lucene-analyzers-morfologik-6.4.0", "version":1 }}
> 
> 3. Create custom field type:
> curl -X POST -H 'Content-type:application/json' --data-binary '{
>   "add-field-type" : {
>  "name":"txt_sk_lemma",
>  "class":"solr.TextField",
>  "positionIncrementGap":"100",
>  "analyzer" : {
> "tokenizer":{ 
>"class":"solr.StandardTokenizerFactory" },
> "filters":[
>{
>"class":"solr.SynonymFilterFactory",
>"synonyms":"synonyms.txt",
>"ignoreCase":true,
>"expand":false
>},
>{
>"class":"solr.StopFilterFactory",
>"ignoreCase":true,
>"words":"lang/stopwords_sk.txt"
>},
>{
>"class":"solr.LowerCaseFilterFactory"
>},
>{
>"class":"solr.KeywordMarkerFilterFactory",
>"protected":"protwords.txt"
>},
>{
>"runtimeLib":true,
>
> "class":"org.apache.lucene.analysis.morfologik.MorfologikFilterFactory",
>"dictionary":"morfologik/stemming/sk/sk.dict"
>}
> ]}}
> }' http://localhost:8983/solr/default/schema
> 
> I get error
> 
> "errorMessages":["Plugin init failure for [schema.xml] fieldType\nPlugin init 
> failure for [schema.xml] analyzer/filter: Error loading class 
> 'org.apache.lucene.analysis.morfologik.MorfologikFilterFactory'\nError 
> loading class 
> 'org.apache.lucene.analysis.morfologik.MorfologikFilterFactory'\norg.apache.lucene.analysis.morfologik.MorfologikFilterFactory\n"
> 
> 
> thanks, miso
> 



Custom token filter in SolrCloud mode using Blob store

2017-02-03 Thread Michal Hlavac
Hi,

Is it possible to use the Blob Store & Config API with enabled.runtime.lib=true 
to add custom token filters?
I tried, but it doesn't work.

1. Uploaded jar lucene-analyzers-morfologik-6.4.0.jar file to blob store 
.system with name lucene-analyzers-morfologik-6.4.0

2. Add runtime library {"add-runtimelib": { 
"name":"lucene-analyzers-morfologik-6.4.0", "version":1 }}

3. Create custom field type:
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type" : {
 "name":"txt_sk_lemma",
 "class":"solr.TextField",
 "positionIncrementGap":"100",
 "analyzer" : {
"tokenizer":{ 
   "class":"solr.StandardTokenizerFactory" },
"filters":[
   {
   "class":"solr.SynonymFilterFactory",
   "synonyms":"synonyms.txt",
   "ignoreCase":true,
   "expand":false
   },
   {
   "class":"solr.StopFilterFactory",
   "ignoreCase":true,
   "words":"lang/stopwords_sk.txt"
   },
   {
   "class":"solr.LowerCaseFilterFactory"
   },
   {
   "class":"solr.KeywordMarkerFilterFactory",
   "protected":"protwords.txt"
   },
   {
   "runtimeLib":true,
   
"class":"org.apache.lucene.analysis.morfologik.MorfologikFilterFactory",
   "dictionary":"morfologik/stemming/sk/sk.dict"
   }
]}}
}' http://localhost:8983/solr/default/schema

I get error

"errorMessages":["Plugin init failure for [schema.xml] fieldType\nPlugin init 
failure for [schema.xml] analyzer/filter: Error loading class 
'org.apache.lucene.analysis.morfologik.MorfologikFilterFactory'\nError loading 
class 
'org.apache.lucene.analysis.morfologik.MorfologikFilterFactory'\norg.apache.lucene.analysis.morfologik.MorfologikFilterFactory\n"


thanks, miso