Re: Solr Logs to ELK / AWS Firestream

2017-08-18 Thread Sebastian Klemke
Hey

On Thu, 2017-08-17 at 10:15 -0600, John Bickerstaff wrote:
> I'm trying to get Solr logs into AWS Firestream.
> 
> Not having a lot of luck.
> 
> Does anyone out there have any experience getting Solr logs into an ELK
> stack?  Or, better yet, getting Solr Logs into AWS Firestream?
> 
> We direct logs to SLF4J and use logback as our SLF4j implementation.
> 
> I have a number of issues, but won't go into them here - since they won't
> mean much except in context.  If you have some knowledge here - let me know
> and I'll ask my specific questions.

We use net.logstash.log4j.JSONEventLayoutV1 to write the logs as JSON and
have Logstash collect them. The jsonevent-layout dependency and its
transitive dependencies have to be on the system classloader's classpath,
which we do by putting them into the lib/ext folder. A GELF appender would
probably also work, but we'd like to keep a local backup.
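
For illustration, a log4j 1.x configuration along these lines might look
roughly like the fragment below. This is only a sketch: the appender name
"jsonfile", the file path and the rotation settings are placeholders, not
values from our setup. Only the layout class name is the one mentioned
above.

```properties
# Sketch only: "jsonfile", the path and the rotation limits are placeholders.
log4j.rootLogger=INFO, jsonfile

# Write to a local rolling file so that a backup stays on disk.
log4j.appender.jsonfile=org.apache.log4j.RollingFileAppender
log4j.appender.jsonfile.File=/var/solr/logs/solr_json.log
log4j.appender.jsonfile.MaxFileSize=32MB
log4j.appender.jsonfile.MaxBackupIndex=9

# One JSON event per line, in the Logstash "v1" event format.
log4j.appender.jsonfile.layout=net.logstash.log4j.JSONEventLayoutV1
```

Logstash can then tail that file and parse each line as JSON.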


Regards,

Sebastian


-- 
Sebastian Klemke
Senior Software Engineer
  
ResearchGate GmbH
Invalidenstr. 115, 10115 Berlin, Germany
  
www.researchgate.net
  
Registered Seat: Hannover, HR B 202837
Managing Directors: Dr Ijad Madisch, Dr Sören Hofmayer VAT-ID:
DE258434568
A proud affiliate of: ResearchGate Corporation, 350 Townsend St #754,
San Francisco, CA 94107



Re: Solr LTR with high rerankDocs

2017-08-11 Thread Sebastian Klemke
Hi

On Thu, 2017-08-10 at 08:30 -0700, Erick Erickson wrote:
> I have to confess that I know very little about the mechanics of LTR, but
> I can talk a little bit about compression.
> 
> When a stored value is retrieved for a document it is read from the
> *.fdt file which is a compressed, verbatim copy of the field. DocValues
> can bypass this stored data and read directly from the DV format.
> There's a discussion of useDocValuesAsStored in solr/CHANGES.txt.
> 
> The restriction of docValues is that they can only be used for
> primitive types, numerics, strings and the like, specifically _not_
> fields with class="solr.TextField".
> 
> WARNING: I have no real clue whether LTR is built to leverage
> docValues fields. If you add docValues="true" to the relevant
> fields you'll have to re-index completely. In fact I'd use a new
> collection.
> 
> And don't be put off by the fact that the index size on disk will grow
> if you add docValues; the data is memory-mapped by the OS rather than
> held on the JVM heap, which will actually _reduce_ your JVM requirements.

Yes, DocValues are definitely on our list of things to test.


Regards,

Sebastian





Solr LTR with high rerankDocs

2017-08-10 Thread Sebastian Klemke
Hi,

we're currently experimenting with LTR reranking on large rerank
windows (rerankDocs=1000+). On a SolrCloud collection with more than
500M documents, we were only able to get sub-second response times with
FieldValueFeature. We therefore created a custom feature extractor that
matches field values against constant strings, to replace simple uses of
SolrFeature. The response time now appears to be dominated by loading
stored fields, more specifically by decompressing chunks of stored field
data.
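
For context, a rerank request of the kind described here would carry
parameters roughly like the ones below. This is a sketch, not our actual
request: "myModel" is a placeholder model name and the fl list is only an
example. Note that the local parameter is spelled reRankDocs in the
request.

```text
q=<user query>
rq={!ltr model=myModel reRankDocs=1000}
fl=id,score
```

With such a request, the model's features are computed for each of the
top 1000 documents of every query, which is why per-document costs such
as stored field access add up.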

We're now wondering how many documents LTR can rerank in practice and
what the bottlenecks are. Do you guys have any experience using it?


Regards,

Sebastian

