Re: Solr Logs to ELK / AWS Firestream
Hey,

On Thu, 2017-08-17 at 10:15 -0600, John Bickerstaff wrote:
> I'm trying to get Solr logs into AWS Firestream.
>
> Not having a lot of luck.
>
> Does anyone out there have any experience getting Solr logs into an ELK
> stack? Or, better yet, getting Solr Logs into AWS Firestream?
>
> We direct logs to SLF4J and use logback as our SLF4j implementation.
>
> I have a number of issues, but won't go into them here - since they won't
> mean much except in context. If you have some knowledge here - let me know
> and I'll ask my specific questions.

We're using net.logstash.log4j.JSONEventLayoutV1 to write JSON logs and have Logstash collect them. The jsonevent-layout dependency and its transitive dependencies have to be added to the system classloader classpath by putting them into the lib/ext folder. A GELF appender would probably also work, but we'd like to keep a local backup.

Regards,
Sebastian

--
Sebastian Klemke
Senior Software Engineer

ResearchGate GmbH
Invalidenstr. 115, 10115 Berlin, Germany
www.researchgate.net

Registered Seat: Hannover, HR B 202837
Managing Directors: Dr Ijad Madisch, Dr Sören Hofmayer
VAT-ID: DE258434568

A proud affiliate of: ResearchGate Corporation, 350 Townsend St #754, San Francisco, CA 94107
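To illustrate why JSON-formatted log files are convenient for a collector, here is a minimal sketch that reads such a file line by line. It assumes the layout writes one JSON object per line and that each event carries fields named `@timestamp`, `level`, `logger_name` and `message`; those field names are assumptions and should be checked against the actual log output. The sample lines and logger names are invented for the example.

```python
import json


def parse_events(lines):
    """Yield one dict per well-formed JSON log line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON lines in the log file
        if isinstance(event, dict):
            yield event


def filter_by_level(events, levels=("WARN", "ERROR")):
    """Return only the events whose (assumed) 'level' field is in `levels`."""
    return [e for e in events if e.get("level") in levels]


# Invented sample data; field names are assumptions about the log format.
SAMPLE_LINES = [
    '{"@timestamp": "2017-08-17T16:15:00.000Z", "level": "INFO", '
    '"logger_name": "example.LoggerA", "message": "example info message"}',
    "this line is not JSON",
    '{"@timestamp": "2017-08-17T16:15:01.000Z", "level": "ERROR", '
    '"logger_name": "example.LoggerB", "message": "example error message"}',
]

if __name__ == "__main__":
    for event in filter_by_level(parse_events(SAMPLE_LINES)):
        print(event["@timestamp"], event["level"], event["message"])
```

Because every event is a self-describing object, the same file can serve both as the local backup and as the input for a collector.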
Re: Solr LTR with high rerankDocs
Hi,

On Thu, 2017-08-10 at 08:30 -0700, Erick Erickson wrote:
> I have to confess that I know very little about the mechanics of LTR, but
> I can talk a little bit about compression.
>
> When a stored values is retrieved for a document it is read from the
> *.fdt file which is a compressed, verbatim copy of the field. DocValues
> can bypass this stored data and read directly from the DV format.
> There's a discussion of useDocValuesAsStored in solr/CHANGES.txt.
>
> The restriction of docValues is that they can only be used for
> primitive types, numerics, strings and the like, specifically _not_
> fields with class="solr.TextField".
>
> WARNING: I have no real clue whether LTR is built to leverage
> docValues fields. If you add docValues="true" to the relevant
> fields you'll have to re-index completely. In fact I'd use a new
> collection.
>
> And don't be put off by the fact that the index size on disk will grow
> on disk if you add docValues, the memory is MMapped to OS
> disk space and will actually _reduce_ your JVM requirements.

Yes, DocValues are definitely on our list of things to test.

Regards,
Sebastian
Solr LTR with high rerankDocs
Hi,

we're currently experimenting with LTR reranking on large rerank windows (rerankDocs=1000+). On a SolrCloud collection with more than 500M documents, we were only able to get sub-second response times with FieldValueFeature. We therefore created a custom feature extractor that matches field values against constant strings, as a substitute for simple SolrFeature usages. The response time now appears to be dominated by loading stored fields, more specifically by decompressing chunks of stored field data.

We're now wondering how many documents LTR can rerank in practice and what the bottlenecks are. Do you have any experience using it?

Regards,
Sebastian
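The idea of matching a field value against a constant string can be sketched independently of Solr. The following is a conceptual illustration only: it is not the Solr LTR plugin API and not the extractor described above, and the class name, field names and sample documents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstantMatchFeature:
    """Binary feature: 1.0 if a document field equals (or contains) a constant."""
    name: str
    field: str
    constant: str

    def score(self, doc):
        value = doc.get(self.field)
        # Multi-valued fields match if any value equals the constant.
        if isinstance(value, (list, tuple, set)):
            return 1.0 if self.constant in value else 0.0
        # Single-valued or missing field: plain equality (missing never matches).
        return 1.0 if value == self.constant else 0.0


def extract(features, docs):
    """Build one feature vector per document, in feature order."""
    return [[f.score(d) for f in features] for d in docs]


if __name__ == "__main__":
    # Hypothetical features and documents, for illustration only.
    features = [
        ConstantMatchFeature("is_article", "type", "article"),
        ConstantMatchFeature("has_english", "languages", "en"),
    ]
    docs = [
        {"type": "article", "languages": ["en", "de"]},
        {"type": "dataset"},
    ]
    for vector in extract(features, docs):
        print(vector)
```

Each feature needs only the already-loaded field value and a string comparison, so the cost per document is dominated by fetching the field, not by computing the feature.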