Re: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Hi,

This seems to be a PDFBox issue, I think. (https://pdfbox.apache.org/2.0/dependencies.html)

Thanks,
Yasufumi

On Tue, Nov 6, 2018 at 16:10 Furkan KAMACI wrote:
> Hi All,
>
> I use Solr 6.5.0 and am testing its OCR capabilities. It OCRs PDF files, even if it is
> slow. However, I see this error when I check the logs:
>
> o.a.p.c.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging
> (JAI) Image I/O Tools are not installed
>
> Any idea how to fix this?
>
> Kind Regards,
> Furkan KAMACI
Java Advanced Imaging (JAI) Image I/O Tools are not installed
Hi All,

I use Solr 6.5.0 and am testing its OCR capabilities. It OCRs PDF files, even if it is slow. However, I see this error when I check the logs:

o.a.p.c.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed

Any idea how to fix this?

Kind Regards,
Furkan KAMACI
Re: Setting of TMP in solr.cmd (for Windows) causes invisibility of the Solr to JDK monitoring tools
Hi Erick,

thanks for your feedback, so I've created a corresponding issue here:
https://issues.apache.org/jira/browse/SOLR-12776

Hopefully that will suffice :)

Regards
Petr

> From: "Erick Erickson"
> To: "solr-user"
> Date: 03.09.2018 01:38
> Subject: Re: Setting of TMP in solr.cmd (for Windows) causes invisibility of the Solr to JDK monitoring tools
>
> Hmmm, please raise a JIRA and, if possible, attach a patch that works
> for you. Most of us don't have Windows machines readily available,
> which hobbles testing, so it's very helpful if someone can test in a
> real environment.
>
> Best,
> Erick
>
> On Sun, Sep 2, 2018 at 1:47 PM wrote:
>> Hi,
>>
>> please notice the following lines added (among others) to "solr.cmd" by commit
>> https://github.com/apache/lucene-solr/commit/b36c68b16e67ae701cefce052a4fdbaac88fb65c
>> for https://issues.apache.org/jira/browse/SOLR-6833 about 4 years ago:
>>
>> set TMP=!SOLR_HOME:%EXAMPLE_DIR%=!
>> IF NOT "%TMP%"=="%SOLR_HOME%" (
>>   set "SOLR_LOGS_DIR=%SOLR_HOME%\..\logs"
>>   set "LOG4J_CONFIG=file:%EXAMPLE_DIR%\resources\log4j.properties"
>> )
>>
>> Apparently, the new variable "TMP" is just a temporary one, but by coincidence
>> this variable is also important to the JVM, as this system variable tells it where
>> the "hsperfdata_" directory for storing applications' monitoring data should be
>> located. And if this is changed, JDK tools like JVisualVM and others won't locally
>> see the given Java application, because they search in a different default
>> location. Tested with Java 8u152 and Solr 6.3.0.
>>
>> So Solr authors, could you please rename that "TMP" variable to something else,
>> or maybe remove it completely (I'm not sure about the latter alternative)?
>> Hopefully it is as easy as described above and I haven't overlooked some special
>> meaning of those problematic lines...
>>
>> Best regards
>>
>> Petr B.
Re: Setting of TMP in solr.cmd (for Windows) causes invisibility of the Solr to JDK monitoring tools
Hmmm, please raise a JIRA and, if possible, attach a patch that works for you. Most of us don't have Windows machines readily available, which hobbles testing, so it's very helpful if someone can test in a real environment.

Best,
Erick

On Sun, Sep 2, 2018 at 1:47 PM wrote:
> Hi,
>
> please notice the following lines added (among others) to "solr.cmd" by commit
> https://github.com/apache/lucene-solr/commit/b36c68b16e67ae701cefce052a4fdbaac88fb65c
> for https://issues.apache.org/jira/browse/SOLR-6833 about 4 years ago:
>
> set TMP=!SOLR_HOME:%EXAMPLE_DIR%=!
> IF NOT "%TMP%"=="%SOLR_HOME%" (
>   set "SOLR_LOGS_DIR=%SOLR_HOME%\..\logs"
>   set "LOG4J_CONFIG=file:%EXAMPLE_DIR%\resources\log4j.properties"
> )
>
> Apparently, the new variable "TMP" is just a temporary one, but by coincidence
> this variable is also important to the JVM, as this system variable tells it where
> the "hsperfdata_" directory for storing applications' monitoring data should be
> located. And if this is changed, JDK tools like JVisualVM and others won't locally
> see the given Java application, because they search in a different default
> location. Tested with Java 8u152 and Solr 6.3.0.
>
> So Solr authors, could you please rename that "TMP" variable to something else,
> or maybe remove it completely (I'm not sure about the latter alternative)?
> Hopefully it is as easy as described above and I haven't overlooked some special
> meaning of those problematic lines...
>
> Best regards
>
> Petr B.
Setting of TMP in solr.cmd (for Windows) causes invisibility of the Solr to JDK monitoring tools
Hi,

please notice the following lines added (among others) to "solr.cmd" by commit
https://github.com/apache/lucene-solr/commit/b36c68b16e67ae701cefce052a4fdbaac88fb65c
for https://issues.apache.org/jira/browse/SOLR-6833 about 4 years ago:

  set TMP=!SOLR_HOME:%EXAMPLE_DIR%=!
  IF NOT "%TMP%"=="%SOLR_HOME%" (
    set "SOLR_LOGS_DIR=%SOLR_HOME%\..\logs"
    set "LOG4J_CONFIG=file:%EXAMPLE_DIR%\resources\log4j.properties"
  )

Apparently, the new variable "TMP" is just a temporary one, but by coincidence this variable is also important to the JVM, as this system variable tells it where the "hsperfdata_" directory for storing applications' monitoring data should be located. And if this is changed, JDK tools like JVisualVM and others won't locally see the given Java application, because they search in a different default location. Tested with Java 8u152 and Solr 6.3.0.

So Solr authors, could you please rename that "TMP" variable to something else, or maybe remove it completely (I'm not sure about the latter alternative)? Hopefully it is as easy as described above and I haven't overlooked some special meaning of those problematic lines...

Best regards

Petr B.
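The mechanism Petr describes can be checked from inside any JVM: on Windows the JDK derives its temporary directory (java.io.tmpdir) from the TMP environment variable, and monitoring tools such as JVisualVM and jstat look for per-process performance data under a hsperfdata_<user> directory in that location. The sketch below is a rough illustration of that path layout, not the JDK's exact lookup logic; the class name is made up.

```java
public class TmpDirCheck {
    /**
     * Approximate location of the JVM performance-data directory:
     * the temp dir (derived from TMP on Windows) plus "hsperfdata_"
     * and the current user name. If solr.cmd overrides TMP, this path
     * moves too, and the JDK monitoring tools look in the wrong place.
     */
    public static String hsperfdataDir() {
        String tmp = System.getProperty("java.io.tmpdir");
        String user = System.getProperty("user.name");
        return tmp + java.io.File.separator + "hsperfdata_" + user;
    }

    public static void main(String[] args) {
        System.out.println(hsperfdataDir());
    }
}
```

Running this before and after sourcing solr.cmd would show whether the JVM's idea of the temp directory has been redirected.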
Re: integrate solr with preprocessor tools
Hi Sara,

I would recommend looking at the code of some component that you currently use and starting from that - you can extend that class or use it as a template for your own.

Thanks,
Emir

On 16.12.2015 09:58, sara hajili wrote:
> Hi Emir, thanks for answering. Now my question is how I write this class. Must I use
> Solr interfaces? I see in the above link that I can use a Solr analyzer, but how do I
> use that? Please tell me how to start writing my own analyzer step by step... Which
> interface can I use and change to achieve my goal? Thanks
>
> On Wed, Dec 9, 2015 at 1:50 AM, Emir Arnautovic emir.arnauto...@sematext.com wrote:
>> Hi Sara,
>> You need to wrap your code in a tokenizer or token filter:
>> https://wiki.apache.org/solr/SolrPlugins
>>
>> If you want to improve an existing one and believe others can benefit from the
>> improvement, you can open a ticket and submit a patch.
>>
>> Thanks,
>> Emir
>>
>> On 09.12.2015 10:41, sara hajili wrote:
>>> Hi, I want to use Solr, and the language of the documents I store in Solr is Persian.
>>> Solr doesn't support Persian as well as I want, so I found preprocessor tools like a
>>> normalizer, tokenizer, etc. I don't want to use the Solr Persian filters like the
>>> Persian tokenizer as-is; I mean, I want to improve them. Now my question is: how can I
>>> integrate Solr with these external preprocessor tools? Thanks

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
Re: integrate solr with preprocessor tools
Hi Emir, thanks for answering. Now my question is how I write this class. Must I use Solr interfaces? I see in the above link that I can use a Solr analyzer, but how do I use that? Please tell me how to start writing my own analyzer step by step... Which interface can I use and change to achieve my goal? Thanks

On Wed, Dec 9, 2015 at 1:50 AM, Emir Arnautovic emir.arnauto...@sematext.com wrote:
> Hi Sara,
> You need to wrap your code in a tokenizer or token filter:
> https://wiki.apache.org/solr/SolrPlugins
>
> If you want to improve an existing one and believe others can benefit from the
> improvement, you can open a ticket and submit a patch.
>
> Thanks,
> Emir
>
> On 09.12.2015 10:41, sara hajili wrote:
>> Hi, I want to use Solr, and the language of the documents I store in Solr is Persian.
>> Solr doesn't support Persian as well as I want, so I found preprocessor tools like a
>> normalizer, tokenizer, etc. I don't want to use the Solr Persian filters like the
>> Persian tokenizer as-is; I mean, I want to improve them.
>>
>> Now my question is: how can I integrate Solr with these external preprocessor tools?
>>
>> Thanks
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
Re: integrate solr with preprocessor tools
Hi Sara,

You need to wrap your code in a tokenizer or token filter:
https://wiki.apache.org/solr/SolrPlugins

If you want to improve an existing one and believe others can benefit from the improvement, you can open a ticket and submit a patch.

Thanks,
Emir

On 09.12.2015 10:41, sara hajili wrote:
> Hi, I want to use Solr, and the language of the documents I store in Solr is Persian.
> Solr doesn't support Persian as well as I want, so I found preprocessor tools like a
> normalizer, tokenizer, etc. I don't want to use the Solr Persian filters like the
> Persian tokenizer as-is; I mean, I want to improve them.
>
> Now my question is: how can I integrate Solr with these external preprocessor tools?
>
> Thanks

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
integrate solr with preprocessor tools
Hi, I want to use Solr, and the language of the documents I store in Solr is Persian. Solr doesn't support Persian as well as I want, so I found preprocessor tools like a normalizer, tokenizer, etc. I don't want to use the Solr Persian filters like the Persian tokenizer as-is; I mean, I want to improve them.

Now my question is: how can I integrate Solr with these external preprocessor tools?

Thanks
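The advice in this thread is to wrap the custom preprocessing in a tokenizer or token filter. The actual per-token logic can be developed and tested independently of the Solr plugin interfaces; the sketch below shows one plausible piece of Persian preprocessing (folding Arabic code points to their Persian equivalents), which a custom TokenFilter would then apply to each token. The class name and the exact character set are illustrative, not the real mapping table of Solr's Persian filters.

```java
import java.util.Map;

public class PersianCharNormalizer {
    // Fold Arabic variants to the Persian code points so that both
    // spellings of a word index to the same term.
    private static final Map<Character, Character> FOLD = Map.of(
        '\u064A', '\u06CC',  // Arabic Yeh  -> Farsi Yeh
        '\u0643', '\u06A9',  // Arabic Kaf  -> Keheh
        '\u0629', '\u0647'   // Teh Marbuta -> Heh
    );

    public static String normalize(String token) {
        StringBuilder out = new StringBuilder(token.length());
        for (char c : token.toCharArray()) {
            out.append(FOLD.getOrDefault(c, c));
        }
        return out.toString();
    }
}
```

Inside a TokenFilter, incrementToken() would call normalize() on the term attribute; the surrounding plumbing follows the SolrPlugins wiki page linked in this thread.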
Solr Cloud Management Tools
Hello. Can someone suggest a SolrCloud management tool? I'm looking to gather collection/document/shard metrics and also to collect data about the cluster usage on memory, reads/writes, etc.

Thanks
Elan
Re: Solr Cloud Management Tools
http://sematext.com/spm/

Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062

appinions inc.
"The Science of Influence Marketing"
18 East 41st Street
New York, NY 10017
t: @appinions | g+: plus.google.com/appinions
w: appinions.com

On Tue, Nov 4, 2014 at 3:01 PM, elangovan palani elang...@yahoo.com.invalid wrote:
> Hello. Can someone suggest a SolrCloud management tool? I'm looking to gather
> collection/document/shard metrics and also to collect data about the cluster usage
> on memory, reads/writes, etc.
>
> Thanks
> Elan
Re: Solr Cloud Management Tools
SemaText products are usually a good place to start fine-tuning your requirements:
http://sematext.com/index.html

I believe they do trials as well.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 4 November 2014 15:01, elangovan palani elang...@yahoo.com.invalid wrote:
> Hello. Can someone suggest a SolrCloud management tool? I'm looking to gather
> collection/document/shard metrics and also to collect data about the cluster usage
> on memory, reads/writes, etc.
>
> Thanks
> Elan
Optimal setup for multiple tools
Hello,

My team has been working with Solr for the last 2 years. We have two main indices:

1. documents
   - index and store main text
   - one record for each document
2. places (all of the geospatial places found in the documents above)
   - index but don't store main text
   - one record for each place; a single document could have thousands, but the ratio has come out to about 6:1 places to documents

We have several tools that query the above indices. One is just a standard search tool that returns documents filtered on keyword, temporal, and geospatial filters. Another is a geospatial tool that queries the places collection.

We now have a requirement to provide document highlighting when querying in the geospatial tool.

Does anyone have any suggestions/prior experience on how they would set up two collections that are essentially different views of the same data? Also, any tips on how to ensure that these two collections stay in sync (meaning any documents indexed into the documents collection are also properly indexed in places)?

Thanks a lot,
Jimmy Lin
Re: Optimal setup for multiple tools
Have you considered putting them in the _same_ index? There's not much penalty at all to having sparsely populated fields in a document, so the fact that the two parts of your index have orthogonal fields wouldn't cost you much, and it would solve the synchronization problem.

You can include a type field to distinguish between the two, and just include a filter query to keep them separate. Since that'll be cached, your search performance should be fine.

Otherwise, you should include the fields you need to sort on in the index you need to sort. That denormalizes the data, but...

As for keeping the two in sync, that's really outside Solr; your indexing process has to manage that, I'd guess.

Best,
Erick

On Sat, Apr 26, 2014 at 7:24 AM, Jimmy Lin jimmys.em...@gmail.com wrote:
> Hello,
>
> My team has been working with Solr for the last 2 years. We have two main indices:
>
> 1. documents
>    - index and store main text
>    - one record for each document
> 2. places (all of the geospatial places found in the documents above)
>    - index but don't store main text
>    - one record for each place; a single document could have thousands, but the
>      ratio has come out to about 6:1 places to documents
>
> We have several tools that query the above indices. One is just a standard search
> tool that returns documents filtered on keyword, temporal, and geospatial filters.
> Another is a geospatial tool that queries the places collection.
>
> We now have a requirement to provide document highlighting when querying in the
> geospatial tool.
>
> Does anyone have any suggestions/prior experience on how they would set up two
> collections that are essentially different views of the same data? Also, any tips on
> how to ensure that these two collections stay in sync (meaning any documents indexed
> into the documents collection are also properly indexed in places)?
>
> Thanks a lot,
> Jimmy Lin
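The single-index suggestion above boils down to tagging every record and filtering per tool. A minimal sketch of the query side, assuming a field literally named "type" with values like "document" and "place" (plain query-string assembly, no SolrJ):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TypeFilteredQuery {
    /**
     * Build the Solr query parameters: the user's query plus a
     * filter query (cached by Solr) restricting results to one record type.
     */
    public static String params(String userQuery, String type) {
        return "q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
             + "&fq=" + URLEncoder.encode("type:" + type, StandardCharsets.UTF_8);
    }
}
```

The geospatial tool would pass type "place" and the document search tool "document"; both hit the same collection, so there is nothing to keep in sync.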
Re: Tools for schema.xml generation and to import from a database
Thanks for the reply Alexandre, I will test your clues as soon as possible.

Best Regards,

On Mon, Jul 30, 2012 at 4:15 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:
> If you are just starting with SOLR, you might as well jump to 4.0 Alpha. By the time
> you are finished, it will be the production copy.
>
> If you want to index stuff from the database, your first step is probably to use the
> DataImportHandler (DIH). Once you get past the basics, you may want to write custom
> code, but start from DIH for faster results.
>
> You will want to modify schema.xml. I started by using the DIH example and just
> adding an extra core at first. This might be easier than building a full directory
> setup from scratch.
>
> You also don't actually need to configure the schema too much at the beginning. You
> can start by using dynamic fields. So, if in DIH you say that your target field is
> XYZ_i, it is automatically picked up as an integer field by SOLR (due to the *_i
> definition that you do need to have). This will not work for fields you want to do
> aggregation on (e.g. multiple text fields copied into one for easier search), for
> multilingual text fields, etc. But it will get you going.
>
> Oh, and welcome to SOLR. You will like it.
>
> Regards,
>    Alex.
>
> On Sun, Jul 29, 2012 at 3:45 PM, Andre Lopes lopes80an...@gmail.com wrote:
>> Hi, I'm new to Solr. I've installed 3.6.1 but I'm a little bit confused about what
>> to do next and how. I will use the Jetty version for now. Two points I need to know:
>> 1 - I have 2 views that I would like to import into Solr. I think I must write a
>> schema.xml and then import data into that schema. Am I correct on this one?
>> 2 - About tools to auto-generate the schema.xml: are there any? And about tools to
>> import data into the schema: are there any (I'm using Python)?
>> Please give me some clues. Thanks, Best Regards, André.
Tools for schema.xml generation and to import from a database
Hi,

I'm new to Solr. I've installed 3.6.1 but I'm a little bit confused about what to do next and how. I will use the Jetty version for now.

Two points I need to know:

1 - I have 2 views that I would like to import into Solr. I think I must write a schema.xml and then import data into that schema. Am I correct on this one?

2 - About tools to auto-generate the schema.xml: are there any? And about tools to import data into the schema: are there any (I'm using Python)?

Please give me some clues.

Thanks, Best Regards,
André.
Re: Tools for schema.xml generation and to import from a database
If you are just starting with SOLR, you might as well jump to 4.0 Alpha. By the time you are finished, it will be the production copy.

If you want to index stuff from the database, your first step is probably to use the DataImportHandler (DIH). Once you get past the basics, you may want to write custom code, but start from DIH for faster results.

You will want to modify schema.xml. I started by using the DIH example and just adding an extra core at first. This might be easier than building a full directory setup from scratch.

You also don't actually need to configure the schema too much at the beginning. You can start by using dynamic fields. So, if in DIH you say that your target field is XYZ_i, it is automatically picked up as an integer field by SOLR (due to the *_i definition that you do need to have). This will not work for fields you want to do aggregation on (e.g. multiple text fields copied into one for easier search), for multilingual text fields, etc. But it will get you going.

Oh, and welcome to SOLR. You will like it.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Jul 29, 2012 at 3:45 PM, Andre Lopes lopes80an...@gmail.com wrote:
> Hi, I'm new to Solr. I've installed 3.6.1 but I'm a little bit confused about what
> to do next and how. I will use the Jetty version for now. Two points I need to know:
> 1 - I have 2 views that I would like to import into Solr. I think I must write a
> schema.xml and then import data into that schema. Am I correct on this one?
> 2 - About tools to auto-generate the schema.xml: are there any? And about tools to
> import data into the schema: are there any (I'm using Python)?
> Please give me some clues. Thanks, Best Regards, André.
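The dynamic-field trick described above can be mechanized when generating a DIH mapping: derive the target field name from the column's SQL type so that the stock *_i, *_s, etc. dynamic-field definitions pick the right field type. The suffixes below follow the example-schema convention; the class name and the type strings are illustrative.

```java
public class DynamicFieldNamer {
    /** Map a DB column to a Solr dynamic-field name by appending a type suffix. */
    public static String fieldFor(String column, String sqlType) {
        switch (sqlType) {
            case "INTEGER":   return column + "_i";  // *_i -> integer
            case "VARCHAR":   return column + "_s";  // *_s -> string
            case "TIMESTAMP": return column + "_dt"; // *_dt -> date
            default:          return column + "_t";  // *_t -> general text
        }
    }
}
```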
Re: Lexical analysis tools for German language data
On Thu, Apr 12, 2012 at 03:46:56PM +, Michael Ludwig wrote:
> Von: Walter Underwood
>> German noun decompounding is a little more complicated than it might seem.
>> There can be transformations or inflections, like the s in Weihnachtsbaum
>> (Weihnachten/Baum).
>
> I remember from my linguistics studies that the terminus technicus for
> these is Fugenmorphem (interstitial or joint morpheme) [...]

IANAL (I am not a linguist -- pun intended ;) but I've always read that as a genitive. Any pointers?

Regards
--
Tomás Zerolo
Axel Springer AG
Axel Springer media Systems
BILD Produktionssysteme
Axel-Springer-Straße 65
10888 Berlin
Tel.: +49 (30) 2591-72875
tomas.zer...@axelspringer.de
www.axelspringer.de

Axel Springer AG, Sitz Berlin, Amtsgericht Charlottenburg, HRB 4998
Vorsitzender des Aufsichtsrats: Dr. Giuseppe Vita
Vorstand: Dr. Mathias Döpfner (Vorsitzender) Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele
AW: Lexical analysis tools for German language data
From: Tomas Zerolo
>>> There can be transformations or inflections, like the s in
>>> Weihnachtsbaum (Weihnachten/Baum).
>>
>> I remember from my linguistics studies that the terminus technicus for
>> these is Fugenmorphem (interstitial or joint morpheme) [...]
>
> IANAL (I am not a linguist -- pun intended ;) but I've always read that
> as a genitive. Any pointers?

Admittedly, that's what you'd think, and despite linguistics telling me otherwise, I'd maintain there's some truth in it. For this case, however, consider: "die Weihnacht" declines like "die Nacht", so:

nom. die Weihnacht
gen. der Weihnacht
dat. der Weihnacht
akk. die Weihnacht

As you can see, there's no "s" to be found anywhere, not even in the genitive. But my gut feeling, like yours, is that this should indicate a genitive, and I would make a point of well-argued gut feeling being at least as relevant as formalist analysis.

Michael
Lexical analysis tools for German language data
Given an input of "Windjacke" (probably "wind jacket" in English), I'd like the code that prepares the data for the index (tokenizer etc.) to understand that this is a "Jacke" (jacket), so that a query for "Jacke" would include the "Windjacke" document in its result set.

It appears to me that such an analysis requires a dictionary-backed approach, which doesn't have to be perfect at all; a list of the most common 2000 words would probably do the job and fulfil a criterion of reasonable usefulness.

Do you know of any implementation techniques or working implementations to do this kind of lexical analysis for German language data? (Or other languages, for that matter?) What are they, and where can I find them? I'm sure there is something out there (commercial or free), because I've seen lots of engines grokking German and the way it builds words.

Failing that, what are the proper terms to refer to these techniques, so you can search more successfully?

Michael
AW: Lexical analysis tools for German language data
> Given an input of "Windjacke" (probably "wind jacket" in English), I'd like the code
> that prepares the data for the index (tokenizer etc.) to understand that this is a
> "Jacke" (jacket), so that a query for "Jacke" would include the "Windjacke" document
> in its result set.
>
> It appears to me that such an analysis requires a dictionary-backed approach, which
> doesn't have to be perfect at all; a list of the most common 2000 words would
> probably do the job and fulfil a criterion of reasonable usefulness.

A simple approach would obviously be a word list and a regular expression. There will, however, be nuts and bolts to take care of. A more sophisticated and tested approach might be known to you.

Michael
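The word-list idea above can be sketched as a greedy splitter over a small dictionary, also tolerating the interstitial "s" (the Fugenmorphem discussed in this thread). The four-word dictionary is obviously a toy; a real one would hold the couple of thousand most common heads and tails.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class Decompounder {
    private static final Set<String> DICT = Set.of("wind", "jacke", "weihnacht", "baum");

    /** Split a compound into dictionary parts, or return it whole if no split matches. */
    public static List<String> split(String word) {
        String w = word.toLowerCase(Locale.GERMAN);
        for (int i = 3; i <= w.length() - 3; i++) {
            String head = w.substring(0, i);
            String tail = w.substring(i);
            boolean headOk = DICT.contains(head)
                // tolerate a joint morpheme "s", as in Weihnachts+baum
                || (head.endsWith("s") && DICT.contains(head.substring(0, head.length() - 1)));
            if (headOk && DICT.contains(tail)) {
                return List.of(head, tail);
            }
        }
        return List.of(w);
    }
}
```

At index time the parts (here "wind" and "jacke") would be emitted as extra tokens, so a query for "Jacke" matches the "Windjacke" document.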
Re: Lexical analysis tools for German language data
Michael,

I've been on this list and the lucene list for several years and have not found this yet. It's been one of the neglected topics, to my taste.

There is a CompoundAnalyzer, but it requires the compounds to be dictionary-based, as you indicate.

I am convinced there's a way to build the de-compounding word lists efficiently from a broad corpus, but I have never seen it (and the experts at DFKI I asked also told me they didn't know of one).

paul

On 12 Apr 2012, at 11:52, Michael Ludwig wrote:
> Given an input of "Windjacke" (probably "wind jacket" in English), I'd like the code
> that prepares the data for the index (tokenizer etc.) to understand that this is a
> "Jacke" (jacket), so that a query for "Jacke" would include the "Windjacke" document
> in its result set.
>
> It appears to me that such an analysis requires a dictionary-backed approach, which
> doesn't have to be perfect at all; a list of the most common 2000 words would
> probably do the job and fulfil a criterion of reasonable usefulness.
>
> Do you know of any implementation techniques or working implementations to do this
> kind of lexical analysis for German language data? (Or other languages, for that
> matter?) What are they, and where can I find them? I'm sure there is something out
> there (commercial or free), because I've seen lots of engines grokking German and
> the way it builds words.
>
> Failing that, what are the proper terms to refer to these techniques, so you can
> search more successfully?
>
> Michael
Re: Lexical analysis tools for German language data
You might have a look at:
http://www.basistech.com/lucene/

On 12.04.2012 11:52, Michael Ludwig wrote:
> Given an input of "Windjacke" (probably "wind jacket" in English), I'd like the code
> that prepares the data for the index (tokenizer etc.) to understand that this is a
> "Jacke" (jacket), so that a query for "Jacke" would include the "Windjacke" document
> in its result set.
>
> It appears to me that such an analysis requires a dictionary-backed approach, which
> doesn't have to be perfect at all; a list of the most common 2000 words would
> probably do the job and fulfil a criterion of reasonable usefulness.
>
> Do you know of any implementation techniques or working implementations to do this
> kind of lexical analysis for German language data? (Or other languages, for that
> matter?) What are they, and where can I find them? I'm sure there is something out
> there (commercial or free), because I've seen lots of engines grokking German and
> the way it builds words.
>
> Failing that, what are the proper terms to refer to these techniques, so you can
> search more successfully?
>
> Michael
Re: Lexical analysis tools for German language data
If you want the query "jacke" to match a document containing the word "windjacke" or "kinderjacke", you could use a custom update processor. This processor could search the indexed text for words matching the pattern ".*jacke" and inject the word "jacke" into an additional field which you can search against. You would need a whole list of possible suffixes, of course. It would slow down the update process, but you wouldn't need to split words during search.

Best,
Valeriy

On Thu, Apr 12, 2012 at 12:39 PM, Paul Libbrecht p...@hoplahup.net wrote:
> Michael,
>
> I've been on this list and the lucene list for several years and have not found this
> yet. It's been one of the neglected topics, to my taste.
>
> There is a CompoundAnalyzer, but it requires the compounds to be dictionary-based,
> as you indicate.
>
> I am convinced there's a way to build the de-compounding word lists efficiently from
> a broad corpus, but I have never seen it (and the experts at DFKI I asked also told
> me they didn't know of one).
>
> paul
>
> On 12 Apr 2012, at 11:52, Michael Ludwig wrote:
>> Given an input of "Windjacke" (probably "wind jacket" in English), I'd like the
>> code that prepares the data for the index (tokenizer etc.) to understand that this
>> is a "Jacke" (jacket), so that a query for "Jacke" would include the "Windjacke"
>> document in its result set.
>>
>> It appears to me that such an analysis requires a dictionary-backed approach,
>> which doesn't have to be perfect at all; a list of the most common 2000 words
>> would probably do the job and fulfil a criterion of reasonable usefulness.
>>
>> Do you know of any implementation techniques or working implementations to do this
>> kind of lexical analysis for German language data? (Or other languages, for that
>> matter?) What are they, and where can I find them? I'm sure there is something out
>> there (commercial or free), because I've seen lots of engines grokking German and
>> the way it builds words.
>>
>> Failing that, what are the proper terms to refer to these techniques, so you can
>> search more successfully?
>>
>> Michael
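The update-processor idea above, reduced to its core: scan the document text for tokens ending in a known suffix and collect the base words to inject into an extra search field. The suffix list and class name are illustrative; in Solr this logic would live in an UpdateRequestProcessor's processAdd().

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SuffixInjector {
    // Base words worth injecting; a real deployment would curate this per domain.
    private static final List<String> SUFFIXES = List.of("jacke", "schuh", "hose");

    /** Return the base words found as suffixes of compound tokens in the text. */
    public static Set<String> baseWords(String text) {
        Set<String> extra = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.GERMAN).split("\\s+")) {
            for (String suffix : SUFFIXES) {
                if (token.length() > suffix.length() && token.endsWith(suffix)) {
                    extra.add(suffix);
                }
            }
        }
        return extra;
    }
}
```

The returned words would be written to an additional field on the document before indexing, so a query for "jacke" against that field matches documents containing only "windjacke" or "kinderjacke".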
Re: Lexical analysis tools for German language data
Bernd,

can you please say a little more? I think it's OK for this list to contain some description of commercial solutions that satisfy a request formulated on the list.

Is there any product at BASIS Tech that provides a compound analyzer with a big dictionary of decomposed compounds in German? If yes, for which domain?

The Google search results (I wonder if it is politically correct to not have yours ;-)) show me that there's a certain amount of work done in this direction (e.g. Gärten to match Garten), but being precise on this question would be more helpful!

paul

On 12.04.2012 12:46, Bernd Fehling wrote:
> You might have a look at:
> http://www.basistech.com/lucene/
>
> On 12.04.2012 11:52, Michael Ludwig wrote:
>> Given an input of "Windjacke" (probably "wind jacket" in English), I'd like the
>> code that prepares the data for the index (tokenizer etc.) to understand that this
>> is a "Jacke" (jacket), so that a query for "Jacke" would include the "Windjacke"
>> document in its result set.
>>
>> It appears to me that such an analysis requires a dictionary-backed approach,
>> which doesn't have to be perfect at all; a list of the most common 2000 words
>> would probably do the job and fulfil a criterion of reasonable usefulness.
>>
>> Do you know of any implementation techniques or working implementations to do this
>> kind of lexical analysis for German language data? (Or other languages, for that
>> matter?) What are they, and where can I find them? I'm sure there is something out
>> there (commercial or free), because I've seen lots of engines grokking German and
>> the way it builds words.
>>
>> Failing that, what are the proper terms to refer to these techniques, so you can
>> search more successfully?
>>
>> Michael
Re: Lexical analysis tools for German language data
Paul, nearly two years ago I requested an evaluation license and tested BASIS Tech Rosette for Lucene Solr. Was working excellent but the price much much to high. Yes, they also have compound analysis for several languages including German. Just configure your pipeline in solr and setup the processing pipeline in Rosette Language Processing (RLP) and thats it. Example from my very old schema.xml config: fieldtype name=text_rlp class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=com.basistech.rlp.solr.RLPTokenizerFactory rlpContext=solr/conf/rlp-index-context.xml postPartOfSpeech=false postLemma=true postStem=true postCompoundComponents=true/ filter class=solr.LowerCaseFilterFactory/ filter class=org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=com.basistech.rlp.solr.RLPTokenizerFactory rlpContext=solr/conf/rlp-query-context.xml postPartOfSpeech=false postLemma=true postCompoundComponents=true/ filter class=solr.LowerCaseFilterFactory/ filter class=org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldtype So you just point tokenizer to RLP and have two RLP pipelines configured, one for indexing (rlp-index-context.xml) and one for querying (rlp-query-context.xml). 
Example form my rlp-index-context.xml config: contextconfig properties property name=com.basistech.rex.optimize value=false/ property name=com.basistech.ela.retokenize_for_rex value=true/ /properties languageprocessors languageprocessorUnicode Converter/languageprocessor languageprocessorLanguage Identifier/languageprocessor languageprocessorEncoding and Character Normalizer/languageprocessor languageprocessorEuropean Language Analyzer/languageprocessor !--languageprocessorScript Region Locator/languageprocessor languageprocessorJapanese Language Analyzer/languageprocessor languageprocessorChinese Language Analyzer/languageprocessor languageprocessorKorean Language Analyzer/languageprocessor languageprocessorSentence Breaker/languageprocessor languageprocessorWord Breaker/languageprocessor languageprocessorArabic Language Analyzer/languageprocessor languageprocessorPersian Language Analyzer/languageprocessor languageprocessorUrdu Language Analyzer/languageprocessor -- languageprocessorStopword Locator/languageprocessor languageprocessorBase Noun Phrase Locator/languageprocessor !--languageprocessorStatistical Entity Extractor/languageprocessor -- languageprocessorExact Match Entity Extractor/languageprocessor languageprocessorPattern Match Entity Extractor/languageprocessor languageprocessorEntity Redactor/languageprocessor languageprocessorREXML Writer/languageprocessor /languageprocessors /contextconfig As you can see I used the European Language Analyzer. Bernd Am 12.04.2012 12:58, schrieb Paul Libbrecht: Bernd, can you please say a little more? I think this list is ok to contain some description for commercial solutions that satisfy a request formulated on list. Is there any product at BASIS Tech that provides a compound-analyzer with a big dictionary of decomposed compounds in German? If yes, for which domain? 
The Google Search result (I wonder if this is politically correct to not have yours ;-)) shows me that there's a fair amount of work done in this direction (e.g. Gärten to match Garten), but being precise on this question would be more helpful! paul
AW: Lexical analysis tools for German language data
Von: Valeriy Felberg If you want that query jacke matches a document containing the word windjacke or kinderjacke, you could use a custom update processor. This processor could search the indexed text for words matching the pattern .*jacke and inject the word jacke into an additional field which you can search against. You would need a whole list of possible suffixes, of course. Merci, Valeriy - I agree on the feasibility of such an approach. The list would likely have to be composed of the most frequently used terms for your specific domain. In our case, it's things people would buy in shops. Reducing overly complicated and convoluted product descriptions to proper basic terms - that would do the job. It's like going to a restaurant boasting fancy and unintelligible names for the dishes you may order when they are really just ordinary stuff like pork and potatoes. Thinking some more about it, giving sufficient boost to the attached category data might also do the job. That would shift the burden of supplying proper semantics to the people doing the categorization. It would slow down the update process, but you don't need to split words during search. Le 12 avr. 2012 à 11:52, Michael Ludwig a écrit : Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc.) to understand that this is a Jacke (jacket) so that a query for Jacke would include the Windjacke document in its result set. A query for Windjacke or Kinderjacke would probably not have to be de-specialized to Jacke because, well, that's the user input, and users looking for specific things are probably doing so for a reason. If no matches are found you can still tell them to just broaden their search. Michael
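The suffix-matching idea above can be sketched roughly as follows. This is a minimal illustration only, not Solr's actual UpdateRequestProcessor API; the suffix list and token values are made up for the example:

```python
# Sketch of the idea from the thread: for each indexed token, check it
# against a list of known base-word suffixes and, on a match, inject the
# base word as an extra token for an additional searchable field.
# SUFFIXES is a tiny, made-up example; a real list would be built from
# the most frequent base terms of your domain.

SUFFIXES = ["jacke", "schuh", "hose"]

def inject_base_terms(tokens):
    """Return extra base-word tokens for a list of indexed tokens."""
    extra = []
    for token in tokens:
        for suffix in SUFFIXES:
            # only inject for true compounds, not the base word itself
            if token.endswith(suffix) and token != suffix:
                extra.append(suffix)
    return extra

doc_tokens = ["windjacke", "kinderjacke", "wanderschuh", "hemd"]
print(inject_base_terms(doc_tokens))  # ['jacke', 'jacke', 'schuh']
```

A query for "jacke" would then match the extra field of documents containing "windjacke" or "kinderjacke", without de-specializing the user's own query.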
Re: Lexical analysis tools for German language data
Hi, We've done a lot of tests with the HyphenationCompoundWordTokenFilter, using an FOP XML hyphenation file generated from TeX patterns for the Dutch language, and have seen decent results. A bonus was that now some tokens can be stemmed properly, because not all compounds are listed in the dictionary for the HunspellStemFilter. It does introduce a recall/precision problem, but it at least returns results for those many users that do not properly use compounds in their search query. There seems to be a small issue with the filter where minSubwordSize=N yields subwords of size N-1. Cheers, On Thursday 12 April 2012 12:39:44 Paul Libbrecht wrote: Michael, I've been on this list and the lucene list for several years and have not found this yet. It's been one of the neglected topics, to my taste. There is a CompoundAnalyzer but it requires the compounds to be dictionary based, as you indicate. I am convinced there's a way to build the de-compounding words efficiently from a broad corpus, but I have never seen it (and the experts at DFKI I asked also told me they didn't know of one). paul Le 12 avr. 2012 à 11:52, Michael Ludwig a écrit : Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc.) to understand that this is a Jacke (jacket) so that a query for Jacke would include the Windjacke document in its result set. It appears to me that such an analysis requires a dictionary-backed approach, which doesn't have to be perfect at all; a list of the most common 2000 words would probably do the job and fulfil a criterion of reasonable usefulness. Do you know of any implementation techniques or working implementations to do this kind of lexical analysis for German language data? (Or other languages, for that matter?) What are they, where can I find them? I'm sure there is something out there (commercial or free) because I've seen lots of engines grokking German and the way it builds words.
Failing that, what are the proper terms to refer to these techniques, so one can search more successfully? Michael -- Markus Jelsma - CTO - Openindex
AW: Lexical analysis tools for German language data
Von: Markus Jelsma We've done a lot of tests with the HyphenationCompoundWordTokenFilter, using an FOP XML hyphenation file generated from TeX patterns for the Dutch language, and have seen decent results. A bonus was that now some tokens can be stemmed properly, because not all compounds are listed in the dictionary for the HunspellStemFilter. Thank you for pointing me to these two filter classes. It does introduce a recall/precision problem, but it at least returns results for those many users that do not properly use compounds in their search query. Could you define what the term recall should be taken to mean in this context? I've also encountered it on the BASIS Tech website. Okay, I found a definition: http://en.wikipedia.org/wiki/Precision_and_recall Dank je wel! Michael
Re: Lexical analysis tools for German language data
German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the s in Weihnachtsbaum (Weihnachten/Baum). Internal nouns should be recapitalized, like Baum above. Some compounds probably should not be decompounded, like Fahrrad (fahren/Rad). With a dictionary-based stemmer, you might decide to avoid decompounding for words in the dictionary. Verbs get more complicated inflections, and might need to be decapitalized, like fahren above. Und so weiter. Note that highlighting gets pretty weird when you are matching only part of a word. Luckily, a lot of compounds are simple, and you could well get a measurable improvement with a very simple algorithm. There isn't anything complicated about compounds like Orgelmusik or Netzwerkbetreuer. The Basis Technology linguistic analyzers aren't cheap or small, but they work well. wunder On Apr 12, 2012, at 3:58 AM, Paul Libbrecht wrote: Bernd, can you please say a little more? I think this list is ok to contain some description for commercial solutions that satisfy a request formulated on the list. Is there any product at BASIS Tech that provides a compound-analyzer with a big dictionary of decomposed compounds in German? If yes, for which domain? The Google Search result (I wonder if this is politically correct to not have yours ;-)) shows me that there's a fair amount of work done in this direction (e.g. Gärten to match Garten), but being precise on this question would be more helpful! paul Le 12 avr. 2012 à 12:46, Bernd Fehling a écrit : You might have a look at: http://www.basistech.com/lucene/ Am 12.04.2012 11:52, schrieb Michael Ludwig: Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc.) to understand that this is a Jacke (jacket) so that a query for Jacke would include the Windjacke document in its result set.
It appears to me that such an analysis requires a dictionary-backed approach, which doesn't have to be perfect at all; a list of the most common 2000 words would probably do the job and fulfil a criterion of reasonable usefulness. Do you know of any implementation techniques or working implementations to do this kind of lexical analysis for German language data? (Or other languages, for that matter?) What are they, where can I find them? I'm sure there is something out there (commercial or free) because I've seen lots of engines grokking German and the way it builds words. Failing that, what are the proper terms to refer to these techniques, so one can search more successfully? Michael
AW: Lexical analysis tools for German language data
Von: Walter Underwood German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the s in Weihnachtsbaum (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem (interstitial or joint morpheme). But there aren't many of them - phrased as a regex, it's /e?[ns]/. The Weihnachtsbaum in the example above is from the singular (die Weihnacht), then s, then Baum. Still, it's much more complex than, say, English or Italian. Internal nouns should be recapitalized, like Baum above. Casing won't matter for indexing, I think. The way I would go about obtaining stems from compound words is by using a dictionary of stems and a regex. We'll see how far that'll take us. Some compounds probably should not be decompounded, like Fahrrad (fahren/Rad). With a dictionary-based stemmer, you might decide to avoid decompounding for words in the dictionary. Good point. Note that highlighting gets pretty weird when you are matching only part of a word. Guess it'll be weird when you get it wrong, like Noten in Notentriegelung. Luckily, a lot of compounds are simple, and you could well get a measurable improvement with a very simple algorithm. There isn't anything complicated about compounds like Orgelmusik or Netzwerkbetreuer. Exactly. The Basis Technology linguistic analyzers aren't cheap or small, but they work well. We will consider our needs and options. Thanks for your thoughts. Michael
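The dictionary-plus-joint-morpheme approach Michael describes can be sketched as follows. This is a toy illustration under stated assumptions: the five-entry stem dictionary is made up, and a real decompounder would need a much larger dictionary plus an exclusion list for words like Fahrrad:

```python
# Toy stem dictionary; a real one would hold a few thousand frequent nouns.
STEMS = {"weihnacht", "baum", "wind", "jacke", "netzwerk", "betreuer"}

# Candidate joint morphemes (Fugenmorpheme): the matches of /e?[ns]/,
# plus the empty string for compounds glued together directly.
JOINTS = ("", "s", "n", "es", "en")

def decompound(word, stems=STEMS):
    """Split a compound into known stems, or return [word] unchanged."""
    word = word.lower()
    if word in stems:
        return [word]
    # try split points from longest head to shortest:
    # known stem + optional joint morpheme + decompoundable rest
    for i in range(len(word) - 1, 2, -1):
        head, rest = word[:i], word[i:]
        if head not in stems:
            continue
        for joint in JOINTS:
            if rest.startswith(joint):
                tail = decompound(rest[len(joint):], stems)
                if all(t in stems for t in tail):
                    return [head] + tail
    return [word]  # no split found: leave the word alone

print(decompound("Weihnachtsbaum"))    # ['weihnacht', 'baum']
print(decompound("Windjacke"))         # ['wind', 'jacke']
print(decompound("Fahrrad"))           # ['fahrrad'] - not in dict, kept whole
```

Note how Fahrrad stays whole simply because "fahr" is not in the stem dictionary, which matches Walter's suggestion of avoiding decompounding for dictionary words.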
Re: AW: Lexical analysis tools for German language data
Le 12 avr. 2012 à 17:46, Michael Ludwig a écrit : Some compounds probably should not be decompounded, like Fahrrad (fahren/Rad). With a dictionary-based stemmer, you might decide to avoid decompounding for words in the dictionary. Good point. More or less; Fahrrad is generally abbreviated as Rad (even though Rad can mean both wheel and bike). Note that highlighting gets pretty weird when you are matching only part of a word. Guess it'll be weird when you get it wrong, like Noten in Notentriegelung. This decomposition should not happen because Noten-triegelung does not have a correct second term. The Basis Technology linguistic analyzers aren't cheap or small, but they work well. We will consider our needs and options. Thanks for your thoughts. My question remains as to which domain it aims at covering. We had such a need for mathematics texts... I would be pleasantly surprised if, for example, Differenzen-quotient were decompounded. paul
Re: AW: Lexical analysis tools for German language data
On Apr 12, 2012, at 8:46 AM, Michael Ludwig wrote: I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem (interstitial or joint morpheme). That is some excellent linguistic jargon. I'll file that with hapax legomenon. If you don't highlight, you can get good results with pretty rough analyzers, but highlighting exposes those rough edges, even when they don't affect relevance. For example, you can get good relevance just indexing bigrams in Chinese, but it looks awful when you highlight them. As soon as you highlight, you need a dictionary-based segmenter. wunder -- Walter Underwood wun...@wunderwood.org
Re: AW: Lexical analysis tools for German language data
On Thursday 12 April 2012 18:00:14 Paul Libbrecht wrote: Le 12 avr. 2012 à 17:46, Michael Ludwig a écrit : Some compounds probably should not be decompounded, like Fahrrad (fahren/Rad). With a dictionary-based stemmer, you might decide to avoid decompounding for words in the dictionary. Good point. More or less; Fahrrad is generally abbreviated as Rad (even though Rad can mean both wheel and bike). Note that highlighting gets pretty weird when you are matching only part of a word. Guess it'll be weird when you get it wrong, like Noten in Notentriegelung. This decomposition should not happen because Noten-triegelung does not have a correct second term. The Basis Technology linguistic analyzers aren't cheap or small, but they work well. We will consider our needs and options. Thanks for your thoughts. My question remains as to which domain it aims at covering. We had such a need for mathematics texts... I would be pleasantly surprised if, for example, Differenzen-quotient were decompounded. paul The HyphenationCompoundWordTokenFilter can do those things, but those words must be listed in the dictionary or you'll get strange results. It still yields strange results when it emits tokens that are subwords of a subword. -- Markus Jelsma - CTO - Openindex
Re: AW: Lexical analysis tools for German language data
On Apr 12, 2012, at 9:00 AM, Paul Libbrecht wrote: More or less; Fahrrad is generally abbreviated as Rad (even though Rad can mean both wheel and bike). A synonym could handle this, since fahren would not be a good match. It is a judgement call, but this seems more like an equivalence Fahrrad = Rad than decompounding. wunder -- Walter Underwood wun...@wunderwood.org
Re: Reporting tools
On 9 March 2012 09:05, Donald Organ dor...@donaldorgan.com wrote: Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc.? I do not have direct experience of any Solr reporting tool, but please see the Solr StatsComponent: http://wiki.apache.org/solr/StatsComponent This should provide you with data on the Solr index. Regards, Gora
Re: Reporting tools
as Gora says, there is the stats component you can take advantage of, or you could also use JMX directly [1], or LucidGaze [2][3], or commercial services like [4] or [5] (these are the ones I know, but there may be others as well), each of them with a different level/type of service. Tommaso [1] : http://wiki.apache.org/solr/SolrJmx [2] : http://www.lucidimagination.com/blog/2009/08/24/lucid-gaze-for-lucene/ [3] : http://www.chrisumbel.com/article/monitoring_solr_lucidgaze [4] : http://sematext.com/search-analytics/index.html [5] : http://newrelic.com/ 2012/3/9 Donald Organ dor...@donaldorgan.com Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc.?
Re: Reporting tools
Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc.? You might be interested in this: http://www.sematext.com/search-analytics/index.html
Re: Reporting tools
(12/03/09 12:35), Donald Organ wrote: Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc.? You may be interested in: Free Query Log Visualizer for Apache Solr http://soleami.com/ koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Reporting tools
Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc.?
Re: cache monitoring tools?
Thanks Justin. The reason I decided to ask is how easy it is to bootstrap a system like Munin. This of course depends on how fast one needs it. That is, if SOLR already exposes certain stats via jmx-accessible beans, that will make it easier and faster to set up a tool that can read from jmx. Only my opinion. Thanks, Dmitry On Fri, Dec 16, 2011 at 4:55 AM, Justin Caratzas justin.carat...@gmail.com wrote: Dmitry, That's beyond the scope of this thread, but Munin essentially runs plugins, which are essentially scripts that output graph configuration and values when polled by the Munin server. So it uses a plain text protocol, so that the scripts can be written in any language. Munin then feeds this info into RRDtool, which displays the graph. There are some examples[1] of solr plugins that people have used to scrape the stats.jsp page. Justin 1. http://exchange.munin-monitoring.org/plugins/search?keyword=solr Dmitry Kan dmitry@gmail.com writes: Thanks, Justin. With zabbix I can gather jmx-exposed stats from SOLR; how about munin, what protocol / way does it use to accumulate stats? It wasn't obvious from their online documentation... On Mon, Dec 12, 2011 at 4:56 PM, Justin Caratzas justin.carat...@gmail.com wrote: Dmitry, The only added stress that munin puts on each box is the 1 request per stat per 5 minutes to our admin stats handler. Given that we get 25 requests per second, this doesn't make much of a difference. We don't have a sharded index (yet) as our index is only 2-3 GB, but we do have slave servers with replicated indexes that handle the queries, while our master handles updates/commits. Justin Dmitry Kan dmitry@gmail.com writes: Justin, in terms of the overhead, have you noticed if Munin adds much of it when used in production? In terms of the solr farm: how big is a shard's index (given you have a sharded architecture)?
Dmitry On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas justin.carat...@gmail.com wrote: At my work, we use Munin and Nagios for monitoring and alerts. Munin is great because writing a plugin for it is so simple, and with Solr's statistics handler, we can track almost any solr stat we want. It also comes with included plugins for load, file system stats, processes, etc. http://munin-monitoring.org/ Justin Paul Libbrecht p...@hoplahup.net writes: Allow me to chime in and ask a generic question about monitoring tools for people close to developers: are any of the tools mentioned in this thread actually able to show graphs of loads, e.g. cache counts or CPU load, in parallel to a console log or to an http request log? I am working on such a tool currently, but I have a bad feeling of reinventing the wheel. thanks in advance Paul Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit : Otis, Tomás: thanks for the great links! 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any tool that visualizes JMX stuff, like Zabbix. See http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/ On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote: The culprit seems to be the merger (frontend) SOLR. Talking to one shard directly takes substantially less time (1-2 sec). On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote: Tomás: thanks. The page you gave didn't mention cache specifically; is there more documentation on this specifically? I have used the solrmeter tool, it draws the cache diagrams; is there a similar tool, but which would use jmx directly and present the cache usage at runtime?
pravesh: I have increased the size of filterCache, but the search hasn't become any faster, taking almost 9 sec on avg :(

name: search
class: org.apache.solr.handler.component.SearchHandler
version: $Revision: 1052938 $
description: Search using components: org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
stats:
handlerStart : 1323255147351
requests : 100
errors : 3
timeouts : 0
totalTime : 885438
avgTimePerRequest : 8854.38
avgRequestsPerSecond : 0.008789442

the stats (copying fieldValueCache as well here, to show term statistics):

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10
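Justin's description of the Munin plugin protocol (print graph configuration when polled with "config", otherwise print current values) can be sketched as a plugin reporting one Solr cache stat. The stat name and the stubbed fetch step are made up for illustration; a real plugin would scrape Solr's stats handler, as in the linked examples:

```python
import sys

def fetch_filtercache_hitratio():
    # Placeholder: a real plugin would fetch and parse Solr's stats page
    # (e.g. the old stats.jsp) here; we return a fixed value instead.
    return 0.86

def munin_plugin(mode=None):
    """Return the text a Munin plugin would print for the given mode."""
    if mode == "config":
        # graph configuration, emitted when Munin polls with "config"
        return ("graph_title Solr filterCache hit ratio\n"
                "graph_vlabel ratio\n"
                "hitratio.label hitratio\n")
    # normal poll: emit current values in "field.value N" form
    return "hitratio.value %s\n" % fetch_filtercache_hitratio()

if __name__ == "__main__":
    arg = sys.argv[1] if len(sys.argv) > 1 else None
    sys.stdout.write(munin_plugin(arg))
```

Because the protocol is plain text over stdout, the same structure works in shell, Perl, or any other language, which is exactly why Munin plugins are so easy to write.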
Re: cache monitoring tools?
Dmitry, That's beyond the scope of this thread, but Munin essentially runs plugins, which are essentially scripts that output graph configuration and values when polled by the Munin server. So it uses a plain text protocol, so that the scripts can be written in any language. Munin then feeds this info into RRDtool, which displays the graph. There are some examples[1] of solr plugins that people have used to scrape the stats.jsp page. Justin 1. http://exchange.munin-monitoring.org/plugins/search?keyword=solr Dmitry Kan dmitry@gmail.com writes: Thanks, Justin. With zabbix I can gather jmx-exposed stats from SOLR; how about munin, what protocol / way does it use to accumulate stats? It wasn't obvious from their online documentation... On Mon, Dec 12, 2011 at 4:56 PM, Justin Caratzas justin.carat...@gmail.com wrote: Dmitry, The only added stress that munin puts on each box is the 1 request per stat per 5 minutes to our admin stats handler. Given that we get 25 requests per second, this doesn't make much of a difference. We don't have a sharded index (yet) as our index is only 2-3 GB, but we do have slave servers with replicated indexes that handle the queries, while our master handles updates/commits. Justin Dmitry Kan dmitry@gmail.com writes: Justin, in terms of the overhead, have you noticed if Munin adds much of it when used in production? In terms of the solr farm: how big is a shard's index (given you have a sharded architecture)? Dmitry On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas justin.carat...@gmail.com wrote: At my work, we use Munin and Nagios for monitoring and alerts. Munin is great because writing a plugin for it is so simple, and with Solr's statistics handler, we can track almost any solr stat we want. It also comes with included plugins for load, file system stats, processes, etc.
http://munin-monitoring.org/ Justin Paul Libbrecht p...@hoplahup.net writes: Allow me to chime in and ask a generic question about monitoring tools for people close to developers: are any of the tools mentioned in this thread actually able to show graphs of loads, e.g. cache counts or CPU load, in parallel to a console log or to an http request log? I am working on such a tool currently, but I have a bad feeling of reinventing the wheel. thanks in advance Paul Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit : Otis, Tomás: thanks for the great links! 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any tool that visualizes JMX stuff, like Zabbix. See http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/ On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote: The culprit seems to be the merger (frontend) SOLR. Talking to one shard directly takes substantially less time (1-2 sec). On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote: Tomás: thanks. The page you gave didn't mention cache specifically; is there more documentation on this specifically? I have used the solrmeter tool, it draws the cache diagrams; is there a similar tool, but which would use jmx directly and present the cache usage at runtime?
pravesh: I have increased the size of filterCache, but the search hasn't become any faster, taking almost 9 sec on avg :(

name: search
class: org.apache.solr.handler.component.SearchHandler
version: $Revision: 1052938 $
description: Search using components: org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
stats:
handlerStart : 1323255147351
requests : 100
errors : 3
timeouts : 0
totalTime : 885438
avgTimePerRequest : 8854.38
avgRequestsPerSecond : 0.008789442

the stats (copying fieldValueCache as well here, to show term statistics):

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 79
hits : 77
hitratio : 0.97
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 79
cumulative_hits : 77
cumulative_hitratio : 0.97
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram : {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}

name: filterCache class
Re: cache monitoring tools?
Thanks, Justin. With zabbix I can gather jmx-exposed stats from SOLR; how about munin, what protocol / way does it use to accumulate stats? It wasn't obvious from their online documentation... On Mon, Dec 12, 2011 at 4:56 PM, Justin Caratzas justin.carat...@gmail.com wrote: Dmitry, The only added stress that munin puts on each box is the 1 request per stat per 5 minutes to our admin stats handler. Given that we get 25 requests per second, this doesn't make much of a difference. We don't have a sharded index (yet) as our index is only 2-3 GB, but we do have slave servers with replicated indexes that handle the queries, while our master handles updates/commits. Justin Dmitry Kan dmitry@gmail.com writes: Justin, in terms of the overhead, have you noticed if Munin adds much of it when used in production? In terms of the solr farm: how big is a shard's index (given you have a sharded architecture)? Dmitry On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas justin.carat...@gmail.com wrote: At my work, we use Munin and Nagios for monitoring and alerts. Munin is great because writing a plugin for it is so simple, and with Solr's statistics handler, we can track almost any solr stat we want. It also comes with included plugins for load, file system stats, processes, etc. http://munin-monitoring.org/ Justin Paul Libbrecht p...@hoplahup.net writes: Allow me to chime in and ask a generic question about monitoring tools for people close to developers: are any of the tools mentioned in this thread actually able to show graphs of loads, e.g. cache counts or CPU load, in parallel to a console log or to an http request log? I am working on such a tool currently, but I have a bad feeling of reinventing the wheel. thanks in advance Paul Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit : Otis, Tomás: thanks for the great links! 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any tool that visualizes JMX stuff, like Zabbix.
See http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/ On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote: The culprit seems to be the merger (frontend) SOLR. Talking to one shard directly takes substantially less time (1-2 sec). On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote: Tomás: thanks. The page you gave didn't mention cache specifically; is there more documentation on this specifically? I have used the solrmeter tool, it draws the cache diagrams; is there a similar tool, but which would use jmx directly and present the cache usage at runtime? pravesh: I have increased the size of filterCache, but the search hasn't become any faster, taking almost 9 sec on avg :(

name: search
class: org.apache.solr.handler.component.SearchHandler
version: $Revision: 1052938 $
description: Search using components: org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
stats:
handlerStart : 1323255147351
requests : 100
errors : 3
timeouts : 0
totalTime : 885438
avgTimePerRequest : 8854.38
avgRequestsPerSecond : 0.008789442

the stats (copying fieldValueCache as well here, to show term statistics):

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 79
hits : 77
hitratio : 0.97
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 79
cumulative_hits : 77
cumulative_hitratio : 0.97
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram : {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}

name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=153600, initialSize=4096, minSize=138240, acceptableSize=145920, cleanupThread=false)
stats:
lookups : 1082854
hits : 940370
hitratio : 0.86
inserts : 142486
evictions : 0
size : 142486
warmupTime : 0
cumulative_lookups : 1082854
cumulative_hits : 940370
cumulative_hitratio : 0.86
cumulative_inserts : 142486
cumulative_evictions : 0

index size: 3.25 GB

Does anyone have some pointers to where to look at and optimize for query time
Re: cache monitoring tools?
Hoss, I can't see why network IO is the issue, as the shards and the front-end SOLR resided on the same server. I say resided, because I got rid of the front end (which, according to my measurements, was taking at least as much time for merging as it took to find the actual data in the shards) and the shards. Now I have only one shard holding all the data. Filter cache tuning also helped to reduce the number of evictions to a minimum. Dmitry On Fri, Dec 9, 2011 at 10:42 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : The culprit seems to be the merger (frontend) SOLR. Talking to one shard : directly takes substantially less time (1-2 sec). ... :facet.limit=50 Your problem most likely has very little to do with your caches at all -- a facet.limit that high requires sending a very large amount of data over the wire, multiplied by the number of shards, multiplied by some constant (i think it's 2 but it might be higher) in order to over-request facet constraint counts from each shard to aggregate them. The dominant factor in the slow speed you are seeing is most likely network IO between the shards. -Hoss -- Regards, Dmitry Kan
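Hoss's back-of-the-envelope cost can be made concrete with a small sketch. The over-request factor and the example numbers below are assumptions for illustration only; the actual formula inside Solr's distributed faceting may differ:

```python
def shard_facet_entries(facet_limit, num_shards, overrequest_factor=2):
    """Rough count of facet constraint entries shipped to the merger node.

    Each shard returns roughly facet_limit * overrequest_factor entries
    (the over-request guards against constraints that rank differently
    per shard), and the merger receives that from every shard.
    """
    return facet_limit * overrequest_factor * num_shards

# Hypothetical: a very high facet.limit of 500000 over 4 shards means
# millions of constraint counts crossing the wire per request.
print(shard_facet_entries(500_000, 4))  # 4000000
```

This is why a high facet.limit dominates response time regardless of cache hit ratios: the cost is network transfer and merging, not lookup.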
Re: cache monitoring tools?
Paul, have you checked solrmeter and zabbix? Dmitry On Fri, Dec 9, 2011 at 11:16 PM, Paul Libbrecht p...@hoplahup.net wrote: Allow me to chime in and ask a generic question about monitoring tools for people close to developers: are any of the tools mentioned in this thread actually able to show graphs of loads, e.g. cache counts or CPU load, in parallel to a console log or to an http request log? I am working on such a tool currently, but I have a bad feeling of reinventing the wheel. thanks in advance Paul Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit : Otis, Tomás: thanks for the great links! 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any tool that visualizes JMX stuff, like Zabbix. See http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/ On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote: The culprit seems to be the merger (frontend) SOLR. Talking to one shard directly takes substantially less time (1-2 sec). On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote: Tomás: thanks. The page you gave didn't mention cache specifically; is there more documentation on this specifically? I have used the solrmeter tool, it draws the cache diagrams; is there a similar tool, but which would use jmx directly and present the cache usage at runtime?
pravesh: I have increased the size of filterCache, but the search hasn't become any faster, taking almost 9 sec on avg :(

name: search
class: org.apache.solr.handler.component.SearchHandler
version: $Revision: 1052938 $
description: Search using components: org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
stats:
handlerStart : 1323255147351
requests : 100
errors : 3
timeouts : 0
totalTime : 885438
avgTimePerRequest : 8854.38
avgRequestsPerSecond : 0.008789442

the stats (copying fieldValueCache as well here, to show term statistics):

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 79
hits : 77
hitratio : 0.97
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 79
cumulative_hits : 77
cumulative_hitratio : 0.97
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram : {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}

name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=153600, initialSize=4096, minSize=138240, acceptableSize=145920, cleanupThread=false)
stats:
lookups : 1082854
hits : 940370
hitratio : 0.86
inserts : 142486
evictions : 0
size : 142486
warmupTime : 0
cumulative_lookups : 1082854
cumulative_hits : 940370
cumulative_hitratio : 0.86
cumulative_inserts : 142486
cumulative_evictions : 0

index size: 3.25 GB

Does anyone have some pointers to where to look at and optimize for query time?
2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Dimitry, cache information is exposed via JMX, so you should be able to monitor that information with any JMX tool. See http://wiki.apache.org/solr/SolrJmx On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com wrote: Yes, we do require that much. Ok, thanks, I will try increasing the maxsize. On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com wrote: facet.limit=50 your facet.limit seems too high. Do you actually require this much? Since there a lot of evictions from filtercache, so, increase the maxsize value to your acceptable limit. Regards Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Dmitry Kan
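The filterCache numbers quoted in this thread can be sanity-checked with a few lines of arithmetic. The sketch below (a Python illustration, not Solr code) recomputes the hit ratio from the reported lookups/hits and shows why zero evictions with size below maxSize means the cache has not yet filled up, so a poor ratio would point at query patterns rather than sizing:

```python
# Numbers copied from the filterCache stats quoted in this thread.
lookups, hits = 1082854, 940370
inserts, evictions, max_size = 142486, 0, 153600

hit_ratio = hits / lookups                          # ~0.8684
print(f"hitratio = {int(hit_ratio * 100) / 100}")   # Solr truncates: 0.86

# No evictions and inserts below maxSize: the cache never filled,
# so growing maxSize further cannot raise the hit ratio.
assert evictions == 0 and inserts <= max_size
```

This matches Dmitry's later observation that increasing filterCache did not speed up the query: the cache was not the bottleneck.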
Re: cache monitoring tools?
Justin, in terms of the overhead, have you noticed if Munin puts much of it when used in production? In terms of the solr farm: how big is a shard's index (given you have sharded architecture). Dmitry On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas justin.carat...@gmail.com wrote: At my work, we use Munin and Nagios for monitoring and alerts. Munin is great because writing a plugin for it is so simple, and with Solr's statistics handler, we can track almost any solr stat we want. It also comes with included plugins for load, file system stats, processes, etc. http://munin-monitoring.org/ Justin
Re: cache monitoring tools?
Dmitry, The only added stress that munin puts on each box is the 1 request per stat per 5 minutes to our admin stats handler. Given that we get 25 requests per second, this doesn't make much of a difference. We don't have a sharded index (yet) as our index is only 2-3 GB, but we do have slave servers with replicated indexes that handle the queries, while our master handles updates/commits. Justin Dmitry Kan dmitry@gmail.com writes: Justin, in terms of the overhead, have you noticed if Munin puts much of it when used in production? In terms of the solr farm: how big is a shard's index (given you have sharded architecture). Dmitry
Re: cache monitoring tools?
At my work, we use Munin and Nagios for monitoring and alerts. Munin is great because writing a plugin for it is so simple, and with Solr's statistics handler we can track almost any Solr stat we want. It also comes with included plugins for load, file system stats, processes, etc. http://munin-monitoring.org/ Justin Paul Libbrecht p...@hoplahup.net writes: Allow me to chime in and ask a generic question about monitoring tools for people close to developers: are any of the tools mentioned in this thread actually able to show graphs of loads, e.g. cache counts or CPU load, in parallel to a console log or to an http request log? I am working on such a tool currently but I have a bad feeling of reinventing the wheel. Thanks in advance, Paul
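Justin's point that a Munin plugin is simple to write can be illustrated with a short sketch. The stats URL and entry names below are assumptions (Solr 3.x exposed XML at admin/stats.jsp, but verify against your deployment); the "config"/fetch output follows Munin's plugin protocol in simplified form:

```python
#!/usr/bin/env python
"""Sketch of a Munin plugin that graphs Solr cache hit ratios.

Assumptions: the stats URL below is a hypothetical Solr 3.x default,
and the XML layout (entry/name, stat name="hitratio") is illustrative.
"""
import sys
import urllib.request
import xml.etree.ElementTree as ET

STATS_URL = "http://localhost:8983/solr/admin/stats.jsp"  # hypothetical

def cache_hitratios(stats_xml: str) -> dict:
    """Extract entry name -> hitratio from a stats XML dump."""
    root = ET.fromstring(stats_xml)
    out = {}
    for entry in root.iter("entry"):
        name = entry.findtext("name", "").strip()
        for stat in entry.iter("stat"):
            if stat.get("name") == "hitratio":
                out[name] = float(stat.text.strip())
    return out

def main() -> None:
    if len(sys.argv) > 1 and sys.argv[1] == "config":  # Munin's config run
        print("graph_title Solr cache hit ratios")
        print("filterCache.label filterCache")
        return
    with urllib.request.urlopen(STATS_URL) as resp:
        ratios = cache_hitratios(resp.read().decode("utf-8"))
    for name, ratio in ratios.items():                 # Munin's fetch format
        print(f"{name}.value {ratio}")

# Guarded so the parsing helper can be imported/tested without a live Solr.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

Munin then polls the plugin on its usual 5-minute cycle, which matches the low overhead Justin describes below.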
Re: cache monitoring tools?
Justin, I am not sure this answers the question: is there a graph view (of some measurements) which can be synched to one or several logs? I'd like to click on a spike of CPU to see the log around the time of that spike. Does munin or any other do that? paul On 11 Dec 2011 at 17:39, Justin Caratzas wrote: At my work, we use Munin and Nagios for monitoring and alerts. Munin is great because writing a plugin for it is so simple, and with Solr's statistics handler, we can track almost any solr stat we want. It also comes with included plugins for load, file system stats, processes, etc. http://munin-monitoring.org/ Justin
Re: cache monitoring tools?
: The culprit seems to be the merger (frontend) SOLR. Talking to one shard : directly takes substantially less time (1-2 sec). ... : facet.limit=50 Your problem most likely has very little to do with your caches at all -- a facet.limit that high requires sending a very large amount of data over the wire, multiplied by the number of shards, multiplied by some constant (i think it's 2 but it might be higher) in order to over-request facet constraint counts from each shard to aggregate them. The dominant factor in the slow speed you are seeing is most likely network IO between the shards. -Hoss
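Hoss's wire-cost argument can be made concrete with rough arithmetic. The over-request factor below uses his guess of 2, and the facet.limit and shard count are hypothetical (the limit quoted in this thread appears truncated), so treat this as an order-of-magnitude sketch rather than Solr's exact heuristic:

```python
def overrequest_terms(facet_limit: int, shards: int, factor: float = 2.0) -> int:
    """Rough count of term/count pairs shipped to the merger per distributed
    facet request: each shard returns facet_limit * factor candidates so the
    frontend can aggregate accurate top-N counts. Illustrative only."""
    return int(facet_limit * factor) * shards

# Hypothetical numbers: a large facet.limit fanned out over 10 shards.
total = overrequest_terms(facet_limit=50_000, shards=10)
print(total)  # 1000000 pairs crossing the network for a single request
```

At those volumes, network IO between the merger and the shards dominates, which is consistent with queries against a single shard finishing in 1-2 seconds while the merged query takes ~9.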
Re: cache monitoring tools?
Allow me to chime in and ask a generic question about monitoring tools for people close to developers: are any of the tools mentioned in this thread actually able to show graphs of loads, e.g. cache counts or CPU load, in parallel to a console log or to an http request log? I am working on such a tool currently but I have a bad feeling of reinventing the wheel. Thanks in advance, Paul On 8 Dec 2011 at 08:53, Dmitry Kan wrote: Otis, Tomás: thanks for the great links!
Re: cache monitoring tools?
Hi Otis, I can't find the download for the free SPM. What hardware and OS do I need for installing SPM to monitor my servers? Regards Bernd On 07.12.2011 18:47, Otis Gospodnetic wrote: Hi Dmitry, You should use SPM for Solr - it exposes all Solr metrics and more (JVM, system info, etc.) PLUS it's currently 100% free. http://sematext.com/spm/solr-performance-monitoring/index.html We use it with our clients on a regular basis and it helps us a TON - we just helped a very popular mobile app company improve Solr performance by a few orders of magnitude (including filter tuning) with the help of SPM. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Dmitry Kan dmitry@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, December 7, 2011 2:13 AM Subject: cache monitoring tools? Hello list, We've noticed quite a huge strain on the filterCache in facet queries against trigram fields (see schema at the end of this e-mail). The typical query contains some keywords in the q parameter and a boolean filter query on other Solr fields. It is also a facet query; the facet field is of type shingle_text_trigram (see schema) and facet.limit=50. Questions: are there some tools (except for solrmeter) and/or approaches to monitor / profile the load on caches, which would help to derive better tuning parameters? Can you recommend checking config parameters of other components besides caches? BTW, this has become much faster compared to Solr 1.4, where we had to do a lot of optimizations on the schema level (e.g. by making a number of stored fields non-stored).

Here are the relevant stats from admin (SOLR 3.4):

description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
  lookups : 93
  hits : 90
  hitratio : 0.96
  inserts : 1
  evictions : 0
  size : 1
  warmupTime : 0
  cumulative_lookups : 93
  cumulative_hits : 90
  cumulative_hitratio : 0.96
  cumulative_inserts : 1
  cumulative_evictions : 0
  item_shingleContent_trigram : {field=shingleContent_trigram, memSize=326924381, tindexSize=4765394, time=222924, phase1=221106, nTerms=14827061, bigTerms=35, termInstances=114359167, uses=91}

name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, acceptableSize=486, cleanupThread=false)
stats:
  lookups : 1003486
  hits : 2809
  hitratio : 0.00
  inserts : 1000694
  evictions : 1000221
  size : 473
  warmupTime : 0
  cumulative_lookups : 1003486
  cumulative_hits : 2809
  cumulative_hitratio : 0.00
  cumulative_inserts : 1000694
  cumulative_evictions : 1000221

schema excerpt:

<fieldType name="shingle_text_trigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>

-- Regards, Dmitry Kan
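The shingle_text_trigram field type above turns each token stream into word n-grams, which is why the term counts in these stats run to millions. A small Python sketch (approximating, not reproducing, Lucene's ShingleFilter with maxShingleSize=3 and outputUnigrams=true) shows the expansion:

```python
def shingles(tokens, max_size=3, output_unigrams=True):
    """Approximate Lucene's ShingleFilter: emit word n-grams up to max_size
    starting at each position, optionally including the single tokens."""
    min_size = 1 if output_unigrams else 2
    out = []
    for i in range(len(tokens)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

print(shingles(["quick", "brown", "fox"]))
# ['quick', 'quick brown', 'quick brown fox', 'brown', 'brown fox', 'fox']
```

Each token position yields up to three indexed terms, so faceting on such a field enumerates a vastly larger term dictionary than on the raw tokens.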
Re: cache monitoring tools?
Hi Bernd, Check this: SPM for Solr is the enterprise-class, cloud-based, System/OS and Solr Performance Monitoring SaaS. So it's a SaaS - you simply sign up for it. During the signup you'll get to download a small agent that works on RedHat, CentOS, Debian, Ubuntu, and maybe other OSes. If you have any more SPM questions, it may be best to email me directly. For example, if you are only interested in SPM if it runs in your datacenter, please let me know. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Bernd Fehling bernd.fehl...@uni-bielefeld.de To: solr-user@lucene.apache.org Sent: Thursday, December 8, 2011 4:04 AM Subject: Re: cache monitoring tools? Hi Otis, I can't find the download for the free SPM. What Hardware and OS do I need for installing SPM to monitor my servers? Regards Bernd
Re: cache monitoring tools?
facet.limit=50 your facet.limit seems too high. Do you actually require this much? Since there are a lot of evictions from the filterCache, increase the maxsize value to your acceptable limit. Regards Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html Sent from the Solr - User mailing list archive at Nabble.com.
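Pravesh's advice (lots of evictions means the cache is undersized) can be expressed as a crude heuristic. This is a sketch of the reasoning, not official Solr tuning guidance, and cumulative_inserts over-counts re-inserted entries, so the suggestion is an upper bound:

```python
def suggest_filter_cache_size(max_size, cumulative_inserts, cumulative_evictions):
    """Crude filterCache sizing heuristic (a sketch, not Solr guidance):
    if the cache is evicting, grow maxSize toward the observed insert
    volume, with ~10% headroom; otherwise the current size is adequate."""
    if cumulative_evictions == 0:
        return max_size  # never filled: sizing is not the problem
    return max(max_size, int(cumulative_inserts * 1.1))

# Numbers from the maxSize=512 filterCache stats quoted in this thread:
# ~1M inserts with ~1M evictions means nearly every entry was churned out.
print(suggest_filter_cache_size(512, 1000694, 1000221))
```

In practice one would also ask why so many distinct filters are generated at all: a near-zero hit ratio with heavy churn often means the filters are effectively unique per query and caching them cannot help.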
Re: cache monitoring tools?
Yes, we do require that much. Ok, thanks, I will try increasing the maxsize. On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com wrote: facet.limit=50 your facet.limit seems too high. Do you actually require this much? Since there a lot of evictions from filtercache, so, increase the maxsize value to your acceptable limit. Regards Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Dmitry Kan
Re: cache monitoring tools?
Hi Dimitry, cache information is exposed via JMX, so you should be able to monitor that information with any JMX tool. See http://wiki.apache.org/solr/SolrJmx On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com wrote: Yes, we do require that much. Ok, thanks, I will try increasing the maxsize. On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com wrote: facet.limit=50 your facet.limit seems too high. Do you actually require this much? Since there a lot of evictions from filtercache, so, increase the maxsize value to your acceptable limit. Regards Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Dmitry Kan
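For reference, once JMX is enabled (per the SolrJmx wiki page above), the cache beans can be read with the plain javax.management API. A minimal sketch; the "solr:*" ObjectName pattern and the "hitratio" attribute name are assumptions to adapt, since the exact bean layout varies by Solr version:

```java
// Sketch: querying MBeans by ObjectName pattern. Run in-process this finds
// no Solr beans (none are registered in a plain JVM); for a remote Solr you
// would connect via JMXConnectorFactory and a JMX service URL instead.
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxCachePeek {
    // Return all registered MBeans whose name matches the given pattern.
    public static Set<ObjectName> find(MBeanServer server, String pattern) {
        try {
            return server.queryNames(new ObjectName(pattern), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad pattern: " + pattern, e);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // "solr:*" is the assumed domain for Solr's cache beans.
        for (ObjectName name : find(server, "solr:*")) {
            System.out.println(name + " hitratio=" + server.getAttribute(name, "hitratio"));
        }
    }
}
```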
Re: cache monitoring tools?
Tomás: thanks. The page you gave didn't mention caches specifically; is there more documentation on this? I have used the solrmeter tool, which draws the cache diagrams. Is there a similar tool that would use JMX directly and present the cache usage at runtime?

pravesh: I have increased the size of the filterCache, but the search hasn't become any faster, taking almost 9 sec on avg :(

name: search
class: org.apache.solr.handler.component.SearchHandler
version: $Revision: 1052938 $
description: Search using components: org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
stats: handlerStart : 1323255147351 requests : 100 errors : 3 timeouts : 0 totalTime : 885438 avgTimePerRequest : 8854.38 avgRequestsPerSecond : 0.008789442

the stats (copying fieldValueCache as well here, to show term statistics):

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats: lookups : 79 hits : 77 hitratio : 0.97 inserts : 1 evictions : 0 size : 1 warmupTime : 0 cumulative_lookups : 79 cumulative_hits : 77 cumulative_hitratio : 0.97 cumulative_inserts : 1 cumulative_evictions : 0
item_shingleContent_trigram : {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}

name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=153600, initialSize=4096, minSize=138240, acceptableSize=145920, cleanupThread=false)
stats: lookups : 1082854 hits : 940370 hitratio : 0.86 inserts : 142486 evictions : 0 size : 142486 warmupTime : 0 cumulative_lookups : 1082854 cumulative_hits : 940370 cumulative_hitratio : 0.86 cumulative_inserts : 142486 cumulative_evictions : 0

index size: 3,25 GB

Does anyone have some pointers to where to look and what to optimize for query time?

2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
> Hi Dimitry, cache information is exposed via JMX, so you should be able to monitor that information with any JMX tool. See http://wiki.apache.org/solr/SolrJmx [...]

--
Regards,
Dmitry Kan
Re: cache monitoring tools?
The culprit seems to be the merger (frontend) SOLR. Talking to one shard directly takes substantially less time (1-2 sec).

On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote:
> Tomás: thanks. The page you gave didn't mention cache specifically, is there more documentation on this specifically? I have used solrmeter tool, it draws the cache diagrams, is there a similar tool, but which would use jmx directly and present the cache usage in runtime? [...]

--
Regards,
Dmitry Kan
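One way to confirm where the time goes is to time the identical request against the merger front end and against a single shard and compare the two numbers. A minimal, self-contained timing helper; the URLs mentioned in the comments are placeholders, not real hosts:

```java
// Sketch: measure wall-clock time of any action in milliseconds. In real
// use, wrap an HTTP GET of http://merger:8983/solr/select?... in one call
// and http://shard1:8983/solr/select?... in another, then compare.
public class QueryTimer {
    public static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in for a real request: sleep ~100 ms.
        long ms = timeMillis(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
        });
        System.out.println("took ~" + ms + " ms");
    }
}
```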
Re: cache monitoring tools?
Hi Dmitry, You should use SPM for Solr - it exposes all Solr metrics and more (JVM, system info, etc.) PLUS it's currently 100% free. http://sematext.com/spm/solr-performance-monitoring/index.html We use it with our clients on a regular basis and it helps us a TON - we just helped a very popular mobile app company improve Solr performance by a few orders of magnitude (including filter tuning) with the help of SPM. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

From: Dmitry Kan dmitry@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, December 7, 2011 2:13 AM Subject: cache monitoring tools?
> Hello list, We've noticed quite huge strain on the filterCache in facet queries against trigram fields (see schema in the end of this e-mail). The typical query contains some keywords in the q parameter and boolean filter query on other solr fields. It is also facet query, the facet field is of type shingle_text_trigram (see schema) and facet.limit=50. [...]
Re: cache monitoring tools?
Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any tool that visualizes JMX stuff like Zabbix. See http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/

On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote:
> The culprit seems to be the merger (frontend) SOLR. Talking to one shard directly takes substantially less time (1-2 sec). [...]
Re: cache monitoring tools?
Otis, Tomás: thanks for the great links!

2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
> Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any tool that visualizes JMX stuff like Zabbix. See http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/ [...]

--
Regards,
Dmitry Kan
cache monitoring tools?
Hello list,

We've noticed quite a huge strain on the filterCache in facet queries against trigram fields (see the schema at the end of this e-mail). The typical query contains some keywords in the q parameter and a boolean filter query on other Solr fields. It is also a facet query; the facet field is of type shingle_text_trigram (see schema) and facet.limit=50.

Questions: are there some tools (except for solrmeter) and/or approaches to monitor/profile the load on caches, which would help to derive better tuning parameters? Can you recommend checking config parameters of other components besides caches?

BTW, this has become much faster compared to Solr 1.4, where we had to do a lot of optimization at the schema level (e.g. by making a number of stored fields non-stored).

Here are the relevant stats from admin (Solr 3.4):

description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats: lookups : 93 hits : 90 hitratio : 0.96 inserts : 1 evictions : 0 size : 1 warmupTime : 0 cumulative_lookups : 93 cumulative_hits : 90 cumulative_hitratio : 0.96 cumulative_inserts : 1 cumulative_evictions : 0
item_shingleContent_trigram : {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=222924,phase1=221106,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=91}

name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, acceptableSize=486, cleanupThread=false)
stats: lookups : 1003486 hits : 2809 hitratio : 0.00 inserts : 1000694 evictions : 1000221 size : 473 warmupTime : 0 cumulative_lookups : 1003486 cumulative_hits : 2809 cumulative_hitratio : 0.00 cumulative_inserts : 1000694 cumulative_evictions : 1000221

schema excerpt:

<fieldType name="shingle_text_trigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>

--
Regards,
Dmitry Kan
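For reference, the query shape described above (keywords in q, a boolean fq, a facet on the trigram field with facet.limit=50) can be sketched as a plain request URL. Host, filter values, and the `build` helper are illustrative, not from the thread:

```java
// Sketch: assemble a Solr select URL of the shape described in the post.
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetQuery {
    public static String build(String base, String q, String fq, String facetField, int limit) {
        try {
            return base + "/select?q=" + URLEncoder.encode(q, "UTF-8")
                    + "&fq=" + URLEncoder.encode(fq, "UTF-8")
                    + "&facet=true&facet.field=" + facetField
                    + "&facet.limit=" + limit;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "some keywords",
                "type:report AND year:2011", "shingleContent_trigram", 50));
    }
}
```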
Tools?
Hello, Are there any tools that can be used for analyzing the solr logs? Regards Sujatha
Re: velocity tools in solr-contrib-velocity?
On 1/29/11, Paul Libbrecht p...@hoplahup.net wrote: Hello list, can anyone tell me how I can plug the velocity tools into my solr? [...] Not sure what you mean by plugging in the tools. There is http://wiki.apache.org/solr/VelocityResponseWriter , but I imagine that you have already seen that. Regards, Gora
velocity tools in solr-contrib-velocity?
Hello list, can anyone tell me how I can plug the velocity tools into my solr? Do I understand correctly the following comment in the source: // Velocity context tools - TODO: make these pluggable that it's only hard-coded thus far? thanks in advance paul
Re: benchmarking tools
Great suggestion, I took a look and it seems pretty useful. As a follow-up question, did you do anything to disable Solr caching for certain tests? -mike On Tue, Oct 27, 2009 at 8:14 PM, Joshua Tuberville joshuatubervi...@eharmony.com wrote: Mike, For response times I would also look at java.net's Faban benchmarking framework. We use it extensively for our acceptance tests and tuning exercises. Joshua On Oct 27, 2009, at 1:59 PM, Mike Anderson wrote: I've been making modifications here and there to the Solr source code in hopes to optimize for my particular setup. My goal now is to establish a decent benchmark toolset so that I can evaluate the observed performance increase before deciding to roll out. So far I've investigated Jmeter and Lucid Gaze, but each seem to have pretty steep learning curves, so I thought I'd ping the group before I sink a good chunk of time into either. My ideal performance metrics aren't so much load testing, but rather response time testing for different query types across different Solr configurations. If anybody has some insight into this kind of project I'd love to get some feedback. Thanks in advance, Mike Anderson
benchmarking tools
I've been making modifications here and there to the Solr source code in hopes of optimizing for my particular setup. My goal now is to establish a decent benchmark toolset so that I can evaluate the observed performance increase before deciding to roll out. So far I've investigated JMeter and Lucid Gaze, but each seems to have a pretty steep learning curve, so I thought I'd ping the group before I sink a good chunk of time into either. My ideal performance metrics aren't so much load testing, but rather response time testing for different query types across different Solr configurations. If anybody has some insight into this kind of project I'd love to get some feedback. Thanks in advance, Mike Anderson
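Since the goal stated above is response-time testing rather than load testing, the bookkeeping side can stay simple regardless of which harness drives the requests: record per-request latencies for each query type and configuration, then report the average and a high percentile. A self-contained sketch; the sample latencies are synthetic:

```java
// Sketch: summarize per-request latencies with average and a percentile,
// which is the comparison that matters when contrasting configurations.
import java.util.Arrays;

public class LatencyReport {
    public static double average(long[] millis) {
        return Arrays.stream(millis).average().orElse(0.0);
    }

    // Nearest-rank percentile: p in (0, 100].
    public static long percentile(long[] millis, double p) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    public static void main(String[] args) {
        long[] samples = {120, 95, 110, 480, 105, 130, 98, 102, 115, 101};
        // One slow outlier (480 ms) drags the tail far above the median.
        System.out.printf("avg=%.1f ms p50=%d ms p95=%d ms%n",
                average(samples), percentile(samples, 50), percentile(samples, 95));
    }
}
```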
Re: benchmarking tools
Mike, For response times I would also look at java.net's Faban benchmarking framework. We use it extensively for our acceptance tests and tuning exercises. Joshua On Oct 27, 2009, at 1:59 PM, Mike Anderson wrote: I've been making modifications here and there to the Solr source code in hopes to optimize for my particular setup. My goal now is to establish a decent benchmark toolset so that I can evaluate the observed performance increase before deciding to roll out. So far I've investigated Jmeter and Lucid Gaze, but each seem to have pretty steep learning curves, so I thought I'd ping the group before I sink a good chunk of time into either. My ideal performance metrics aren't so much load testing, but rather response time testing for different query types across different Solr configurations. If anybody has some insight into this kind of project I'd love to get some feedback. Thanks in advance, Mike Anderson
Re: Tools for Managing Synonyms, Elevate, etc.
Mark, Use a GUI (maybe a custom-built one) to read the files which are present on the Solr server. These files can be read using a webservice/RMI call. Do all manipulation on the synonyms.txt contents and then call a webservice/RMI call to save that information. After saving the information, just call RELOAD. Check: http://wiki.apache.org/solr/CoreAdmin#head-3f125034c6a64611779442539812067b8b430930 http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0 Hope this helps ~Vikrant Cohen, Mark - IS&T wrote: I'm considering building some tools for our internal non-technical staff to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt so software developers don't have to maintain them. Before my team starts building these tools, has anyone done this before? If so, are these tools available as open source? Thanks, Mark Cohen -- View this message in context: http://www.nabble.com/Tools-for-Managing-Synonyms%2C-Elevate%2C-etc.-tp21696372p21796832.html Sent from the Solr - User mailing list archive at Nabble.com.
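The save-then-RELOAD step described above is easy to script. A tiny sketch that builds the CoreAdmin URL quoted in the reply; host, port, and core name are the example defaults, and actually issuing the request is left to whatever HTTP client the tool already uses:

```java
// Sketch: build the CoreAdmin RELOAD URL to call after rewriting
// synonyms.txt (or elevate.xml, etc.) on the Solr server.
public class CoreReload {
    public static String reloadUrl(String host, int port, String core) {
        return "http://" + host + ":" + port
                + "/solr/admin/cores?action=RELOAD&core=" + core;
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
        System.out.println(reloadUrl("localhost", 8983, "core0"));
    }
}
```

Note that, as Otis points out in this thread, some changes (e.g. index-time analysis) still require a full reindex rather than just a reload.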
Re: Tools for Managing Synonyms, Elevate, etc.
Mark, I am not aware of anyone open-sourcing such tools. But note that changing the files with a GUI is easy (editor + scp?). What makes things more complicated is the need to make Solr reload those files and, in some cases, changes really require a full index rebuilding. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Cohen, Mark - IST mark.co...@mtvn.com To: solr-user@lucene.apache.org Sent: Tuesday, January 27, 2009 5:55:46 PM Subject: Tools for Managing Synonyms, Elevate, etc. I'm considering building some tools for our internal non-technical staff to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt so software developers don't have to maintain them. Before my team starts building these tools, has anyone done this before? If so, are these tools available as open source? Thanks, Mark Cohen
Tools for Managing Synonyms, Elevate, etc.
I'm considering building some tools for our internal non-technical staff to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt so software developers don't have to maintain them. Before my team starts building these tools, has anyone done this before? If so, are these tools available as open source? Thanks, Mark Cohen
Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main
Thanks Toby. Aliter: Under contrib/javascript/build.xml - dist target - I removed the dependency on 'docs' , to circumvent the problem. But may be - it would be great to get js.jar from the rhino library distributed ( if not for license contradictions) to circumvent this. Toby Cole wrote: I came across this too earlier, I just deleted the contrib/javascript directory. Of course, if you need javascript library then you'll have to get it building. Sorry, probably not that helpful. :) Toby. On 17 Dec 2008, at 17:03, Kay Kay wrote: I downloaded the latest .tgz and ran $ ant dist docs: [mkdir] Created dir: /opt/src/apache-solr-nightly/contrib/javascript/dist/doc [java] Exception in thread main java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main [java] at JsRun.main(Unknown Source) [java] Caused by: java.lang.ClassNotFoundException: org.mozilla.javascript.tools.shell.Main [java] at java.net.URLClassLoader$1.run(URLClassLoader.java:200) [java] at java.security.AccessController.doPrivileged(Native Method) [java] at java.net.URLClassLoader.findClass(URLClassLoader.java:188) [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307) [java] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252) [java] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) [java] ... 1 more BUILD FAILED /opt/src/apache-solr-nightly/common-build.xml:335: The following error occurred while executing this line: /opt/src/apache-solr-nightly/common-build.xml:212: The following error occurred while executing this line: /opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1 and came across the above mentioned error. The class seems to be from the rhino (mozilla js ) library. Is it supposed to be packaged by default / is there a license restriction that prevents from being so . 
Toby Cole Software Engineer Semantico Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE T: +44 (0)1273 358 238 F: +44 (0)1273 723 232 E: toby.c...@semantico.com W: www.semantico.com
Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main
I'm using Java 6 and it's compiling for me. I'm doing.. ant clean ant dist and it works just fine. Maybe try an 'ant clean'? Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Dec 17, 2008, at 9:17 AM, Toby Cole wrote: I came across this too earlier, I just deleted the contrib/ javascript directory. Of course, if you need javascript library then you'll have to get it building. Sorry, probably not that helpful. :) Toby. On 17 Dec 2008, at 17:03, Kay Kay wrote: I downloaded the latest .tgz and ran $ ant dist docs: [mkdir] Created dir: /opt/src/apache-solr-nightly/contrib/ javascript/dist/doc [java] Exception in thread main java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main [java] at JsRun.main(Unknown Source) [java] Caused by: java.lang.ClassNotFoundException: org.mozilla.javascript.tools.shell.Main [java] at java.net.URLClassLoader$1.run(URLClassLoader.java: 200) [java] at java.security.AccessController.doPrivileged(Native Method) [java] at java.net.URLClassLoader.findClass(URLClassLoader.java:188) [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307) [java] at sun.misc.Launcher $AppClassLoader.loadClass(Launcher.java:301) [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252) [java] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) [java] ... 1 more BUILD FAILED /opt/src/apache-solr-nightly/common-build.xml:335: The following error occurred while executing this line: /opt/src/apache-solr-nightly/common-build.xml:212: The following error occurred while executing this line: /opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1 and came across the above mentioned error. The class seems to be from the rhino (mozilla js ) library. Is it supposed to be packaged by default / is there a license restriction that prevents from being so . 
Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main
I downloaded the latest .tgz and ran

$ ant dist

docs:
    [mkdir] Created dir: /opt/src/apache-solr-nightly/contrib/javascript/dist/doc
     [java] Exception in thread main java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main
     [java] at JsRun.main(Unknown Source)
     [java] Caused by: java.lang.ClassNotFoundException: org.mozilla.javascript.tools.shell.Main
     [java] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
     [java] at java.security.AccessController.doPrivileged(Native Method)
     [java] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
     [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     [java] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
     [java] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
     [java] ... 1 more

BUILD FAILED
/opt/src/apache-solr-nightly/common-build.xml:335: The following error occurred while executing this line:
/opt/src/apache-solr-nightly/common-build.xml:212: The following error occurred while executing this line:
/opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1

and came across the above-mentioned error. The class seems to be from the rhino (Mozilla JS) library. Is it supposed to be packaged by default, or is there a license restriction that prevents it from being so?
Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main
I came across this too earlier, I just deleted the contrib/javascript directory. Of course, if you need javascript library then you'll have to get it building. Sorry, probably not that helpful. :) Toby. On 17 Dec 2008, at 17:03, Kay Kay wrote: I downloaded the latest .tgz and ran $ ant dist docs: [mkdir] Created dir: /opt/src/apache-solr-nightly/contrib/ javascript/dist/doc [java] Exception in thread main java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main [java] at JsRun.main(Unknown Source) [java] Caused by: java.lang.ClassNotFoundException: org.mozilla.javascript.tools.shell.Main [java] at java.net.URLClassLoader$1.run(URLClassLoader.java: 200) [java] at java.security.AccessController.doPrivileged(Native Method) [java] at java.net.URLClassLoader.findClass(URLClassLoader.java:188) [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307) [java] at sun.misc.Launcher $AppClassLoader.loadClass(Launcher.java:301) [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252) [java] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) [java] ... 1 more BUILD FAILED /opt/src/apache-solr-nightly/common-build.xml:335: The following error occurred while executing this line: /opt/src/apache-solr-nightly/common-build.xml:212: The following error occurred while executing this line: /opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1 and came across the above mentioned error. The class seems to be from the rhino (mozilla js ) library. Is it supposed to be packaged by default / is there a license restriction that prevents from being so . Toby Cole Software Engineer Semantico Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE T: +44 (0)1273 358 238 F: +44 (0)1273 723 232 E: toby.c...@semantico.com W: www.semantico.com
Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main
On Wed, Dec 17, 2008 at 10:53 PM, Matthew Runo mr...@zappos.com wrote: I'm using Java 6 and it's compiling for me. I believe rhino is included by default in Java 6 -- Regards, Shalin Shekhar Mangar.
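If you want to verify whether the Rhino shell class is visible to your JVM before running the build, a quick probe like the following can help. This is just a diagnostic sketch; the class name is the one from the stack trace above, and whether it resolves depends entirely on your JDK and classpath.

```java
// Quick classpath probe: reports whether a class (e.g. Rhino's shell
// entry point from the stack trace) is visible to the current JVM.
public class ClasspathProbe {
    static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String rhinoShell = "org.mozilla.javascript.tools.shell.Main";
        System.out.println(rhinoShell + " present: " + isPresent(rhinoShell));
    }
}
```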
Re: Diagnostic tools
On Tue, Aug 5, 2008 at 12:43 PM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Are there any tools available to view the indexing process? We have a cron process which posts XML files to the solr index server. However, we are NOT seeing the documents posted correctly and we are also NOT getting any errors from the client. You need to send a commit before index changes become visible. -Yonik
RE: Diagnostic tools
Yes we are sending the commits. -Raghu
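For reference, an explicit commit is just a small XML message posted to Solr's update handler. The sketch below builds that request; the host, port, and `/solr/update` path are assumptions matching a default single-core install, so adjust them to your setup. Newly posted documents stay invisible to searches until the commit is received.

```java
// Sketch of issuing an explicit <commit/> to Solr's XML update handler.
// Host, port and update path are hypothetical defaults.
public class SolrCommit {
    static String commitBody() {
        return "<commit/>";
    }

    static String updateUrl(String host, int port) {
        return "http://" + host + ":" + port + "/solr/update";
    }

    // Posts the body to the given URL and prints the HTTP status.
    static void post(String url, String body) throws Exception {
        java.net.HttpURLConnection con =
            (java.net.HttpURLConnection) new java.net.URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "text/xml");
        con.setDoOutput(true);
        con.getOutputStream().write(body.getBytes("UTF-8"));
        System.out.println("HTTP " + con.getResponseCode());
    }

    public static void main(String[] args) {
        // Print the request we would send; uncomment post(...) against a
        // live server to actually commit.
        System.out.println("POST " + updateUrl("localhost", 8983));
        System.out.println(commitBody());
        // post(updateUrl("localhost", 8983), commitBody());
    }
}
```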
Re: Diagnostic tools
On Tue, 5 Aug 2008 11:43:44 -0500 Kashyap, Raghu [EMAIL PROTECTED] wrote: Hi, Hi Kashyap, please don't hijack topic threads. http://en.wikipedia.org/wiki/Thread_hijacking thanks!! B _ {Beto|Norberto|Numard} Meijome Software QA is like cleaning my cat's litter box: Sift out the big chunks. Stir in the rest. Hope it doesn't stink. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
RE: Benchmarking tools?
Hi, I did some trivial tests with JMeter. I set up JMeter to increase the number of threads steadily. For requests I either use a random word or combination of words from a wordlist, or some sample data from the test system (this is described in the JMeter manual). In my case the system works fine as long as I don't exceed the max number of requests per second it can handle. But that's not a big surprise. More interesting is the fact that, to a certain degree, after exceeding the max number of requests, response time seems to rise linearly for a little while and then exponentially. But that might also be the result of my test scenario. Nico -Original Message- From: Jacob Singh [mailto:[EMAIL PROTECTED] Sent: Sunday, June 29, 2008 6:04 PM To: solr-user@lucene.apache.org Subject: Benchmarking tools? Hi folks, Does anyone have any bright ideas on how to benchmark Solr? Unless someone has something better, here is what I am thinking: 1. Have a config file where one can specify info like how many docs, how large, how many facets, and how many updates / searches per minute 2. Use one of the various client APIs to generate XML files for updates using some kind of lorem ipsum text as a base and store them in a dir. 3. Use siege to set the update run at whatever interval is specified in the config, sending an update every x seconds and removing it from the directory 4. Generate a list of search queries based upon the facets created, and build a urls.txt with all of these search urls 5. Run the searches through siege 6. Monitor the output using nagios to see where load kicks in. This is not that sophisticated, and feels like it won't really pinpoint bottlenecks, but would approximately tell us where a server will start to bail. Does anyone have any better ideas? Best, Jacob Singh
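Step 2 of the plan quoted above (generating update XML from lorem-ipsum text) could be sketched like this. The field names ("id", "title", "body") are hypothetical and would need to match your schema.xml; a real generator would also XML-escape field values.

```java
import java.util.Random;

// Sketch: generate Solr XML update documents from lorem-ipsum words.
// Field names are hypothetical; match them to your schema.xml.
public class UpdateDocGenerator {
    static final String[] WORDS = {
        "lorem", "ipsum", "dolor", "sit", "amet", "consectetur"
    };

    static String randomText(Random rnd, int nWords) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nWords; i++) {
            if (i > 0) sb.append(' ');
            sb.append(WORDS[rnd.nextInt(WORDS.length)]);
        }
        return sb.toString();
    }

    static String updateXml(String id, String title, String body) {
        return "<add><doc>"
             + "<field name=\"id\">" + id + "</field>"
             + "<field name=\"title\">" + title + "</field>"
             + "<field name=\"body\">" + body + "</field>"
             + "</doc></add>";
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 3; i++) {
            System.out.println(updateXml("doc" + i,
                randomText(rnd, 4), randomText(rnd, 20)));
        }
    }
}
```

Each generated `<add>` message could then be written to its own file in the directory that siege (or a post script) drains.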
Re: Benchmarking tools?
Hi Nico, Thanks for the info. Do you have your scripts available for this? Also, is it configurable to give variable numbers of facets and facet-based searches? I have a feeling this will be the limiting factor, and much slower than keyword searches, but I could be (and usually am) wrong. Best, Jacob
Re: Benchmarking tools?
Hi, I basically followed this: http://wiki.apache.org/jakarta-jmeter/JMeterFAQ#head-1680863678257fbcb85bd97351860eb0049f19ae I put all my queries in a flat text file. You could either use two parameters or put them in one file. The good point of this is that each test uses the same queries, so you can compare the settings better afterwards. If you use varying facets, you might just go with 2 text files. If it stays the same in one test you can hardcode it into the test case. I polished the result a little, if you want to take a look: http://i31.tinypic.com/28c2blk.jpg , JMeter itself does not plot such nice graphs. (Green is the max results delivered; above 66 active users per second the response time increases (orange/yellow: average and median of the response times). I know the scales and descriptions are missing :-) but you should get the picture.) I manually reduced the machine's capacity, otherwise Solr would serve more than 12000 requests per second (the whole index did fit into RAM). I can send you my saved test case if this would help you. Nico
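Turning the terms in such a flat query file into Solr select URLs (for JMeter or a siege urls.txt) might look like the sketch below. The base URL, field name, and query terms here are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: build Solr select URLs from query terms, optionally adding a
// facet field. Base URL and field names are hypothetical.
public class QueryUrlBuilder {
    static String selectUrl(String base, String query, String facetField)
            throws UnsupportedEncodingException {
        String url = base + "/select?q=" + URLEncoder.encode(query, "UTF-8");
        if (facetField != null) {
            url += "&facet=true&facet.field="
                 + URLEncoder.encode(facetField, "UTF-8");
        }
        return url;
    }

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8983/solr";
        // In a real run these would be read line-by-line from the flat file.
        String[] queries = { "solr replication", "jmeter benchmark" };
        for (String q : queries) {
            System.out.println(selectUrl(base, q, "category"));
        }
    }
}
```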
Re: Benchmarking tools?
nice stuff. Please send me the test case, I'd love to see it. Thanks, Jacob
Re: Benchmarking tools?
Me too. Thanks.
Re: Fw: Download solr-tools rpm
Thanks Hoss, I found them in src/scripts. But I don't know how to execute those: snapshooter, snappuller, abc, backup... How can I make one instance of Solr the master and another the slave? Does it fully depend on rsyncd? -Suresh
Re: Fw: Download solr-tools rpm
rsync is in fact at the heart of the replication ... those scripts are essentially just some hardlinking followed by some rsyncing. How to use them (suggested crontab configuration, etc...) is documented fairly completely on the wiki: http://wiki.apache.org/solr/CollectionDistribution -Hoss
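As a very rough sketch of the crontab wiring described on that wiki page: all paths and schedules below are assumptions, and the scripts normally take their master host/port and directory settings from a defaults file next to them, so treat the wiki page as authoritative.

```shell
# Master: snapshooter is more commonly wired into solrconfig.xml's
# postCommit event listener than run from cron. Paths are hypothetical.
# 0 * * * *  /opt/solr/src/scripts/snapshooter

# Slaves: pull the newest snapshot from the master (via rsync) and
# install it, e.g. every 30 minutes.
*/30 * * * *  /opt/solr/src/scripts/snappuller
*/30 * * * *  /opt/solr/src/scripts/snapinstaller
```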
Re: Fw: Download solr-tools rpm
: I need to configure master / slave servers. Hence i check at wiki help : documents. I found that i need to install solr-tools rpm. But i could : not able to download the files. Please some help me with solr-tools rpm. Any references to a solr-tools rpm on the wiki are outdated and leftover from when I ported those wiki pages from CNET ... Apache Solr doesn't distribute anything as an RPM; you should be able to find all of those scripts in the Solr release tgz bundles. -Hoss