Re: Java Advanced Imaging (JAI) Image I/O Tools are not installed

2018-11-06 Thread Yasufumi Mizoguchi
Hi,

It seems a PDFBox issue, I think.
( https://pdfbox.apache.org/2.0/dependencies.html )

Thanks,
Yasufumi


2018年11月6日(火) 16:10 Furkan KAMACI :

> Hi All,
>
> I use Solr 6.5.0 and test OCR capabilities. It OCRs pdf files even it is so
> slow. However, I see that error when I check logs:
>
> o.a.p.c.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging
> (JAI) Image I/O Tools are not installed
>
> Any idea how to fix this?
>
> Kind  Regards,
> Furkan KAMACI
>


Java Advanced Imaging (JAI) Image I/O Tools are not installed

2018-11-05 Thread Furkan KAMACI
Hi All,

I use Solr 6.5.0 and test OCR capabilities. It OCRs pdf files even it is so
slow. However, I see that error when I check logs:

o.a.p.c.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging
(JAI) Image I/O Tools are not installed

Any idea how to fix this?

Kind  Regards,
Furkan KAMACI


Re: Setting of TMP in solr.cmd (for Windows) causes invisibility of the Solr to JDK monitoring tools

2018-09-16 Thread p.bodnar
Hi Erick,

thanks for your feedback, so I've created a corresponding issue in here: 
https://issues.apache.org/jira/browse/SOLR-12776

Hopefully that will suffice :)

Regards

Petr

__
> Od: "Erick Erickson" 
> Komu: "solr-user" 
> Datum: 03.09.2018 01:38
> Předmět: Re: Setting of TMP in solr.cmd (for Windows) causes invisibility of 
> the Solr to JDK monitoring tools
>
>Hmmm, please raise a JIRA and, if possible, attach a patch that works
>for you. Most of us don't have Windows machines readily available
>which hobbles testing, so it's very helpful of someone can test in a
>real environment.
>
>Best,
>Erick
>On Sun, Sep 2, 2018 at 1:47 PM  wrote:
>>
>> Hi,
>>
>> please notice the following lines added (among others) to "solr.cmd" by 
>> commit 
>> https://github.com/apache/lucene-solr/commit/b36c68b16e67ae701cefce052a4fdbaac88fb65c
>>  for https://issues.apache.org/jira/browse/SOLR-6833 about 4 years ago:
>>
>>   set TMP=!SOLR_HOME:%EXAMPLE_DIR%=!
>>   IF NOT "%TMP%"=="%SOLR_HOME%" (
>> set "SOLR_LOGS_DIR=%SOLR_HOME%\..\logs"
>> set "LOG4J_CONFIG=file:%EXAMPLE_DIR%\resources\log4j.properties"
>>   )
>>
>> Apparently, the new variable "TMP" is just a temporary one, but by 
>> coincidence, this variable is also important for JVM! As this system 
>> variable tells where the "hsperfdata_" directory for storing 
>> applications' monitoring data should be located. And if this is changed, JDK 
>> tools like JVisualVM and others won't locally see the given Java 
>> application, because they search in a different default location. Tested 
>> with Java 8u152 and Solr 6.3.0.
>>
>> So Solr authors, could you please rename that "TMP" variable to something 
>> else, or maybe remove it completely (not sure about the latter alternative)? 
>> Hopefully it is as easy as described above and I haven't overlooked some 
>> special meaning of that problematic lines...
>>
>> Best regards
>>
>> Petr B.
>
>


Re: Setting of TMP in solr.cmd (for Windows) causes invisibility of the Solr to JDK monitoring tools

2018-09-02 Thread Erick Erickson
Hmmm, please raise a JIRA and, if possible, attach a patch that works
for you. Most of us don't have Windows machines readily available
which hobbles testing, so it's very helpful of someone can test in a
real environment.

Best,
Erick
On Sun, Sep 2, 2018 at 1:47 PM  wrote:
>
> Hi,
>
> please notice the following lines added (among others) to "solr.cmd" by 
> commit 
> https://github.com/apache/lucene-solr/commit/b36c68b16e67ae701cefce052a4fdbaac88fb65c
>  for https://issues.apache.org/jira/browse/SOLR-6833 about 4 years ago:
>
>   set TMP=!SOLR_HOME:%EXAMPLE_DIR%=!
>   IF NOT "%TMP%"=="%SOLR_HOME%" (
> set "SOLR_LOGS_DIR=%SOLR_HOME%\..\logs"
> set "LOG4J_CONFIG=file:%EXAMPLE_DIR%\resources\log4j.properties"
>   )
>
> Apparently, the new variable "TMP" is just a temporary one, but by 
> coincidence, this variable is also important for JVM! As this system variable 
> tells where the "hsperfdata_" directory for storing applications' 
> monitoring data should be located. And if this is changed, JDK tools like 
> JVisualVM and others won't locally see the given Java application, because 
> they search in a different default location. Tested with Java 8u152 and Solr 
> 6.3.0.
>
> So Solr authors, could you please rename that "TMP" variable to something 
> else, or maybe remove it completely (not sure about the latter alternative)? 
> Hopefully it is as easy as described above and I haven't overlooked some 
> special meaning of that problematic lines...
>
> Best regards
>
> Petr B.


Setting of TMP in solr.cmd (for Windows) causes invisibility of the Solr to JDK monitoring tools

2018-09-02 Thread p.bodnar
Hi,

please notice the following lines added (among others) to "solr.cmd" by commit 
https://github.com/apache/lucene-solr/commit/b36c68b16e67ae701cefce052a4fdbaac88fb65c
 for https://issues.apache.org/jira/browse/SOLR-6833 about 4 years ago:

  set TMP=!SOLR_HOME:%EXAMPLE_DIR%=!
  IF NOT "%TMP%"=="%SOLR_HOME%" (
set "SOLR_LOGS_DIR=%SOLR_HOME%\..\logs"
set "LOG4J_CONFIG=file:%EXAMPLE_DIR%\resources\log4j.properties"
  )

Apparently, the new variable "TMP" is just a temporary one, but by coincidence, 
this variable is also important for JVM! As this system variable tells where 
the "hsperfdata_" directory for storing applications' monitoring data 
should be located. And if this is changed, JDK tools like JVisualVM and others 
won't locally see the given Java application, because they search in a 
different default location. Tested with Java 8u152 and Solr 6.3.0.

So Solr authors, could you please rename that "TMP" variable to something else, 
or maybe remove it completely (not sure about the latter alternative)? 
Hopefully it is as easy as described above and I haven't overlooked some 
special meaning of that problematic lines...

Best regards

Petr B.


Re: integrate solr with preprocessor tools

2015-12-16 Thread Emir Arnautovic

Hi Sara,
I would recommend looking at code of some component that you use 
currently and start from that - you can extend that class or use it as 
template for your own.


Thanks,
Emir

On 16.12.2015 09:58, sara hajili wrote:

hi Emir,tnx for answering
now my question is how i write this class?
i must use solr interfaces?
i see in above link that i can use solr analyzer.but how i use that?
plz say me how i start to write my own analyzer step by step...
which interface i can use and change to achieve my goal?
tnx

On Wed, Dec 9, 2015 at 1:50 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Sara,
You need to wrap your code in tokenizer or token filter
https://wiki.apache.org/solr/SolrPlugins

If you want to improve existing and believe others can benefit from
improvement, you can open ticket and submit patch.

Thanks,
Emir


On 09.12.2015 10:41, sara hajili wrote:


hi i wanna to use solr , and language of my documents that i stored in
solr
is persian.
solr doesn't support persian as well as i want.so i find preprocessor
tools
like a normalization,tockenizer and etc ...
i don't want to use solr persian filter like persian tockenizer,i mean i
wanna to improve it.

now my question is how i can integrate solr with this external
preprocessor
tools??

tnx



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: integrate solr with preprocessor tools

2015-12-16 Thread sara hajili
hi Emir,tnx for answering
now my question is how i write this class?
i must use solr interfaces?
i see in above link that i can use solr analyzer.but how i use that?
plz say me how i start to write my own analyzer step by step...
which interface i can use and change to achieve my goal?
tnx

On Wed, Dec 9, 2015 at 1:50 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Sara,
> You need to wrap your code in tokenizer or token filter
> https://wiki.apache.org/solr/SolrPlugins
>
> If you want to improve existing and believe others can benefit from
> improvement, you can open ticket and submit patch.
>
> Thanks,
> Emir
>
>
> On 09.12.2015 10:41, sara hajili wrote:
>
>> hi i wanna to use solr , and language of my documents that i stored in
>> solr
>> is persian.
>> solr doesn't support persian as well as i want.so i find preprocessor
>> tools
>> like a normalization,tockenizer and etc ...
>> i don't want to use solr persian filter like persian tockenizer,i mean i
>> wanna to improve it.
>>
>> now my question is how i can integrate solr with this external
>> preprocessor
>> tools??
>>
>> tnx
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: integrate solr with preprocessor tools

2015-12-09 Thread Emir Arnautovic

Hi Sara,
You need to wrap your code in tokenizer or token filter 
https://wiki.apache.org/solr/SolrPlugins


If you want to improve existing and believe others can benefit from 
improvement, you can open ticket and submit patch.


Thanks,
Emir

On 09.12.2015 10:41, sara hajili wrote:

hi i wanna to use solr , and language of my documents that i stored in solr
is persian.
solr doesn't support persian as well as i want.so i find preprocessor tools
like a normalization,tockenizer and etc ...
i don't want to use solr persian filter like persian tockenizer,i mean i
wanna to improve it.

now my question is how i can integrate solr with this external preprocessor
tools??

tnx



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



integrate solr with preprocessor tools

2015-12-09 Thread sara hajili
hi i wanna to use solr , and language of my documents that i stored in solr
is persian.
solr doesn't support persian as well as i want.so i find preprocessor tools
like a normalization,tockenizer and etc ...
i don't want to use solr persian filter like persian tockenizer,i mean i
wanna to improve it.

now my question is how i can integrate solr with this external preprocessor
tools??

tnx


Solr Cloud Management Tools

2014-11-04 Thread elangovan palani




Hello. 




Can someone suggest SolrCloud Management tool 

I'm Looking to gather Collection/Docuements/Shares Metrics and also 

to collect Data about the cluster usage on Mem,ReadWrites etc.. 




Thanks




Elan


Re: Solr Cloud Management Tools

2014-11-04 Thread Michael Della Bitta
http://sematext.com/spm/

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Tue, Nov 4, 2014 at 3:01 PM, elangovan palani elang...@yahoo.com.invalid
 wrote:





 Hello.




 Can someone suggest SolrCloud Management tool

 I'm Looking to gather Collection/Docuements/Shares Metrics and also

 to collect Data about the cluster usage on Mem,ReadWrites etc..




 Thanks




 Elan



Re: Solr Cloud Management Tools

2014-11-04 Thread Alexandre Rafalovitch
SemaText products are usually a good place to start fine tuning your
requirements: http://sematext.com/index.html I believe they do trials
as well.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 4 November 2014 15:01, elangovan palani elang...@yahoo.com.invalid wrote:




 Hello.




 Can someone suggest SolrCloud Management tool

 I'm Looking to gather Collection/Docuements/Shares Metrics and also

 to collect Data about the cluster usage on Mem,ReadWrites etc..




 Thanks




 Elan


Optimal setup for multiple tools

2014-04-26 Thread Jimmy Lin
Hello,

My team has been working with SOLR for the last 2 years.  We have two main
indices:

1. documents
-index and store main text
-one record for each document
2. places (all of the geospatial places found in the documents above)
-index but don't store main text
-one record for each place.  could have thousands in a single
document but the ratio has seemed to come out to 6:1 places to documents

We have several tools that query the above indices.  One is just a standard
search tool that returns documents filtered on keyword, temporal, and
geospatial filters.  Another is a geospatial tool that queries the places
collection.  We now have a requirement to provide document highlighting
when querying in the geospatial tool.

Does anyone have any suggestions/prior experience on how they would set up
two collections that are essentially different views of the data?  Also
any tips on how to ensure that these two collections are in sync (meaning
any documents indexed into the documents collection are also properly
indexed in places)?

Thanks alot,

Jimmy Lin


Re: Optimal setup for multiple tools

2014-04-26 Thread Erick Erickson
Have you considered putting them in the _same_ index? There's not much
penalty at all with having sparsely populated fields in a document, so
the fact that the two parts of your index had orthogonal fields
wouldn't cost you much and would solve the synchronization problem.

You can include a type field to distinguish between the and just
include a filter query to keep them separate. Since that'll be cached,
your search performance should be fine.

Otherwise you should include the fields you need to sort on in the
index you need to sort. Denormalizes the data, but...

About keeping the two in synch, that's really outside Solr, your
indexing process has to manage that I'd guess.

Best,
Erick

On Sat, Apr 26, 2014 at 7:24 AM, Jimmy Lin jimmys.em...@gmail.com wrote:
 Hello,

 My team has been working with SOLR for the last 2 years.  We have two main
 indices:

 1. documents
 -index and store main text
 -one record for each document
 2. places (all of the geospatial places found in the documents above)
 -index but don't store main text
 -one record for each place.  could have thousands in a single
 document but the ratio has seemed to come out to 6:1 places to documents

 We have several tools that query the above indices.  One is just a standard
 search tool that returns documents filtered on keyword, temporal, and
 geospatial filters.  Another is a geospatial tool that queries the places
 collection.  We now have a requirement to provide document highlighting
 when querying in the geospatial tool.

 Does anyone have any suggestions/prior experience on how they would set up
 two collections that are essentially different views of the data?  Also
 any tips on how to ensure that these two collections are in sync (meaning
 any documents indexed into the documents collection are also properly
 indexed in places)?

 Thanks alot,

 Jimmy Lin


Re: Tools for schema.xml generation and to import from a database

2012-07-30 Thread Andre Lopes
Thanks for the reply Alexandre,

I will test your clues as soon as possible.

Best Regards,



On Mon, Jul 30, 2012 at 4:15 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:
 If you are just starting with SOLR, you might as well jump to 4.0
 Alpha. By the time you finished, it will be the production copy.

 If you want to index stuff from the database, your first step is
 probably to use DataImportHandler (DIH). Once you get past the basics,
 you may want to do a custom code, but start from from DIH for faster
 results.

 You will want to modify schema.xml. I started by using DIH example and
 just adding an extra core at first. This might be easier than building
 a full directory setup from scratch.

 You also don't actually need to configure schema too much at the
 beginning. You can start by using dynamic fields. So, if in DIH, you
 say that your target field is XYZ_i it is automatically picked by as
 an integer field by SOLR (due to *_i definition that you do need to
 have). This will not work for fields you want to do aggregation on
 (e.g. multiple text fields copied into one for easier search), for
 multilingual text fields, etc. But it will get you going.

 Oh, and welcome to SOLR. You will like it.

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Sun, Jul 29, 2012 at 3:45 PM, Andre Lopes lopes80an...@gmail.com wrote:
 Hi,

 I'm new to Solr. I've installed 3.6.1 but I'm a little bit confused
 about what and how to do next. I will use the Jetty version for now.

 Two poinst I need to know:

 1 - I've 2 views that I would like to import to Solr. I think I must
 do a schema.xml and then import data to that schema. I'm correct with
 this one?

 2 - About tools to autogenerate the schema.xml, there are any? And
 about tools to import data to the schema, there are any(I'm using
 Python)?


 Please give me some clues.

 Thanks,

 Best Regards,
 André.


Tools for schema.xml generation and to import from a database

2012-07-29 Thread Andre Lopes
Hi,

I'm new to Solr. I've installed 3.6.1 but I'm a little bit confused
about what and how to do next. I will use the Jetty version for now.

Two poinst I need to know:

1 - I've 2 views that I would like to import to Solr. I think I must
do a schema.xml and then import data to that schema. I'm correct with
this one?

2 - About tools to autogenerate the schema.xml, there are any? And
about tools to import data to the schema, there are any(I'm using
Python)?


Please give me some clues.

Thanks,

Best Regards,
André.


Re: Tools for schema.xml generation and to import from a database

2012-07-29 Thread Alexandre Rafalovitch
If you are just starting with SOLR, you might as well jump to 4.0
Alpha. By the time you finished, it will be the production copy.

If you want to index stuff from the database, your first step is
probably to use DataImportHandler (DIH). Once you get past the basics,
you may want to do a custom code, but start from from DIH for faster
results.

You will want to modify schema.xml. I started by using DIH example and
just adding an extra core at first. This might be easier than building
a full directory setup from scratch.

You also don't actually need to configure schema too much at the
beginning. You can start by using dynamic fields. So, if in DIH, you
say that your target field is XYZ_i it is automatically picked by as
an integer field by SOLR (due to *_i definition that you do need to
have). This will not work for fields you want to do aggregation on
(e.g. multiple text fields copied into one for easier search), for
multilingual text fields, etc. But it will get you going.

Oh, and welcome to SOLR. You will like it.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sun, Jul 29, 2012 at 3:45 PM, Andre Lopes lopes80an...@gmail.com wrote:
 Hi,

 I'm new to Solr. I've installed 3.6.1 but I'm a little bit confused
 about what and how to do next. I will use the Jetty version for now.

 Two poinst I need to know:

 1 - I've 2 views that I would like to import to Solr. I think I must
 do a schema.xml and then import data to that schema. I'm correct with
 this one?

 2 - About tools to autogenerate the schema.xml, there are any? And
 about tools to import data to the schema, there are any(I'm using
 Python)?


 Please give me some clues.

 Thanks,

 Best Regards,
 André.


Re: Lexical analysis tools for German language data

2012-04-13 Thread Tomas Zerolo
On Thu, Apr 12, 2012 at 03:46:56PM +, Michael Ludwig wrote:
  Von: Walter Underwood
 
  German noun decompounding is a little more complicated than it might
  seem.
  
  There can be transformations or inflections, like the s in
  Weinachtsbaum (Weinachten/Baum).
 
 I remember from my linguistics studies that the terminus technicus for
 these is Fugenmorphem (interstitial or joint morpheme) [...]

IANAL (I am not a linguist -- pun intended ;) but I've always read that
as a genitive. Any pointers?

Regards
-- 
Tomás Zerolo
Axel Springer AG
Axel Springer media Systems
BILD Produktionssysteme
Axel-Springer-Straße 65
10888 Berlin
Tel.: +49 (30) 2591-72875
tomas.zer...@axelspringer.de
www.axelspringer.de

Axel Springer AG, Sitz Berlin, Amtsgericht Charlottenburg, HRB 4998
Vorsitzender des Aufsichtsrats: Dr. Giuseppe Vita
Vorstand: Dr. Mathias Döpfner (Vorsitzender)
Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele


AW: Lexical analysis tools for German language data

2012-04-13 Thread Michael Ludwig
 Von: Tomas Zerolo

   There can be transformations or inflections, like the s in
   Weinachtsbaum (Weinachten/Baum).
 
  I remember from my linguistics studies that the terminus technicus
  for these is Fugenmorphem (interstitial or joint morpheme) [...]
 
 IANAL (I am not a linguist -- pun intended ;) but I've always read
 that as a genitive. Any pointers?

Admittedly, that's what you'd think, and despite linguistics telling me
otherwise I'd maintain there's some truth in it. For this case, however,
consider: die Weihnacht declines like die Nacht, so:

nom. die Weihnacht
gen. der Weihnacht
dat. der Weihnacht
akk. die Weihnacht

As you can see, there's no s to be found anywhere, not even in the
genitive. But my gut feeling, like yours, is that this should indicate
genitive, and I would make a point of well-argued gut feeling being at
least as relevant as formalist analysis.

Michael


Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
Given an input of Windjacke (probably wind jacket in English), I'd
like the code that prepares the data for the index (tokenizer etc) to
understand that this is a Jacke (jacket) so that a query for Jacke
would include the Windjacke document in its result set.

It appears to me that such an analysis requires a dictionary-backed
approach, which doesn't have to be perfect at all; a list of the most
common 2000 words would probably do the job and fulfil a criterion of
reasonable usefulness.

Do you know of any implementation techniques or working implementations
to do this kind of lexical analysis for German language data? (Or other
languages, for that matter?) What are they, where can I find them?

I'm sure there is something out (commercial or free) because I've seen
lots of engines grokking German and the way it builds words.

Failing that, what are the proper terms do refer to these techniques so
you can search more successfully?

Michael


AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
 Given an input of Windjacke (probably wind jacket in English),
 I'd like the code that prepares the data for the index (tokenizer
 etc) to understand that this is a Jacke (jacket) so that a
 query for Jacke would include the Windjacke document in its
 result set.
 
 It appears to me that such an analysis requires a dictionary-
 backed approach, which doesn't have to be perfect at all; a list
 of the most common 2000 words would probably do the job and fulfil
 a criterion of reasonable usefulness.

A simple approach would obviously be a word list and a regular
expression. There will, however, be nuts and bolts to take care of.
A more sophisticated and tested approach might be known to you.

Michael


Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht

Michael,

I'm on this list and the lucene list since several years and have not found 
this yet.
It's been one neglected topics to my taste.

There is a CompoundAnalyzer but it requires the compounds to be dictionary 
based, as you indicate.

I am convinced there's a way to build the de-compounding words efficiently from 
a broad corpus but I have never seen it (and the experts at DFKI I asked for 
for also told me they didn't know of one).

paul

Le 12 avr. 2012 à 11:52, Michael Ludwig a écrit :

 Given an input of Windjacke (probably wind jacket in English), I'd
 like the code that prepares the data for the index (tokenizer etc) to
 understand that this is a Jacke (jacket) so that a query for Jacke
 would include the Windjacke document in its result set.
 
 It appears to me that such an analysis requires a dictionary-backed
 approach, which doesn't have to be perfect at all; a list of the most
 common 2000 words would probably do the job and fulfil a criterion of
 reasonable usefulness.
 
 Do you know of any implementation techniques or working implementations
 to do this kind of lexical analysis for German language data? (Or other
 languages, for that matter?) What are they, where can I find them?
 
 I'm sure there is something out (commercial or free) because I've seen
 lots of engines grokking German and the way it builds words.
 
 Failing that, what are the proper terms do refer to these techniques so
 you can search more successfully?
 
 Michael



Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling

You might have a look at:
http://www.basistech.com/lucene/


Am 12.04.2012 11:52, schrieb Michael Ludwig:
 Given an input of Windjacke (probably wind jacket in English), I'd
 like the code that prepares the data for the index (tokenizer etc) to
 understand that this is a Jacke (jacket) so that a query for Jacke
 would include the Windjacke document in its result set.
 
 It appears to me that such an analysis requires a dictionary-backed
 approach, which doesn't have to be perfect at all; a list of the most
 common 2000 words would probably do the job and fulfil a criterion of
 reasonable usefulness.
 
 Do you know of any implementation techniques or working implementations
 to do this kind of lexical analysis for German language data? (Or other
 languages, for that matter?) What are they, where can I find them?
 
 I'm sure there is something out (commercial or free) because I've seen
 lots of engines grokking German and the way it builds words.
 
 Failing that, what are the proper terms do refer to these techniques so
 you can search more successfully?
 
 Michael


Re: Lexical analysis tools for German language data

2012-04-12 Thread Valeriy Felberg
If you want that query jacke matches a document containing the word
windjacke or kinderjacke, you could use a custom update processor.
This processor could search the indexed text for words matching the
pattern .*jacke and inject the word jacke into an additional field
which you can search against. You would need a whole list of possible
suffixes, of course. It would slow down the update process but you
don't need to split words during search.

Best,
Valeriy

On Thu, Apr 12, 2012 at 12:39 PM, Paul Libbrecht p...@hoplahup.net wrote:

 Michael,

 I'm on this list and the lucene list since several years and have not found 
 this yet.
 It's been one neglected topics to my taste.

 There is a CompoundAnalyzer but it requires the compounds to be dictionary 
 based, as you indicate.

 I am convinced there's a way to build the de-compounding words efficiently 
 from a broad corpus but I have never seen it (and the experts at DFKI I asked 
 for for also told me they didn't know of one).

 paul

 Le 12 avr. 2012 à 11:52, Michael Ludwig a écrit :

 Given an input of Windjacke (probably wind jacket in English), I'd
 like the code that prepares the data for the index (tokenizer etc) to
 understand that this is a Jacke (jacket) so that a query for Jacke
 would include the Windjacke document in its result set.

 It appears to me that such an analysis requires a dictionary-backed
 approach, which doesn't have to be perfect at all; a list of the most
 common 2000 words would probably do the job and fulfil a criterion of
 reasonable usefulness.

 Do you know of any implementation techniques or working implementations
 to do this kind of lexical analysis for German language data? (Or other
 languages, for that matter?) What are they, where can I find them?

 I'm sure there is something out (commercial or free) because I've seen
 lots of engines grokking German and the way it builds words.

 Failing that, what are the proper terms do refer to these techniques so
 you can search more successfully?

 Michael



Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Bernd,

can you please say a little more?
I think this list is ok to contain some description for commercial solutions 
that satisfy a request formulated on list.

Is there any product at BASIS Tech that provides a compound-analyzer with a big 
dictionary of decomposed compounds in German? If yes, for which domain? The 
Google Search result (I wonder if this is politically correct to not have yours 
;-)) shows me that there's an amount of job done in this direction (e.g. Gärten 
to match Garten) but being precise for this question would be more helpful!

paul


Le 12 avr. 2012 à 12:46, Bernd Fehling a écrit :

 
 You might have a look at:
 http://www.basistech.com/lucene/
 
 
 Am 12.04.2012 11:52, schrieb Michael Ludwig:
 Given an input of Windjacke (probably wind jacket in English), I'd
 like the code that prepares the data for the index (tokenizer etc) to
 understand that this is a Jacke (jacket) so that a query for Jacke
 would include the Windjacke document in its result set.
 
 It appears to me that such an analysis requires a dictionary-backed
 approach, which doesn't have to be perfect at all; a list of the most
 common 2000 words would probably do the job and fulfil a criterion of
 reasonable usefulness.
 
 Do you know of any implementation techniques or working implementations
 to do this kind of lexical analysis for German language data? (Or other
 languages, for that matter?) What are they, where can I find them?
 
 I'm sure there is something out (commercial or free) because I've seen
 lots of engines grokking German and the way it builds words.
 
 Failing that, what are the proper terms do refer to these techniques so
 you can search more successfully?
 
 Michael



Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling
Paul,

nearly two years ago I requested an evaluation license and tested BASIS Tech
Rosette for Lucene  Solr. Was working excellent but the price much much to 
high.

Yes, they also have compound analysis for several languages including German.
Just configure your pipeline in solr and setup the processing pipeline in
Rosette Language Processing (RLP) and thats it.

Example from my very old schema.xml config:

fieldtype name=text_rlp class=solr.TextField positionIncrementGap=100
   analyzer type=index
 tokenizer class=com.basistech.rlp.solr.RLPTokenizerFactory
rlpContext=solr/conf/rlp-index-context.xml
postPartOfSpeech=false
postLemma=true
postStem=true
postCompoundComponents=true/
 filter class=solr.LowerCaseFilterFactory/
 filter 
class=org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory/
   /analyzer
   analyzer type=query
 tokenizer class=com.basistech.rlp.solr.RLPTokenizerFactory
rlpContext=solr/conf/rlp-query-context.xml
postPartOfSpeech=false
postLemma=true
postCompoundComponents=true/
 filter class=solr.LowerCaseFilterFactory/
 filter 
class=org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory/
  /analyzer
/fieldtype

So you just point tokenizer to RLP and have two RLP pipelines configured,
one for indexing (rlp-index-context.xml) and one for querying 
(rlp-query-context.xml).

Example form my rlp-index-context.xml config:

contextconfig
  properties
property name=com.basistech.rex.optimize value=false/
property name=com.basistech.ela.retokenize_for_rex value=true/
  /properties
  languageprocessors
languageprocessorUnicode Converter/languageprocessor
languageprocessorLanguage Identifier/languageprocessor
languageprocessorEncoding and Character Normalizer/languageprocessor
languageprocessorEuropean Language Analyzer/languageprocessor
!--languageprocessorScript Region Locator/languageprocessor
languageprocessorJapanese Language Analyzer/languageprocessor
languageprocessorChinese Language Analyzer/languageprocessor
languageprocessorKorean Language Analyzer/languageprocessor
languageprocessorSentence Breaker/languageprocessor
languageprocessorWord Breaker/languageprocessor
languageprocessorArabic Language Analyzer/languageprocessor
languageprocessorPersian Language Analyzer/languageprocessor
languageprocessorUrdu Language Analyzer/languageprocessor --
languageprocessorStopword Locator/languageprocessor
languageprocessorBase Noun Phrase Locator/languageprocessor
!--languageprocessorStatistical Entity Extractor/languageprocessor --
languageprocessorExact Match Entity Extractor/languageprocessor
languageprocessorPattern Match Entity Extractor/languageprocessor
languageprocessorEntity Redactor/languageprocessor
languageprocessorREXML Writer/languageprocessor
  /languageprocessors
/contextconfig

As you can see I used the European Language Analyzer.

Bernd



Am 12.04.2012 12:58, schrieb Paul Libbrecht:
 Bernd,
 
 can you please say a little more?
 I think this list is ok to contain some description for commercial solutions 
 that satisfy a request formulated on list.
 
 Is there any product at BASIS Tech that provides a compound-analyzer with a 
 big dictionary of decomposed compounds in German? 
 If yes, for which domain? 
 The Google Search result (I wonder if this is politically correct to not have 
 yours ;-)) shows me that there's an amount 
 of job done in this direction (e.g. Gärten to match Garten) but being precise 
 for this question would be more helpful!
 
 paul
 
 


AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
 Von: Valeriy Felberg

 If you want that query jacke matches a document containing the word
 windjacke or kinderjacke, you could use a custom update processor.
 This processor could search the indexed text for words matching the
 pattern .*jacke and inject the word jacke into an additional field
 which you can search against. You would need a whole list of possible
 suffixes, of course.

Merci, Valeriy - I agree on the feasability of such an approach. The
list would likely have to be composed of the most frequently used terms
for your specific domain.

In our case, it's things people would buy in shops. Reducing overly
complicated and convoluted product descriptions to proper basic terms -
that would do the job. It's like going to a restaurant boasting fancy
and unintelligible names for the dishes you may order when they are
really just ordinary stuff like pork and potatoes.

Thinking some more about it, giving sufficient boost to the attached
category data might also do the job. That would shift the burden of
supplying proper semantics to the guys doing the categorization.

 It would slow down the update process but you don't need to split
 words during search.

  Le 12 avr. 2012 à 11:52, Michael Ludwig a écrit :
 
  Given an input of Windjacke (probably wind jacket in English),
  I'd like the code that prepares the data for the index (tokenizer
  etc) to understand that this is a Jacke (jacket) so that a
  query for Jacke would include the Windjacke document in its
  result set.

A query for Windjacke or Kinderjacke would probably not have to be
de-specialized to Jacke because, well, that's the user input and users
looking for specific things are probably doing so for a reason. If no
matches are found you can still tell them to just broaden their search.

Michael


Re: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma
Hi,

We've done a lot of tests with the HyphenationCompoundWordTokenFilter using a 
from TeX generated FOP XML file for the Dutch language and have seen decent 
results. A bonus was that now some tokens can be stemmed properly because not 
all compounds are listed in the dictionary for the HunspellStemFilter.

It does introduce a recall/precision problem but it at least returns results 
for those many users that do not properly use compounds in their search query.

There seem to be a small issue with the filter where minSubwordSize=N yields 
subwords of size N-1.

Cheers,

On Thursday 12 April 2012 12:39:44 Paul Libbrecht wrote:
 Michael,
 
 I'm on this list and the lucene list since several years and have not found
 this yet. It's been one neglected topics to my taste.
 
 There is a CompoundAnalyzer but it requires the compounds to be dictionary
 based, as you indicate.
 
 I am convinced there's a way to build the de-compounding words efficiently
 from a broad corpus but I have never seen it (and the experts at DFKI I
 asked for for also told me they didn't know of one).
 
 paul
 
 Le 12 avr. 2012 à 11:52, Michael Ludwig a écrit :
  Given an input of Windjacke (probably wind jacket in English), I'd
  like the code that prepares the data for the index (tokenizer etc) to
  understand that this is a Jacke (jacket) so that a query for Jacke
  would include the Windjacke document in its result set.
  
  It appears to me that such an analysis requires a dictionary-backed
  approach, which doesn't have to be perfect at all; a list of the most
  common 2000 words would probably do the job and fulfil a criterion of
  reasonable usefulness.
  
  Do you know of any implementation techniques or working implementations
  to do this kind of lexical analysis for German language data? (Or other
  languages, for that matter?) What are they, where can I find them?
  
  I'm sure there is something out (commercial or free) because I've seen
  lots of engines grokking German and the way it builds words.
  
  Failing that, what are the proper terms do refer to these techniques so
  you can search more successfully?
  
  Michael

-- 
Markus Jelsma - CTO - Openindex


AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
 Von: Markus Jelsma

 We've done a lot of tests with the HyphenationCompoundWordTokenFilter
 using a from TeX generated FOP XML file for the Dutch language and
 have seen decent results. A bonus was that now some tokens can be
 stemmed properly because not all compounds are listed in the
 dictionary for the HunspellStemFilter.

Thank you for pointing me to these two filter classes.

 It does introduce a recall/precision problem but it at least returns
 results for those many users that do not properly use compounds in
 their search query.

Could you define what the term recall should be taken to mean in this
context? I've also encountered it on the BASIStech website. Okay, I
found a definition:

http://en.wikipedia.org/wiki/Precision_and_recall

Dank je wel!

Michael


Re: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
German noun decompounding is a little more complicated than it might seem.

There can be transformations or inflections, like the s in Weinachtsbaum 
(Weinachten/Baum).

Internal nouns should be recapitalized, like Baum above.

Some compounds probably should not be decompounded, like Fahrrad 
(farhren/Rad). With a dictionary-based stemmer, you might decide to avoid 
decompounding for words in the dictionary.

Verbs get more complicated inflections, and might need to be decapitalized, 
like farhren above.

Und so weiter.

Note that highlighting gets pretty weird when you are matching only part of a 
word.

Luckily, a lot of compounds are simple, and you could well get a measurable 
improvement with a very simple algorithm. There isn't anything complicated 
about compounds like Orgelmusik or Netzwerkbetreuer.

The Basis Technology linguistic analyzers aren't cheap or small, but they work 
well. 

wunder

On Apr 12, 2012, at 3:58 AM, Paul Libbrecht wrote:

 Bernd,
 
 can you please say a little more?
 I think this list is ok to contain some description for commercial solutions 
 that satisfy a request formulated on list.
 
 Is there any product at BASIS Tech that provides a compound-analyzer with a 
 big dictionary of decomposed compounds in German? If yes, for which domain? 
 The Google Search result (I wonder if this is politically correct to not have 
 yours ;-)) shows me that there's an amount of job done in this direction 
 (e.g. Gärten to match Garten) but being precise for this question would be 
 more helpful!
 
 paul
 
 
 Le 12 avr. 2012 à 12:46, Bernd Fehling a écrit :
 
 
 You might have a look at:
 http://www.basistech.com/lucene/
 
 
 Am 12.04.2012 11:52, schrieb Michael Ludwig:
 Given an input of Windjacke (probably wind jacket in English), I'd
 like the code that prepares the data for the index (tokenizer etc) to
 understand that this is a Jacke (jacket) so that a query for Jacke
 would include the Windjacke document in its result set.
 
 It appears to me that such an analysis requires a dictionary-backed
 approach, which doesn't have to be perfect at all; a list of the most
 common 2000 words would probably do the job and fulfil a criterion of
 reasonable usefulness.
 
 Do you know of any implementation techniques or working implementations
 to do this kind of lexical analysis for German language data? (Or other
 languages, for that matter?) What are they, where can I find them?
 
 I'm sure there is something out (commercial or free) because I've seen
 lots of engines grokking German and the way it builds words.
 
 Failing that, what are the proper terms do refer to these techniques so
 you can search more successfully?
 
 Michael






AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
 Von: Walter Underwood

 German noun decompounding is a little more complicated than it might
 seem.
 
 There can be transformations or inflections, like the s in
 Weinachtsbaum (Weinachten/Baum).

I remember from my linguistics studies that the terminus technicus for
these is Fugenmorphem (interstitial or joint morpheme). But there's
not many of them - phrased in a regex, it's /e?[ns]/. The Weinachtsbaum
in the example above is from the singular (die Weihnacht), then s,
then Baum. Still, it's much more complex then, say, English or Italian.

 Internal nouns should be recapitalized, like Baum above.

Casing won't matter for indexing, I think. The way I would go about
obtaining stems from compound words is by using a dictionary of stems
and a regex. We'll see how far that'll take us.

 Some compounds probably should not be decompounded, like Fahrrad
 (farhren/Rad). With a dictionary-based stemmer, you might decide to
 avoid decompounding for words in the dictionary.

Good point.

 Note that highlighting gets pretty weird when you are matching only
 part of a word.

Guess it'll be a weird when you get it wrong, like Noten in
Notentriegelung.

 Luckily, a lot of compounds are simple, and you could well get a
 measurable improvement with a very simple algorithm. There isn't
 anything complicated about compounds like Orgelmusik or
 Netzwerkbetreuer.

Exactly.

 The Basis Technology linguistic analyzers aren't cheap or small, but
 they work well.

We will consider our needs and options. Thanks for your thoughts.

Michael


Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht

Le 12 avr. 2012 à 17:46, Michael Ludwig a écrit :
 Some compounds probably should not be decompounded, like Fahrrad
 (farhren/Rad). With a dictionary-based stemmer, you might decide to
 avoid decompounding for words in the dictionary.
 
 Good point.

More or less, Fahrrad is generally abbreviated as Rad.
(even though Rad can mean wheel and bike)

 Note that highlighting gets pretty weird when you are matching only
 part of a word.
 
 Guess it'll be a weird when you get it wrong, like Noten in
 Notentriegelung.

This decomposition should not happen because Noten-triegelung does not have a 
correct second term.

 The Basis Technology linguistic analyzers aren't cheap or small, but
 they work well.
 
 We will consider our needs and options. Thanks for your thoughts.

My question remains as to which domain it aims at covering.
We had such need for mathematics texts... I would be pleasantly surprised if, 
for example, Differenzen-quotient  would be decompounded.

paul

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
On Apr 12, 2012, at 8:46 AM, Michael Ludwig wrote:

 I remember from my linguistics studies that the terminus technicus for
 these is Fugenmorphem (interstitial or joint morpheme). 

That is some excellent linguistic jargon. I'll file that with hapax legomenon.

If you don't highlight, you can get good results with pretty rough analyzers, 
but highlighting exposes those, even when they don't affect relevance. For 
example, you can get good relevance just indexing bigrams in Chinese, but it 
looks awful when you highlight them. As soon as you highlight, you need a 
dictionary-based segmenter.

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma
On Thursday 12 April 2012 18:00:14 Paul Libbrecht wrote:
 Le 12 avr. 2012 à 17:46, Michael Ludwig a écrit :
  Some compounds probably should not be decompounded, like Fahrrad
  (farhren/Rad). With a dictionary-based stemmer, you might decide to
  avoid decompounding for words in the dictionary.
  
  Good point.
 
 More or less, Fahrrad is generally abbreviated as Rad.
 (even though Rad can mean wheel and bike)
 
  Note that highlighting gets pretty weird when you are matching only
  part of a word.
  
  Guess it'll be a weird when you get it wrong, like Noten in
  Notentriegelung.
 
 This decomposition should not happen because Noten-triegelung does not have
 a correct second term.
 
  The Basis Technology linguistic analyzers aren't cheap or small, but
  they work well.
  
  We will consider our needs and options. Thanks for your thoughts.
 
 My question remains as to which domain it aims at covering.
 We had such need for mathematics texts... I would be pleasantly surprised
 if, for example, Differenzen-quotient  would be decompounded.

The HyphenationCompoundWordTokenFilter can do those things but those words 
must be listed in the dictionary or you'll get strange results. It still 
yields strange results when it emits tokens that are subwords of a subword.

 
 paul

-- 
Markus Jelsma - CTO - Openindex


Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
On Apr 12, 2012, at 9:00 AM, Paul Libbrecht wrote:

 More or less, Fahrrad is generally abbreviated as Rad.
 (even though Rad can mean wheel and bike)

A synonym could handle this, since farhren would not be a good match. It is 
judgement call, but this seems more like an equivalence Fahrrad = Rad than 
decompounding.

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Reporting tools

2012-03-09 Thread Gora Mohanty
On 9 March 2012 09:05, Donald Organ dor...@donaldorgan.com wrote:
 Are there any reporting tools out there?  So I can analyzer search term
 frequency, filter frequency,  etc?

Do not have direct experience of any Solr reporting
tool, but please see the Solr StatsComponent:
http://wiki.apache.org/solr/StatsComponent

This should provide you with data on the Solr
index.

Regards,
Gora


Re: Reporting tools

2012-03-09 Thread Tommaso Teofili
as Gora says there is the stats component you can take advantage of or you
could also use JMX directly [1] or LucidGaze [2][3] or commercial services
like [4] or [5] (these are the ones I know but there may be also others),
each of them with different level/type of service.

Tommaso

[1] : http://wiki.apache.org/solr/SolrJmx
[2] : http://www.lucidimagination.com/blog/2009/08/24/lucid-gaze-for-lucene/
[3] : http://www.chrisumbel.com/article/monitoring_solr_lucidgaze
[4] : http://sematext.com/search-analytics/index.html
[5] : http://newrelic.com/


2012/3/9 Donald Organ dor...@donaldorgan.com

 Are there any reporting tools out there?  So I can analyzer search term
 frequency, filter frequency,  etc?



Re: Reporting tools

2012-03-09 Thread Ahmet Arslan
 Are there any reporting tools out
 there?  So I can analyzer search term
 frequency, filter frequency,  etc?

You might be interested in this :
http://www.sematext.com/search-analytics/index.html


Re: Reporting tools

2012-03-09 Thread Koji Sekiguchi

(12/03/09 12:35), Donald Organ wrote:

Are there any reporting tools out there?  So I can analyzer search term
frequency, filter frequency,  etc?


You may be interested in:

Free Query Log Visualizer for Apache Solr
http://soleami.com/

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/


Reporting tools

2012-03-08 Thread Donald Organ
Are there any reporting tools out there?  So I can analyzer search term
frequency, filter frequency,  etc?


Re: cache monitoring tools?

2011-12-19 Thread Dmitry Kan
Thanks Justin. The reason I decided to ask that is how easy is it to
bootstrap a system like Munin. This of course depends on how fast one needs
it. That is, if SOLR already exposes certain stat via jxm accessible beans,
that will make it easier and faster to set up a tool that can read from
jmx. Only my opinion.

Thanks,
Dmitry

On Fri, Dec 16, 2011 at 4:55 AM, Justin Caratzas
justin.carat...@gmail.comwrote:

 Dmitry,

 Thats beyond the scope of this thread, but Munin essentially runs
 plugins which are essentially scripts that output graph configuration
 and values when polled by the Munin server.  So it uses a plain text
 protocol, so that the scripts can be written in any language.  Munin
 then feeds this info into RRDtool, which displays the graph.  There are
 some examples[1] of solr plugins that people have used to scrape the
 stats.jsp page.

 Justin

 1. http://exchange.munin-monitoring.org/plugins/search?keyword=solr

 Dmitry Kan dmitry@gmail.com writes:

  Thanks, Justin. With zabbix I can gather jmx exposed stats from SOLR, how
  about munin, what protocol / way it uses to accumulate stats? It wasn't
  obvious from their online documentation...
 
  On Mon, Dec 12, 2011 at 4:56 PM, Justin Caratzas
  justin.carat...@gmail.comwrote:
 
  Dmitry,
 
  The only added stress that munin puts on each box is the 1 request per
  stat per 5 minutes to our admin stats handler.  Given that we get 25
  requests per second, this doesn't make much of a difference.  We don'tg
  have a sharded index (yet) as our index is only 2-3 GB, but we do have
  slave servers with replicated
  indexes that handle the queries, while our master handles
  updates/commits.
 
  Justin
 
  Dmitry Kan dmitry@gmail.com writes:
 
   Justin, in terms of the overhead, have you noticed if Munin puts much
 of
  it
   when used in production? In terms of the solr farm: how big is a
 shard's
   index (given you have sharded architecture).
  
   Dmitry
  
   On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas
   justin.carat...@gmail.comwrote:
  
   At my work, we use Munin and Nagio for monitoring and alerts.  Munin
 is
   great because writing a plugin for it so simple, and with Solr's
   statistics handler, we can track almost any solr stat we want.  It
 also
   comes with included plugins for load, file system stats, processes,
   etc.
  
   http://munin-monitoring.org/
  
   Justin
  
   Paul Libbrecht p...@hoplahup.net writes:
  
Allow me to chim in and ask a generic question about monitoring
 tools
for people close to developers: are any of the tools mentioned in
 this
thread actually able to show graphs of loads, e.g. cache counts or
 CPU
load, in parallel to a console log or to an http request log??
   
I am working on such a tool currently but I have a bad feeling of
   reinventing the wheel.
   
thanks in advance
   
Paul
   
   
   
Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
   
Otis, Tomás: thanks for the great links!
   
2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
   
Hi Dimitry, I pointed to the wiki page to enable JMX, then you
 can
  use
   any
tool that visualizes JMX stuff like Zabbix. See
   
   
  
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
   
On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan 
 dmitry@gmail.com
   wrote:
   
The culprit seems to be the merger (frontend) SOLR. Talking to
 one
   shard
directly takes substantially less time (1-2 sec).
   
On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan 
 dmitry@gmail.com
   wrote:
   
Tomás: thanks. The page you gave didn't mention cache
  specifically,
   is
there more documentation on this specifically? I have used
  solrmeter
tool,
it draws the cache diagrams, is there a similar tool, but which
  would
use
jmx directly and present the cache usage in runtime?
   
pravesh:
I have increased the size of filterCache, but the search hasn't
   become
any
faster, taking almost 9 sec on avg :(
   
name: search
class: org.apache.solr.handler.component.SearchHandler
version: $Revision: 1052938 $
description: Search using components:
   
   
   
  
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
   
stats: handlerStart : 1323255147351
requests : 100
errors : 3
timeouts : 0
totalTime : 885438
avgTimePerRequest : 8854.38
avgRequestsPerSecond : 0.008789442
   
the stats (copying fieldValueCache as well here, to show term
statistics):
   
name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1,
 initialSize=10

Re: cache monitoring tools?

2011-12-15 Thread Justin Caratzas
Dmitry,

Thats beyond the scope of this thread, but Munin essentially runs
plugins which are essentially scripts that output graph configuration
and values when polled by the Munin server.  So it uses a plain text
protocol, so that the scripts can be written in any language.  Munin
then feeds this info into RRDtool, which displays the graph.  There are
some examples[1] of solr plugins that people have used to scrape the
stats.jsp page.

Justin

1. http://exchange.munin-monitoring.org/plugins/search?keyword=solr

Dmitry Kan dmitry@gmail.com writes:

 Thanks, Justin. With zabbix I can gather jmx exposed stats from SOLR, how
 about munin, what protocol / way it uses to accumulate stats? It wasn't
 obvious from their online documentation...

 On Mon, Dec 12, 2011 at 4:56 PM, Justin Caratzas
 justin.carat...@gmail.comwrote:

 Dmitry,

 The only added stress that munin puts on each box is the 1 request per
 stat per 5 minutes to our admin stats handler.  Given that we get 25
 requests per second, this doesn't make much of a difference.  We don'tg
 have a sharded index (yet) as our index is only 2-3 GB, but we do have
 slave servers with replicated
 indexes that handle the queries, while our master handles
 updates/commits.

 Justin

 Dmitry Kan dmitry@gmail.com writes:

  Justin, in terms of the overhead, have you noticed if Munin puts much of
 it
  when used in production? In terms of the solr farm: how big is a shard's
  index (given you have sharded architecture).
 
  Dmitry
 
  On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas
  justin.carat...@gmail.comwrote:
 
  At my work, we use Munin and Nagio for monitoring and alerts.  Munin is
  great because writing a plugin for it so simple, and with Solr's
  statistics handler, we can track almost any solr stat we want.  It also
  comes with included plugins for load, file system stats, processes,
  etc.
 
  http://munin-monitoring.org/
 
  Justin
 
  Paul Libbrecht p...@hoplahup.net writes:
 
   Allow me to chim in and ask a generic question about monitoring tools
   for people close to developers: are any of the tools mentioned in this
   thread actually able to show graphs of loads, e.g. cache counts or CPU
   load, in parallel to a console log or to an http request log??
  
   I am working on such a tool currently but I have a bad feeling of
  reinventing the wheel.
  
   thanks in advance
  
   Paul
  
  
  
   Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
  
   Otis, Tomás: thanks for the great links!
  
   2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
  
   Hi Dimitry, I pointed to the wiki page to enable JMX, then you can
 use
  any
   tool that visualizes JMX stuff like Zabbix. See
  
  
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
  
   On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com
  wrote:
  
   The culprit seems to be the merger (frontend) SOLR. Talking to one
  shard
   directly takes substantially less time (1-2 sec).
  
   On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com
  wrote:
  
   Tomás: thanks. The page you gave didn't mention cache
 specifically,
  is
   there more documentation on this specifically? I have used
 solrmeter
   tool,
   it draws the cache diagrams, is there a similar tool, but which
 would
   use
   jmx directly and present the cache usage in runtime?
  
   pravesh:
   I have increased the size of filterCache, but the search hasn't
  become
   any
   faster, taking almost 9 sec on avg :(
  
   name: search
   class: org.apache.solr.handler.component.SearchHandler
   version: $Revision: 1052938 $
   description: Search using components:
  
  
  
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
  
   stats: handlerStart : 1323255147351
   requests : 100
   errors : 3
   timeouts : 0
   totalTime : 885438
   avgTimePerRequest : 8854.38
   avgRequestsPerSecond : 0.008789442
  
   the stats (copying fieldValueCache as well here, to show term
   statistics):
  
   name: fieldValueCache
   class: org.apache.solr.search.FastLRUCache
   version: 1.0
   description: Concurrent LRU Cache(maxSize=1, initialSize=10,
   minSize=9000, acceptableSize=9500, cleanupThread=false)
   stats: lookups : 79
   hits : 77
   hitratio : 0.97
   inserts : 1
   evictions : 0
   size : 1
   warmupTime : 0
   cumulative_lookups : 79
   cumulative_hits : 77
   cumulative_hitratio : 0.97
   cumulative_inserts : 1
   cumulative_evictions : 0
   item_shingleContent_trigram :
  
  
  
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
   name: filterCache
   class

Re: cache monitoring tools?

2011-12-14 Thread Dmitry Kan
Thanks, Justin. With zabbix I can gather jmx exposed stats from SOLR, how
about munin, what protocol / way it uses to accumulate stats? It wasn't
obvious from their online documentation...

On Mon, Dec 12, 2011 at 4:56 PM, Justin Caratzas
justin.carat...@gmail.comwrote:

 Dmitry,

 The only added stress that munin puts on each box is the 1 request per
 stat per 5 minutes to our admin stats handler.  Given that we get 25
 requests per second, this doesn't make much of a difference.  We don't
 have a sharded index (yet) as our index is only 2-3 GB, but we do have
 slave servers with replicated
 indexes that handle the queries, while our master handles
 updates/commits.

 Justin

 Dmitry Kan dmitry@gmail.com writes:

  Justin, in terms of the overhead, have you noticed if Munin puts much of
 it
  when used in production? In terms of the solr farm: how big is a shard's
  index (given you have sharded architecture).
 
  Dmitry
 
  On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas
  justin.carat...@gmail.comwrote:
 
  At my work, we use Munin and Nagio for monitoring and alerts.  Munin is
  great because writing a plugin for it so simple, and with Solr's
  statistics handler, we can track almost any solr stat we want.  It also
  comes with included plugins for load, file system stats, processes,
  etc.
 
  http://munin-monitoring.org/
 
  Justin
 
  Paul Libbrecht p...@hoplahup.net writes:
 
   Allow me to chim in and ask a generic question about monitoring tools
   for people close to developers: are any of the tools mentioned in this
   thread actually able to show graphs of loads, e.g. cache counts or CPU
   load, in parallel to a console log or to an http request log??
  
   I am working on such a tool currently but I have a bad feeling of
  reinventing the wheel.
  
   thanks in advance
  
   Paul
  
  
  
   Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
  
   Otis, Tomás: thanks for the great links!
  
   2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
  
   Hi Dimitry, I pointed to the wiki page to enable JMX, then you can
 use
  any
   tool that visualizes JMX stuff like Zabbix. See
  
  
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
  
   On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com
  wrote:
  
   The culprit seems to be the merger (frontend) SOLR. Talking to one
  shard
   directly takes substantially less time (1-2 sec).
  
   On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com
  wrote:
  
   Tomás: thanks. The page you gave didn't mention cache
 specifically,
  is
   there more documentation on this specifically? I have used
 solrmeter
   tool,
   it draws the cache diagrams, is there a similar tool, but which
 would
   use
   jmx directly and present the cache usage in runtime?
  
   pravesh:
   I have increased the size of filterCache, but the search hasn't
  become
   any
   faster, taking almost 9 sec on avg :(
  
   name: search
   class: org.apache.solr.handler.component.SearchHandler
   version: $Revision: 1052938 $
   description: Search using components:
  
  
  
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
  
   stats: handlerStart : 1323255147351
   requests : 100
   errors : 3
   timeouts : 0
   totalTime : 885438
   avgTimePerRequest : 8854.38
   avgRequestsPerSecond : 0.008789442
  
   the stats (copying fieldValueCache as well here, to show term
   statistics):
  
   name: fieldValueCache
   class: org.apache.solr.search.FastLRUCache
   version: 1.0
   description: Concurrent LRU Cache(maxSize=1, initialSize=10,
   minSize=9000, acceptableSize=9500, cleanupThread=false)
   stats: lookups : 79
   hits : 77
   hitratio : 0.97
   inserts : 1
   evictions : 0
   size : 1
   warmupTime : 0
   cumulative_lookups : 79
   cumulative_hits : 77
   cumulative_hitratio : 0.97
   cumulative_inserts : 1
   cumulative_evictions : 0
   item_shingleContent_trigram :
  
  
  
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
   name: filterCache
   class: org.apache.solr.search.FastLRUCache
   version: 1.0
   description: Concurrent LRU Cache(maxSize=153600,
 initialSize=4096,
   minSize=138240, acceptableSize=145920, cleanupThread=false)
   stats: lookups : 1082854
   hits : 940370
   hitratio : 0.86
   inserts : 142486
   evictions : 0
   size : 142486
   warmupTime : 0
   cumulative_lookups : 1082854
   cumulative_hits : 940370
   cumulative_hitratio : 0.86
   cumulative_inserts : 142486
   cumulative_evictions : 0
  
  
   index size: 3,25 GB
  
   Does anyone have some pointers to where to look at and optimize
 for
   query
   time

Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Hoss, I can't see why Network IO is the issue as the shards and the front
end SOLR resided on the same server. I said resided, because I got rid of
the front end (which according to my measurements, was taking at least as
much time for merging as it took to find the actual data in the shards) and
shards. Now I have only one shard having all the data. Filter cache tuning
also helped to reduce the amount of evictions to a minimum.

Dmitry

On Fri, Dec 9, 2011 at 10:42 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : The culprit seems to be the merger (frontend) SOLR. Talking to one shard
 : directly takes substantially less time (1-2 sec).
 ...
 :facet.limit=50

 Your probably most likeley has very little to do with your caches at all
 -- a facet.limit that high requires sending a very large amount of data
 over the wire, multiplied by the number of shards, multipled by some
 constant (i think it's 2 but it might be higher) in order to over
 request facet constriant counts from each shard to aggregate them.

 the dominant factor in the slow speed you are seeing is most likeley
 Network IO between the shards.



 -Hoss




-- 
Regards,

Dmitry Kan


Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Paul, have you checked solrmeter and zabbix?

Dmitry

On Fri, Dec 9, 2011 at 11:16 PM, Paul Libbrecht p...@hoplahup.net wrote:

 Allow me to chim in and ask a generic question about monitoring tools for
 people close to developers: are any of the tools mentioned in this thread
 actually able to show graphs of loads, e.g. cache counts or CPU load, in
 parallel to a console log or to an http request log??

 I am working on such a tool currently but I have a bad feeling of
 reinventing the wheel.

 thanks in advance

 Paul



 Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :

  Otis, Tomás: thanks for the great links!
 
  2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
  Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use
 any
  tool that visualizes JMX stuff like Zabbix. See
 
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
 
  On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com
 wrote:
 
  The culprit seems to be the merger (frontend) SOLR. Talking to one
 shard
  directly takes substantially less time (1-2 sec).
 
  On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com
 wrote:
 
  Tomás: thanks. The page you gave didn't mention cache specifically, is
  there more documentation on this specifically? I have used solrmeter
  tool,
  it draws the cache diagrams, is there a similar tool, but which would
  use
  jmx directly and present the cache usage in runtime?
 
  pravesh:
  I have increased the size of filterCache, but the search hasn't become
  any
  faster, taking almost 9 sec on avg :(
 
  name: search
  class: org.apache.solr.handler.component.SearchHandler
  version: $Revision: 1052938 $
  description: Search using components:
 
 
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
 
  stats: handlerStart : 1323255147351
  requests : 100
  errors : 3
  timeouts : 0
  totalTime : 885438
  avgTimePerRequest : 8854.38
  avgRequestsPerSecond : 0.008789442
 
  the stats (copying fieldValueCache as well here, to show term
  statistics):
 
  name: fieldValueCache
  class: org.apache.solr.search.FastLRUCache
  version: 1.0
  description: Concurrent LRU Cache(maxSize=1, initialSize=10,
  minSize=9000, acceptableSize=9500, cleanupThread=false)
  stats: lookups : 79
  hits : 77
  hitratio : 0.97
  inserts : 1
  evictions : 0
  size : 1
  warmupTime : 0
  cumulative_lookups : 79
  cumulative_hits : 77
  cumulative_hitratio : 0.97
  cumulative_inserts : 1
  cumulative_evictions : 0
  item_shingleContent_trigram :
 
 
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
  name: filterCache
  class: org.apache.solr.search.FastLRUCache
  version: 1.0
  description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
  minSize=138240, acceptableSize=145920, cleanupThread=false)
  stats: lookups : 1082854
  hits : 940370
  hitratio : 0.86
  inserts : 142486
  evictions : 0
  size : 142486
  warmupTime : 0
  cumulative_lookups : 1082854
  cumulative_hits : 940370
  cumulative_hitratio : 0.86
  cumulative_inserts : 142486
  cumulative_evictions : 0
 
 
  index size: 3,25 GB
 
  Does anyone have some pointers to where to look at and optimize for
  query
  time?
 
 
  2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
  Hi Dimitry, cache information is exposed via JMX, so you should be
  able
  to
  monitor that information with any JMX tool. See
  http://wiki.apache.org/solr/SolrJmx
 
  On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com
  wrote:
 
  Yes, we do require that much.
  Ok, thanks, I will try increasing the maxsize.
 
  On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com
  wrote:
 
  facet.limit=50
  your facet.limit seems too high. Do you actually require this
  much?
 
  Since there a lot of evictions from filtercache, so, increase the
  maxsize
  value to your acceptable limit.
 
  Regards
  Pravesh
 
  --
  View this message in context:
 
 
 
 
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  --
  Regards,
 
  Dmitry Kan
 
 
 
 
 
  --
  Regards,
 
  Dmitry Kan
 
 
 
 
  --
  Regards,
 
  Dmitry Kan
 
 
 
 
 
  --
  Regards,
 
  Dmitry Kan




-- 
Regards,

Dmitry Kan


Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Justin, in terms of the overhead, have you noticed if Munin puts much of it
when used in production? In terms of the solr farm: how big is a shard's
index (given you have sharded architecture).

Dmitry

On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas
justin.carat...@gmail.comwrote:

 At my work, we use Munin and Nagio for monitoring and alerts.  Munin is
 great because writing a plugin for it so simple, and with Solr's
 statistics handler, we can track almost any solr stat we want.  It also
 comes with included plugins for load, file system stats, processes,
 etc.

 http://munin-monitoring.org/

 Justin

 Paul Libbrecht p...@hoplahup.net writes:

  Allow me to chim in and ask a generic question about monitoring tools
  for people close to developers: are any of the tools mentioned in this
  thread actually able to show graphs of loads, e.g. cache counts or CPU
  load, in parallel to a console log or to an http request log??
 
  I am working on such a tool currently but I have a bad feeling of
 reinventing the wheel.
 
  thanks in advance
 
  Paul
 
 
 
  Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
 
  Otis, Tomás: thanks for the great links!
 
  2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
  Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use
 any
  tool that visualizes JMX stuff like Zabbix. See
 
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
 
  On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com
 wrote:
 
  The culprit seems to be the merger (frontend) SOLR. Talking to one
 shard
  directly takes substantially less time (1-2 sec).
 
  On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com
 wrote:
 
  Tomás: thanks. The page you gave didn't mention cache specifically,
 is
  there more documentation on this specifically? I have used solrmeter
  tool,
  it draws the cache diagrams, is there a similar tool, but which would
  use
  jmx directly and present the cache usage in runtime?
 
  pravesh:
  I have increased the size of filterCache, but the search hasn't
 become
  any
  faster, taking almost 9 sec on avg :(
 
  name: search
  class: org.apache.solr.handler.component.SearchHandler
  version: $Revision: 1052938 $
  description: Search using components:
 
 
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
 
  stats: handlerStart : 1323255147351
  requests : 100
  errors : 3
  timeouts : 0
  totalTime : 885438
  avgTimePerRequest : 8854.38
  avgRequestsPerSecond : 0.008789442
 
  the stats (copying fieldValueCache as well here, to show term
  statistics):
 
  name: fieldValueCache
  class: org.apache.solr.search.FastLRUCache
  version: 1.0
  description: Concurrent LRU Cache(maxSize=1, initialSize=10,
  minSize=9000, acceptableSize=9500, cleanupThread=false)
  stats: lookups : 79
  hits : 77
  hitratio : 0.97
  inserts : 1
  evictions : 0
  size : 1
  warmupTime : 0
  cumulative_lookups : 79
  cumulative_hits : 77
  cumulative_hitratio : 0.97
  cumulative_inserts : 1
  cumulative_evictions : 0
  item_shingleContent_trigram :
 
 
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
  name: filterCache
  class: org.apache.solr.search.FastLRUCache
  version: 1.0
  description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
  minSize=138240, acceptableSize=145920, cleanupThread=false)
  stats: lookups : 1082854
  hits : 940370
  hitratio : 0.86
  inserts : 142486
  evictions : 0
  size : 142486
  warmupTime : 0
  cumulative_lookups : 1082854
  cumulative_hits : 940370
  cumulative_hitratio : 0.86
  cumulative_inserts : 142486
  cumulative_evictions : 0
 
 
  index size: 3,25 GB
 
  Does anyone have some pointers to where to look at and optimize for
  query
  time?
 
 
  2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
  Hi Dimitry, cache information is exposed via JMX, so you should be
  able
  to
  monitor that information with any JMX tool. See
  http://wiki.apache.org/solr/SolrJmx
 
  On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com
  wrote:
 
  Yes, we do require that much.
  Ok, thanks, I will try increasing the maxsize.
 
  On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com
  wrote:
 
  facet.limit=50
  your facet.limit seems too high. Do you actually require this
  much?
 
  Since there a lot of evictions from filtercache, so, increase the
  maxsize
  value to your acceptable limit.
 
  Regards
  Pravesh
 
  --
  View this message in context:
 
 
 
 
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
  Sent from the Solr - User mailing list archive

Re: cache monitoring tools?

2011-12-12 Thread Justin Caratzas
Dmitry,

The only added stress that munin puts on each box is the 1 request per
stat per 5 minutes to our admin stats handler.  Given that we get 25
requests per second, this doesn't make much of a difference.  We don't
have a sharded index (yet) as our index is only 2-3 GB, but we do have slave 
servers with replicated
indexes that handle the queries, while our master handles
updates/commits.

Justin

Dmitry Kan dmitry@gmail.com writes:

 Justin, in terms of the overhead, have you noticed if Munin puts much of it
 when used in production? In terms of the solr farm: how big is a shard's
 index (given you have sharded architecture).

 Dmitry

 On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas
 justin.carat...@gmail.comwrote:

 At my work, we use Munin and Nagio for monitoring and alerts.  Munin is
 great because writing a plugin for it so simple, and with Solr's
 statistics handler, we can track almost any solr stat we want.  It also
 comes with included plugins for load, file system stats, processes,
 etc.

 http://munin-monitoring.org/

 Justin

 Paul Libbrecht p...@hoplahup.net writes:

  Allow me to chim in and ask a generic question about monitoring tools
  for people close to developers: are any of the tools mentioned in this
  thread actually able to show graphs of loads, e.g. cache counts or CPU
  load, in parallel to a console log or to an http request log??
 
  I am working on such a tool currently but I have a bad feeling of
 reinventing the wheel.
 
  thanks in advance
 
  Paul
 
 
 
  Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
 
  Otis, Tomás: thanks for the great links!
 
  2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
  Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use
 any
  tool that visualizes JMX stuff like Zabbix. See
 
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
 
  On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com
 wrote:
 
  The culprit seems to be the merger (frontend) SOLR. Talking to one
 shard
  directly takes substantially less time (1-2 sec).
 
  On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com
 wrote:
 
  Tomás: thanks. The page you gave didn't mention cache specifically,
 is
  there more documentation on this specifically? I have used solrmeter
  tool,
  it draws the cache diagrams, is there a similar tool, but which would
  use
  jmx directly and present the cache usage in runtime?
 
  pravesh:
  I have increased the size of filterCache, but the search hasn't
 become
  any
  faster, taking almost 9 sec on avg :(
 
  name: search
  class: org.apache.solr.handler.component.SearchHandler
  version: $Revision: 1052938 $
  description: Search using components:
 
 
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
 
  stats: handlerStart : 1323255147351
  requests : 100
  errors : 3
  timeouts : 0
  totalTime : 885438
  avgTimePerRequest : 8854.38
  avgRequestsPerSecond : 0.008789442
 
  the stats (copying fieldValueCache as well here, to show term
  statistics):
 
  name: fieldValueCache
  class: org.apache.solr.search.FastLRUCache
  version: 1.0
  description: Concurrent LRU Cache(maxSize=1, initialSize=10,
  minSize=9000, acceptableSize=9500, cleanupThread=false)
  stats: lookups : 79
  hits : 77
  hitratio : 0.97
  inserts : 1
  evictions : 0
  size : 1
  warmupTime : 0
  cumulative_lookups : 79
  cumulative_hits : 77
  cumulative_hitratio : 0.97
  cumulative_inserts : 1
  cumulative_evictions : 0
  item_shingleContent_trigram :
 
 
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
  name: filterCache
  class: org.apache.solr.search.FastLRUCache
  version: 1.0
  description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
  minSize=138240, acceptableSize=145920, cleanupThread=false)
  stats: lookups : 1082854
  hits : 940370
  hitratio : 0.86
  inserts : 142486
  evictions : 0
  size : 142486
  warmupTime : 0
  cumulative_lookups : 1082854
  cumulative_hits : 940370
  cumulative_hitratio : 0.86
  cumulative_inserts : 142486
  cumulative_evictions : 0
 
 
  index size: 3,25 GB
 
  Does anyone have some pointers to where to look at and optimize for
  query
  time?
 
 
  2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
  Hi Dimitry, cache information is exposed via JMX, so you should be
  able
  to
  monitor that information with any JMX tool. See
  http://wiki.apache.org/solr/SolrJmx
 
  On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com
  wrote:
 
  Yes, we do require that much.
  Ok, thanks, I will try increasing the maxsize.
 
  On Wed, Dec 7, 2011 at 10:56 AM

Re: cache monitoring tools?

2011-12-11 Thread Justin Caratzas
At my work, we use Munin and Nagio for monitoring and alerts.  Munin is
great because writing a plugin for it so simple, and with Solr's
statistics handler, we can track almost any solr stat we want.  It also
comes with included plugins for load, file system stats, processes,
etc.

http://munin-monitoring.org/

Justin

Paul Libbrecht p...@hoplahup.net writes:

 Allow me to chim in and ask a generic question about monitoring tools
 for people close to developers: are any of the tools mentioned in this
 thread actually able to show graphs of loads, e.g. cache counts or CPU
 load, in parallel to a console log or to an http request log??

 I am working on such a tool currently but I have a bad feeling of reinventing 
 the wheel.

 thanks in advance

 Paul



 Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :

 Otis, Tomás: thanks for the great links!
 
 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
 Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any
 tool that visualizes JMX stuff like Zabbix. See
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
 
 On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote:
 
 The culprit seems to be the merger (frontend) SOLR. Talking to one shard
 directly takes substantially less time (1-2 sec).
 
 On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote:
 
 Tomás: thanks. The page you gave didn't mention cache specifically, is
 there more documentation on this specifically? I have used solrmeter
 tool,
 it draws the cache diagrams, is there a similar tool, but which would
 use
 jmx directly and present the cache usage in runtime?
 
 pravesh:
 I have increased the size of filterCache, but the search hasn't become
 any
 faster, taking almost 9 sec on avg :(
 
 name: search
 class: org.apache.solr.handler.component.SearchHandler
 version: $Revision: 1052938 $
 description: Search using components:
 
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
 
 stats: handlerStart : 1323255147351
 requests : 100
 errors : 3
 timeouts : 0
 totalTime : 885438
 avgTimePerRequest : 8854.38
 avgRequestsPerSecond : 0.008789442
 
 the stats (copying fieldValueCache as well here, to show term
 statistics):
 
 name: fieldValueCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 stats: lookups : 79
 hits : 77
 hitratio : 0.97
 inserts : 1
 evictions : 0
 size : 1
 warmupTime : 0
 cumulative_lookups : 79
 cumulative_hits : 77
 cumulative_hitratio : 0.97
 cumulative_inserts : 1
 cumulative_evictions : 0
 item_shingleContent_trigram :
 
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
 name: filterCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
 minSize=138240, acceptableSize=145920, cleanupThread=false)
 stats: lookups : 1082854
 hits : 940370
 hitratio : 0.86
 inserts : 142486
 evictions : 0
 size : 142486
 warmupTime : 0
 cumulative_lookups : 1082854
 cumulative_hits : 940370
 cumulative_hitratio : 0.86
 cumulative_inserts : 142486
 cumulative_evictions : 0
 
 
 index size: 3,25 GB
 
 Does anyone have some pointers to where to look at and optimize for
 query
 time?
 
 
 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
 Hi Dimitry, cache information is exposed via JMX, so you should be
 able
 to
 monitor that information with any JMX tool. See
 http://wiki.apache.org/solr/SolrJmx
 
 On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com
 wrote:
 
 Yes, we do require that much.
 Ok, thanks, I will try increasing the maxsize.
 
 On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com
 wrote:
 
 facet.limit=50
 your facet.limit seems too high. Do you actually require this
 much?
 
 Since there a lot of evictions from filtercache, so, increase the
 maxsize
 value to your acceptable limit.
 
 Regards
 Pravesh
 
 --
 View this message in context:
 
 
 
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Regards,
 
 Dmitry Kan
 
 
 
 
 
 --
 Regards,
 
 Dmitry Kan
 
 
 
 
 --
 Regards,
 
 Dmitry Kan
 
 
 
 
 
 -- 
 Regards,
 
 Dmitry Kan


Re: cache monitoring tools?

2011-12-11 Thread Paul Libbrecht
Justin,

I am not sure this answers the question: is there a graph view (of some 
measurements) which can be synched to one or several logs? I'd like to click on 
a spike of CPU to see the log around the time of that spike.

Does munin or any other do that?

paul


Le 11 déc. 2011 à 17:39, Justin Caratzas a écrit :

 At my work, we use Munin and Nagio for monitoring and alerts.  Munin is
 great because writing a plugin for it so simple, and with Solr's
 statistics handler, we can track almost any solr stat we want.  It also
 comes with included plugins for load, file system stats, processes,
 etc.
 
 http://munin-monitoring.org/
 
 Justin
 
 Paul Libbrecht p...@hoplahup.net writes:
 
 Allow me to chim in and ask a generic question about monitoring tools
 for people close to developers: are any of the tools mentioned in this
 thread actually able to show graphs of loads, e.g. cache counts or CPU
 load, in parallel to a console log or to an http request log??
 
 I am working on such a tool currently but I have a bad feeling of 
 reinventing the wheel.
 
 thanks in advance
 
 Paul
 
 
 
 Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
 
 Otis, Tomás: thanks for the great links!
 
 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
 Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any
 tool that visualizes JMX stuff like Zabbix. See
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
 
 On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote:
 
 The culprit seems to be the merger (frontend) SOLR. Talking to one shard
 directly takes substantially less time (1-2 sec).
 
 On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote:
 
 Tomás: thanks. The page you gave didn't mention cache specifically, is
 there more documentation on this specifically? I have used solrmeter
 tool,
 it draws the cache diagrams, is there a similar tool, but which would
 use
 jmx directly and present the cache usage in runtime?
 
 pravesh:
 I have increased the size of filterCache, but the search hasn't become
 any
 faster, taking almost 9 sec on avg :(
 
 name: search
 class: org.apache.solr.handler.component.SearchHandler
 version: $Revision: 1052938 $
 description: Search using components:
 
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
 
 stats: handlerStart : 1323255147351
 requests : 100
 errors : 3
 timeouts : 0
 totalTime : 885438
 avgTimePerRequest : 8854.38
 avgRequestsPerSecond : 0.008789442
 
 the stats (copying fieldValueCache as well here, to show term
 statistics):
 
 name: fieldValueCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 stats: lookups : 79
 hits : 77
 hitratio : 0.97
 inserts : 1
 evictions : 0
 size : 1
 warmupTime : 0
 cumulative_lookups : 79
 cumulative_hits : 77
 cumulative_hitratio : 0.97
 cumulative_inserts : 1
 cumulative_evictions : 0
 item_shingleContent_trigram :
 
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
 name: filterCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
 minSize=138240, acceptableSize=145920, cleanupThread=false)
 stats: lookups : 1082854
 hits : 940370
 hitratio : 0.86
 inserts : 142486
 evictions : 0
 size : 142486
 warmupTime : 0
 cumulative_lookups : 1082854
 cumulative_hits : 940370
 cumulative_hitratio : 0.86
 cumulative_inserts : 142486
 cumulative_evictions : 0
 
 
 index size: 3,25 GB
 
 Does anyone have some pointers to where to look at and optimize for
 query
 time?
 
 
 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
 Hi Dimitry, cache information is exposed via JMX, so you should be
 able
 to
 monitor that information with any JMX tool. See
 http://wiki.apache.org/solr/SolrJmx
 
 On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com
 wrote:
 
 Yes, we do require that much.
 Ok, thanks, I will try increasing the maxsize.
 
 On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com
 wrote:
 
 facet.limit=50
 your facet.limit seems too high. Do you actually require this
 much?
 
 Since there a lot of evictions from filtercache, so, increase the
 maxsize
 value to your acceptable limit.
 
 Regards
 Pravesh
 
 --
 View this message in context:
 
 
 
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Regards,
 
 Dmitry Kan
 
 
 
 
 
 --
 Regards

Re: cache monitoring tools?

2011-12-09 Thread Chris Hostetter

: The culprit seems to be the merger (frontend) SOLR. Talking to one shard
: directly takes substantially less time (1-2 sec).
...
:facet.limit=50

Your probably most likeley has very little to do with your caches at all 
-- a facet.limit that high requires sending a very large amount of data 
over the wire, multiplied by the number of shards, multipled by some 
constant (i think it's 2 but it might be higher) in order to over 
request facet constriant counts from each shard to aggregate them.

the dominant factor in the slow speed you are seeing is most likeley 
Network IO between the shards.



-Hoss


Re: cache monitoring tools?

2011-12-09 Thread Paul Libbrecht
Allow me to chim in and ask a generic question about monitoring tools for 
people close to developers: are any of the tools mentioned in this thread 
actually able to show graphs of loads, e.g. cache counts or CPU load, in 
parallel to a console log or to an http request log??

I am working on such a tool currently but I have a bad feeling of reinventing 
the wheel.

thanks in advance

Paul



Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :

 Otis, Tomás: thanks for the great links!
 
 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
 Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any
 tool that visualizes JMX stuff like Zabbix. See
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
 
 On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote:
 
 The culprit seems to be the merger (frontend) SOLR. Talking to one shard
 directly takes substantially less time (1-2 sec).
 
 On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote:
 
 Tomás: thanks. The page you gave didn't mention cache specifically, is
 there more documentation on this specifically? I have used solrmeter
 tool,
 it draws the cache diagrams, is there a similar tool, but which would
 use
 jmx directly and present the cache usage in runtime?
 
 pravesh:
 I have increased the size of filterCache, but the search hasn't become
 any
 faster, taking almost 9 sec on avg :(
 
 name: search
 class: org.apache.solr.handler.component.SearchHandler
 version: $Revision: 1052938 $
 description: Search using components:
 
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
 
 stats: handlerStart : 1323255147351
 requests : 100
 errors : 3
 timeouts : 0
 totalTime : 885438
 avgTimePerRequest : 8854.38
 avgRequestsPerSecond : 0.008789442
 
 the stats (copying fieldValueCache as well here, to show term
 statistics):
 
 name: fieldValueCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 stats: lookups : 79
 hits : 77
 hitratio : 0.97
 inserts : 1
 evictions : 0
 size : 1
 warmupTime : 0
 cumulative_lookups : 79
 cumulative_hits : 77
 cumulative_hitratio : 0.97
 cumulative_inserts : 1
 cumulative_evictions : 0
 item_shingleContent_trigram :
 
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
 name: filterCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
 minSize=138240, acceptableSize=145920, cleanupThread=false)
 stats: lookups : 1082854
 hits : 940370
 hitratio : 0.86
 inserts : 142486
 evictions : 0
 size : 142486
 warmupTime : 0
 cumulative_lookups : 1082854
 cumulative_hits : 940370
 cumulative_hitratio : 0.86
 cumulative_inserts : 142486
 cumulative_evictions : 0
 
 
 index size: 3,25 GB
 
 Does anyone have some pointers to where to look at and optimize for
 query
 time?
 
 
 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
 Hi Dimitry, cache information is exposed via JMX, so you should be
 able
 to
 monitor that information with any JMX tool. See
 http://wiki.apache.org/solr/SolrJmx
 
 On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com
 wrote:
 
 Yes, we do require that much.
 Ok, thanks, I will try increasing the maxsize.
 
 On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com
 wrote:
 
 facet.limit=50
 your facet.limit seems too high. Do you actually require this
 much?
 
 Since there a lot of evictions from filtercache, so, increase the
 maxsize
 value to your acceptable limit.
 
 Regards
 Pravesh
 
 --
 View this message in context:
 
 
 
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Regards,
 
 Dmitry Kan
 
 
 
 
 
 --
 Regards,
 
 Dmitry Kan
 
 
 
 
 --
 Regards,
 
 Dmitry Kan
 
 
 
 
 
 -- 
 Regards,
 
 Dmitry Kan



Re: cache monitoring tools?

2011-12-08 Thread Bernd Fehling

Hi Otis,

I can't find the download for the free SPM.
What Hardware and OS do I need for installing SPM to monitor my servers?

Regards
Bernd

Am 07.12.2011 18:47, schrieb Otis Gospodnetic:

Hi Dmitry,

You should use SPM for Solr - it exposes all Solr metrics and more (JVM, system 
info, etc.)
PLUS it's currently 100% free.

http://sematext.com/spm/solr-performance-monitoring/index.html


We use it with our clients on a regular basis and it helps us a TON - we just 
helped a very popular mobile app company improve Solr performance by a few 
orders of magnitude (including filter tuning) with the help of SPM.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




From: Dmitry Kandmitry@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, December 7, 2011 2:13 AM
Subject: cache monitoring tools?

Hello list,

We've noticed quite huge strain on the filterCache in facet queries against
trigram fields (see schema in the end of this e-mail). The typical query
contains some keywords in the q parameter and boolean filter query on other
solr fields. It is also facet query, the facet field is of
type shingle_text_trigram (see schema) and facet.limit=50.


Questions: are there some tools (except for solrmeter) and/or approaches to
monitor / profile the load on caches, which would help to derive better
tuning parameters?

Can you recommend checking config parameters of other components but caches?

BTW, this has become much faster compared to solr 1.4 where we had to a lot
of optimizations on schema level (e.g. by making a number of stored fields
non-stored)

Here are the relevant stats from admin (SOLR 3.4):

description: Concurrent LRU Cache(maxSize=1, initialSize=10,
minSize=9000, acceptableSize=9500, cleanupThread=false)
stats: lookups : 93
hits : 90
hitratio : 0.96
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 93
cumulative_hits : 90
cumulative_hitratio : 0.96
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram :
{field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=222924,phase1=221106,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=91}
name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=512, initialSize=512,
minSize=460, acceptableSize=486, cleanupThread=false)
stats: lookups : 1003486
hits : 2809
hitratio : 0.00
inserts : 1000694
evictions : 1000221
size : 473
warmupTime : 0
cumulative_lookups : 1003486
cumulative_hits : 2809
cumulative_hitratio : 0.00
cumulative_inserts : 1000694
cumulative_evictions : 1000221


schema excerpt:

fieldType name=shingle_text_trigram class=solr.TextField
positionIncrementGap=100
analyzer
   tokenizer class=solr.StandardTokenizerFactory/
   filter class=solr.LowerCaseFilterFactory/
   filter class=solr.ShingleFilterFactory maxShingleSize=3
outputUnigrams=true/
/analyzer
/fieldType

--
Regards,

Dmitry Kan





Re: cache monitoring tools?

2011-12-08 Thread Otis Gospodnetic
Hi Bernd,

Check this:
SPM for Solr is the enterprise-class, cloud-based, System/OS and Solr 
Performance Monitoring SaaS.


So it's a SaaS -  you simply sign up for it.  During the signup you'll get to 
download a small agent that works on RedHat, CentOS, Debian, Ubuntu, and maybe 
other OSes.

If you have any more SPM questions, it may be best to email me directly.  For 
example, if you are only interested in SPM if it runs in your datacenter, 
please let me know.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/





 From: Bernd Fehling bernd.fehl...@uni-bielefeld.de
To: solr-user@lucene.apache.org 
Sent: Thursday, December 8, 2011 4:04 AM
Subject: Re: cache monitoring tools?
 
Hi Otis,

I can't find the download for the free SPM.
What Hardware and OS do I need for installing SPM to monitor my servers?

Regards
Bernd

Am 07.12.2011 18:47, schrieb Otis Gospodnetic:
 Hi Dmitry,

 You should use SPM for Solr - it exposes all Solr metrics and more (JVM, 
 system info, etc.)
 PLUS it's currently 100% free.

 http://sematext.com/spm/solr-performance-monitoring/index.html


 We use it with our clients on a regular basis and it helps us a TON - we 
 just helped a very popular mobile app company improve Solr performance by a 
 few orders of magnitude (including filter tuning) with the help of SPM.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Dmitry Kandmitry@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 7, 2011 2:13 AM
 Subject: cache monitoring tools?

 Hello list,

 We've noticed quite huge strain on the filterCache in facet queries against
 trigram fields (see schema in the end of this e-mail). The typical query
 contains some keywords in the q parameter and boolean filter query on other
 solr fields. It is also facet query, the facet field is of
 type shingle_text_trigram (see schema) and facet.limit=50.


 Questions: are there some tools (except for solrmeter) and/or approaches to
 monitor / profile the load on caches, which would help to derive better
 tuning parameters?

 Can you recommend checking config parameters of other components but caches?

 BTW, this has become much faster compared to solr 1.4 where we had to a lot
 of optimizations on schema level (e.g. by making a number of stored fields
 non-stored)

 Here are the relevant stats from admin (SOLR 3.4):

 description: Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 stats: lookups : 93
 hits : 90
 hitratio : 0.96
 inserts : 1
 evictions : 0
 size : 1
 warmupTime : 0
 cumulative_lookups : 93
 cumulative_hits : 90
 cumulative_hitratio : 0.96
 cumulative_inserts : 1
 cumulative_evictions : 0
 item_shingleContent_trigram :
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=222924,phase1=221106,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=91}
 name: filterCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=512, initialSize=512,
 minSize=460, acceptableSize=486, cleanupThread=false)
 stats: lookups : 1003486
 hits : 2809
 hitratio : 0.00
 inserts : 1000694
 evictions : 1000221
 size : 473
 warmupTime : 0
 cumulative_lookups : 1003486
 cumulative_hits : 2809
 cumulative_hitratio : 0.00
 cumulative_inserts : 1000694
 cumulative_evictions : 1000221


 schema excerpt:

 fieldType name=shingle_text_trigram class=solr.TextField
 positionIncrementGap=100
     analyzer
        tokenizer class=solr.StandardTokenizerFactory/
        filter class=solr.LowerCaseFilterFactory/
        filter class=solr.ShingleFilterFactory maxShingleSize=3
 outputUnigrams=true/
     /analyzer
 /fieldType

 --
 Regards,

 Dmitry Kan







Re: cache monitoring tools?

2011-12-07 Thread pravesh
facet.limit=50
your facet.limit seems too high. Do you actually require this much?

Since there a lot of evictions from filtercache, so, increase the maxsize
value to your acceptable limit.

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: cache monitoring tools?

2011-12-07 Thread Dmitry Kan
Yes, we do require that much.
Ok, thanks, I will try increasing the maxsize.

On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com wrote:

 facet.limit=50
 your facet.limit seems too high. Do you actually require this much?

 Since there a lot of evictions from filtercache, so, increase the maxsize
 value to your acceptable limit.

 Regards
 Pravesh

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,

Dmitry Kan


Re: cache monitoring tools?

2011-12-07 Thread Tomás Fernández Löbbe
Hi Dimitry, cache information is exposed via JMX, so you should be able to
monitor that information with any JMX tool. See
http://wiki.apache.org/solr/SolrJmx

On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com wrote:

 Yes, we do require that much.
 Ok, thanks, I will try increasing the maxsize.

 On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com wrote:

  facet.limit=50
  your facet.limit seems too high. Do you actually require this much?
 
  Since there a lot of evictions from filtercache, so, increase the maxsize
  value to your acceptable limit.
 
  Regards
  Pravesh
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



 --
 Regards,

 Dmitry Kan



Re: cache monitoring tools?

2011-12-07 Thread Dmitry Kan
Tomás: thanks. The page you gave didn't mention cache specifically, is
there more documentation on this specifically? I have used solrmeter tool,
it draws the cache diagrams, is there a similar tool, but which would use
jmx directly and present the cache usage in runtime?

pravesh:
I have increased the size of filterCache, but the search hasn't become any
faster, taking almost 9 sec on avg :(

name: search
class: org.apache.solr.handler.component.SearchHandler
version: $Revision: 1052938 $
description: Search using components:
org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,

stats: handlerStart : 1323255147351
requests : 100
errors : 3
timeouts : 0
totalTime : 885438
avgTimePerRequest : 8854.38
avgRequestsPerSecond : 0.008789442

the stats (copying fieldValueCache as well here, to show term statistics):

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10,
minSize=9000, acceptableSize=9500, cleanupThread=false)
stats: lookups : 79
hits : 77
hitratio : 0.97
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 79
cumulative_hits : 77
cumulative_hitratio : 0.97
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram :
{field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
minSize=138240, acceptableSize=145920, cleanupThread=false)
stats: lookups : 1082854
hits : 940370
hitratio : 0.86
inserts : 142486
evictions : 0
size : 142486
warmupTime : 0
cumulative_lookups : 1082854
cumulative_hits : 940370
cumulative_hitratio : 0.86
cumulative_inserts : 142486
cumulative_evictions : 0


index size: 3,25 GB

Does anyone have some pointers to where to look at and optimize for query
time?


2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com

 Hi Dimitry, cache information is exposed via JMX, so you should be able to
 monitor that information with any JMX tool. See
 http://wiki.apache.org/solr/SolrJmx

 On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com wrote:

  Yes, we do require that much.
  Ok, thanks, I will try increasing the maxsize.
 
  On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com wrote:
 
   facet.limit=50
   your facet.limit seems too high. Do you actually require this much?
  
   Since there a lot of evictions from filtercache, so, increase the
 maxsize
   value to your acceptable limit.
  
   Regards
   Pravesh
  
   --
   View this message in context:
  
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 
 
 
  --
  Regards,
 
  Dmitry Kan
 




-- 
Regards,

Dmitry Kan


Re: cache monitoring tools?

2011-12-07 Thread Dmitry Kan
The culprit seems to be the merger (frontend) SOLR. Talking to one shard
directly takes substantially less time (1-2 sec).

On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote:

 Tomás: thanks. The page you gave didn't mention cache specifically, is
 there more documentation on this specifically? I have used solrmeter tool,
 it draws the cache diagrams, is there a similar tool, but which would use
 jmx directly and present the cache usage in runtime?

 pravesh:
 I have increased the size of filterCache, but the search hasn't become any
 faster, taking almost 9 sec on avg :(

 name: search
 class: org.apache.solr.handler.component.SearchHandler
 version: $Revision: 1052938 $
 description: Search using components:
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,

 stats: handlerStart : 1323255147351
 requests : 100
 errors : 3
 timeouts : 0
 totalTime : 885438
 avgTimePerRequest : 8854.38
 avgRequestsPerSecond : 0.008789442

 the stats (copying fieldValueCache as well here, to show term statistics):

 name: fieldValueCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 stats: lookups : 79
 hits : 77
 hitratio : 0.97
 inserts : 1
 evictions : 0
 size : 1
 warmupTime : 0
 cumulative_lookups : 79
 cumulative_hits : 77
 cumulative_hitratio : 0.97
 cumulative_inserts : 1
 cumulative_evictions : 0
 item_shingleContent_trigram :
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
  name: filterCache
 class: org.apache.solr.search.FastLRUCache
 version: 1.0
 description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
 minSize=138240, acceptableSize=145920, cleanupThread=false)
 stats: lookups : 1082854
 hits : 940370
 hitratio : 0.86
 inserts : 142486
 evictions : 0
 size : 142486
 warmupTime : 0
 cumulative_lookups : 1082854
 cumulative_hits : 940370
 cumulative_hitratio : 0.86
 cumulative_inserts : 142486
 cumulative_evictions : 0


 index size: 3,25 GB

 Does anyone have some pointers to where to look at and optimize for query
 time?


 2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com

 Hi Dimitry, cache information is exposed via JMX, so you should be able to
 monitor that information with any JMX tool. See
 http://wiki.apache.org/solr/SolrJmx

 On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com wrote:

  Yes, we do require that much.
  Ok, thanks, I will try increasing the maxsize.
 
  On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com
 wrote:
 
   facet.limit=50
   your facet.limit seems too high. Do you actually require this much?
  
   Since there a lot of evictions from filtercache, so, increase the
 maxsize
   value to your acceptable limit.
  
   Regards
   Pravesh
  
   --
   View this message in context:
  
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 
 
 
  --
  Regards,
 
  Dmitry Kan
 




 --
 Regards,

 Dmitry Kan




-- 
Regards,

Dmitry Kan


Re: cache monitoring tools?

2011-12-07 Thread Otis Gospodnetic
Hi Dmitry,

You should use SPM for Solr - it exposes all Solr metrics and more (JVM, system 
info, etc.)
PLUS it's currently 100% free.

http://sematext.com/spm/solr-performance-monitoring/index.html


We use it with our clients on a regular basis and it helps us a TON - we just 
helped a very popular mobile app company improve Solr performance by a few 
orders of magnitude (including filter tuning) with the help of SPM.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



 From: Dmitry Kan dmitry@gmail.com
To: solr-user@lucene.apache.org 
Sent: Wednesday, December 7, 2011 2:13 AM
Subject: cache monitoring tools?
 
Hello list,

We've noticed quite huge strain on the filterCache in facet queries against
trigram fields (see schema in the end of this e-mail). The typical query
contains some keywords in the q parameter and boolean filter query on other
solr fields. It is also facet query, the facet field is of
type shingle_text_trigram (see schema) and facet.limit=50.


Questions: are there some tools (except for solrmeter) and/or approaches to
monitor / profile the load on caches, which would help to derive better
tuning parameters?

Can you recommend checking config parameters of other components but caches?

BTW, this has become much faster compared to solr 1.4 where we had to a lot
of optimizations on schema level (e.g. by making a number of stored fields
non-stored)

Here are the relevant stats from admin (SOLR 3.4):

description: Concurrent LRU Cache(maxSize=1, initialSize=10,
minSize=9000, acceptableSize=9500, cleanupThread=false)
stats: lookups : 93
hits : 90
hitratio : 0.96
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 93
cumulative_hits : 90
cumulative_hitratio : 0.96
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram :
{field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=222924,phase1=221106,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=91}
name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=512, initialSize=512,
minSize=460, acceptableSize=486, cleanupThread=false)
stats: lookups : 1003486
hits : 2809
hitratio : 0.00
inserts : 1000694
evictions : 1000221
size : 473
warmupTime : 0
cumulative_lookups : 1003486
cumulative_hits : 2809
cumulative_hitratio : 0.00
cumulative_inserts : 1000694
cumulative_evictions : 1000221


schema excerpt:

fieldType name=shingle_text_trigram class=solr.TextField
positionIncrementGap=100
   analyzer
      tokenizer class=solr.StandardTokenizerFactory/
      filter class=solr.LowerCaseFilterFactory/
      filter class=solr.ShingleFilterFactory maxShingleSize=3
outputUnigrams=true/
   /analyzer
/fieldType

-- 
Regards,

Dmitry Kan




Re: cache monitoring tools?

2011-12-07 Thread Tomás Fernández Löbbe
Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any
tool that visualizes JMX stuff like Zabbix. See
http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/

On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote:

 The culprit seems to be the merger (frontend) SOLR. Talking to one shard
 directly takes substantially less time (1-2 sec).

 On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote:

  Tomás: thanks. The page you gave didn't mention cache specifically, is
  there more documentation on this specifically? I have used solrmeter
 tool,
  it draws the cache diagrams, is there a similar tool, but which would use
  jmx directly and present the cache usage in runtime?
 
  pravesh:
  I have increased the size of filterCache, but the search hasn't become
 any
  faster, taking almost 9 sec on avg :(
 
  name: search
  class: org.apache.solr.handler.component.SearchHandler
  version: $Revision: 1052938 $
  description: Search using components:
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
 
  stats: handlerStart : 1323255147351
  requests : 100
  errors : 3
  timeouts : 0
  totalTime : 885438
  avgTimePerRequest : 8854.38
  avgRequestsPerSecond : 0.008789442
 
  the stats (copying fieldValueCache as well here, to show term
 statistics):
 
  name: fieldValueCache
  class: org.apache.solr.search.FastLRUCache
  version: 1.0
  description: Concurrent LRU Cache(maxSize=1, initialSize=10,
  minSize=9000, acceptableSize=9500, cleanupThread=false)
  stats: lookups : 79
  hits : 77
  hitratio : 0.97
  inserts : 1
  evictions : 0
  size : 1
  warmupTime : 0
  cumulative_lookups : 79
  cumulative_hits : 77
  cumulative_hitratio : 0.97
  cumulative_inserts : 1
  cumulative_evictions : 0
  item_shingleContent_trigram :
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
   name: filterCache
  class: org.apache.solr.search.FastLRUCache
  version: 1.0
  description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
  minSize=138240, acceptableSize=145920, cleanupThread=false)
  stats: lookups : 1082854
  hits : 940370
  hitratio : 0.86
  inserts : 142486
  evictions : 0
  size : 142486
  warmupTime : 0
  cumulative_lookups : 1082854
  cumulative_hits : 940370
  cumulative_hitratio : 0.86
  cumulative_inserts : 142486
  cumulative_evictions : 0
 
 
  index size: 3,25 GB
 
  Does anyone have some pointers to where to look at and optimize for query
  time?
 
 
  2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
 
  Hi Dimitry, cache information is exposed via JMX, so you should be able
 to
  monitor that information with any JMX tool. See
  http://wiki.apache.org/solr/SolrJmx
 
  On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com
 wrote:
 
   Yes, we do require that much.
   Ok, thanks, I will try increasing the maxsize.
  
   On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com
  wrote:
  
facet.limit=50
your facet.limit seems too high. Do you actually require this much?
   
Since there a lot of evictions from filtercache, so, increase the
  maxsize
value to your acceptable limit.
   
Regards
Pravesh
   
--
View this message in context:
   
  
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
Sent from the Solr - User mailing list archive at Nabble.com.
   
  
  
  
   --
   Regards,
  
   Dmitry Kan
  
 
 
 
 
  --
  Regards,
 
  Dmitry Kan
 



 --
 Regards,

 Dmitry Kan



Re: cache monitoring tools?

2011-12-07 Thread Dmitry Kan
Otis, Tomás: thanks for the great links!

2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com

 Hi Dimitry, I pointed to the wiki page to enable JMX, then you can use any
 tool that visualizes JMX stuff like Zabbix. See

 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/

 On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com wrote:

  The culprit seems to be the merger (frontend) SOLR. Talking to one shard
  directly takes substantially less time (1-2 sec).
 
  On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com wrote:
 
   Tomás: thanks. The page you gave didn't mention cache specifically, is
   there more documentation on this specifically? I have used solrmeter
  tool,
   it draws the cache diagrams, is there a similar tool, but which would
 use
   jmx directly and present the cache usage in runtime?
  
   pravesh:
   I have increased the size of filterCache, but the search hasn't become
  any
   faster, taking almost 9 sec on avg :(
  
   name: search
   class: org.apache.solr.handler.component.SearchHandler
   version: $Revision: 1052938 $
   description: Search using components:
  
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
  
   stats: handlerStart : 1323255147351
   requests : 100
   errors : 3
   timeouts : 0
   totalTime : 885438
   avgTimePerRequest : 8854.38
   avgRequestsPerSecond : 0.008789442
  
   the stats (copying fieldValueCache as well here, to show term
  statistics):
  
   name: fieldValueCache
   class: org.apache.solr.search.FastLRUCache
   version: 1.0
   description: Concurrent LRU Cache(maxSize=1, initialSize=10,
   minSize=9000, acceptableSize=9500, cleanupThread=false)
   stats: lookups : 79
   hits : 77
   hitratio : 0.97
   inserts : 1
   evictions : 0
   size : 1
   warmupTime : 0
   cumulative_lookups : 79
   cumulative_hits : 77
   cumulative_hitratio : 0.97
   cumulative_inserts : 1
   cumulative_evictions : 0
   item_shingleContent_trigram :
  
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
name: filterCache
   class: org.apache.solr.search.FastLRUCache
   version: 1.0
   description: Concurrent LRU Cache(maxSize=153600, initialSize=4096,
   minSize=138240, acceptableSize=145920, cleanupThread=false)
   stats: lookups : 1082854
   hits : 940370
   hitratio : 0.86
   inserts : 142486
   evictions : 0
   size : 142486
   warmupTime : 0
   cumulative_lookups : 1082854
   cumulative_hits : 940370
   cumulative_hitratio : 0.86
   cumulative_inserts : 142486
   cumulative_evictions : 0
  
  
   index size: 3,25 GB
  
   Does anyone have some pointers to where to look at and optimize for
 query
   time?
  
  
   2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
  
   Hi Dimitry, cache information is exposed via JMX, so you should be
 able
  to
   monitor that information with any JMX tool. See
   http://wiki.apache.org/solr/SolrJmx
  
   On Wed, Dec 7, 2011 at 6:19 AM, Dmitry Kan dmitry@gmail.com
  wrote:
  
Yes, we do require that much.
Ok, thanks, I will try increasing the maxsize.
   
On Wed, Dec 7, 2011 at 10:56 AM, pravesh suyalprav...@yahoo.com
   wrote:
   
 facet.limit=50
 your facet.limit seems too high. Do you actually require this
 much?

 Since there a lot of evictions from filtercache, so, increase the
   maxsize
 value to your acceptable limit.

 Regards
 Pravesh

 --
 View this message in context:

   
  
 
 http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
 Sent from the Solr - User mailing list archive at Nabble.com.

   
   
   
--
Regards,
   
Dmitry Kan
   
  
  
  
  
   --
   Regards,
  
   Dmitry Kan
  
 
 
 
  --
  Regards,
 
  Dmitry Kan
 




-- 
Regards,

Dmitry Kan


cache monitoring tools?

2011-12-06 Thread Dmitry Kan
Hello list,

We've noticed quite huge strain on the filterCache in facet queries against
trigram fields (see schema in the end of this e-mail). The typical query
contains some keywords in the q parameter and boolean filter query on other
solr fields. It is also facet query, the facet field is of
type shingle_text_trigram (see schema) and facet.limit=50.


Questions: are there some tools (except for solrmeter) and/or approaches to
monitor / profile the load on caches, which would help to derive better
tuning parameters?

Can you recommend checking config parameters of other components but caches?

BTW, this has become much faster compared to solr 1.4 where we had to a lot
of optimizations on schema level (e.g. by making a number of stored fields
non-stored)

Here are the relevant stats from admin (SOLR 3.4):

description: Concurrent LRU Cache(maxSize=1, initialSize=10,
minSize=9000, acceptableSize=9500, cleanupThread=false)
stats: lookups : 93
hits : 90
hitratio : 0.96
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 93
cumulative_hits : 90
cumulative_hitratio : 0.96
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram :
{field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=222924,phase1=221106,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=91}
name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=512, initialSize=512,
minSize=460, acceptableSize=486, cleanupThread=false)
stats: lookups : 1003486
hits : 2809
hitratio : 0.00
inserts : 1000694
evictions : 1000221
size : 473
warmupTime : 0
cumulative_lookups : 1003486
cumulative_hits : 2809
cumulative_hitratio : 0.00
cumulative_inserts : 1000694
cumulative_evictions : 1000221


schema excerpt:

fieldType name=shingle_text_trigram class=solr.TextField
positionIncrementGap=100
   analyzer
  tokenizer class=solr.StandardTokenizerFactory/
  filter class=solr.LowerCaseFilterFactory/
  filter class=solr.ShingleFilterFactory maxShingleSize=3
outputUnigrams=true/
   /analyzer
/fieldType

-- 
Regards,

Dmitry Kan


Tools?

2011-05-25 Thread Sujatha Arun
Hello,

Are there any tools that can be used for analyzing the solr logs?

Regards
Sujatha


Re: velocity tools in solr-contrib-velocity?

2011-01-29 Thread Gora Mohanty
On 1/29/11, Paul Libbrecht p...@hoplahup.net wrote:

 Hello list,

 can anyone tell me how I can plug the velocity tools into my solr?
[...]

Not sure what you mean by plugging in the tools. There is
http://wiki.apache.org/solr/VelocityResponseWriter , but I
imagine that you have already seen that.

Regards,
Gora


velocity tools in solr-contrib-velocity?

2011-01-28 Thread Paul Libbrecht

Hello list,

can anyone tell me how I can plug the velocity tools into my solr?
Do I understand correctly the following comment in the source:

// Velocity context tools - TODO: make these pluggable

that it's only hard-coded thus far?

thanks in advance

paul


Re: benchmarking tools

2009-10-28 Thread mike anderson
Great suggestion, I took a look and it seems pretty useful. As a follow up
question, did you do anything to disable Solr caching for certain tests?

-mike

On Tue, Oct 27, 2009 at 8:14 PM, Joshua Tuberville 
joshuatubervi...@eharmony.com wrote:

 Mike,

 For response times I would also look at java.net's Faban benchmarking
 framework.  We use it extensively for our acceptance tests and tuning
 excercises.

 Joshua

 On Oct 27, 2009, at 1:59 PM, Mike Anderson wrote:

  I've been making modifications here and there to the Solr source
  code in
  hopes to optimize for my particular setup. My goal now is to
  establish a
  descent benchmark toolset so that I can evaluate the observed
  performance
  increase before deciding rolling out. So far I've investigated
  Jmeter and
  Lucid Gaze, but each seem to have pretty steep learning curves, so I
  thought
  I'd ping the group before I sink a good chunk of time into either.
  My ideal
  performance metrics aren't so much load testing, but rather response
  time
  testing for different query types across different Solr
  configurations.
 
  If anybody has some insight into this kind of project I'd love to
  get some
  feedback.
 
  Thanks in advance,
  Mike Anderson




benchmarking tools

2009-10-27 Thread Mike Anderson
I've been making modifications here and there to the Solr source code in
hopes to optimize for my particular setup. My goal now is to establish a
descent benchmark toolset so that I can evaluate the observed performance
increase before deciding rolling out. So far I've investigated Jmeter and
Lucid Gaze, but each seem to have pretty steep learning curves, so I thought
I'd ping the group before I sink a good chunk of time into either. My ideal
performance metrics aren't so much load testing, but rather response time
testing for different query types across different Solr configurations.

If anybody has some insight into this kind of project I'd love to get some
feedback.

Thanks in advance,
Mike Anderson


Re: benchmarking tools

2009-10-27 Thread Joshua Tuberville
Mike,

For response times I would also look at java.net's Faban benchmarking  
framework.  We use it extensively for our acceptance tests and tuning  
excercises.

Joshua

On Oct 27, 2009, at 1:59 PM, Mike Anderson wrote:

 I've been making modifications here and there to the Solr source  
 code in
 hopes to optimize for my particular setup. My goal now is to  
 establish a
 descent benchmark toolset so that I can evaluate the observed  
 performance
 increase before deciding rolling out. So far I've investigated  
 Jmeter and
 Lucid Gaze, but each seem to have pretty steep learning curves, so I  
 thought
 I'd ping the group before I sink a good chunk of time into either.  
 My ideal
 performance metrics aren't so much load testing, but rather response  
 time
 testing for different query types across different Solr  
 configurations.

 If anybody has some insight into this kind of project I'd love to  
 get some
 feedback.

 Thanks in advance,
 Mike Anderson



Re: Tools for Managing Synonyms, Elevate, etc.

2009-02-02 Thread Vicky_Dev

Mark,

Use GUI (may be custom build one) to read files which are present on Solr
server. These files can be read using webservice/RMI call. 

Do all manipulation on synonyms.txt contents and then call webservice/RMI
call to save that information. After saving information , just call RELOAD.


Check
::http://wiki.apache.org/solr/CoreAdmin#head-3f125034c6a64611779442539812067b8b430930

 http://localhost:8983/solr/admin/cores?action=RELOADcore=core0 

Hope this helps

~Vikrant




Cohen, Mark - ISamp;T wrote:
 
 I'm considering building some tools for our internal non-technical staff
 to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt
 so software developers don't have to maintain them.  Before my team
 starts building these tools, has anyone done this before?  If so, are
 these tools available as open source?  
 
 Thanks,
 Mark Cohen
 
 

-- 
View this message in context: 
http://www.nabble.com/Tools-for-Managing-Synonyms%2C-Elevate%2C-etc.-tp21696372p21796832.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tools for Managing Synonyms, Elevate, etc.

2009-01-28 Thread Otis Gospodnetic
Mark,

I am not aware of anyone open-sourcing such tools.  But note that changing the 
files with a GUI is easy (editor + scp?).  What makes things more complicated 
is the need to make Solr reload those files and, in some cases, changes really 
require a full index rebuilding.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Cohen, Mark - IST mark.co...@mtvn.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, January 27, 2009 5:55:46 PM
 Subject: Tools for Managing Synonyms, Elevate, etc.
 
 I'm considering building some tools for our internal non-technical staff
 to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt
 so software developers don't have to maintain them.  Before my team
 starts building these tools, has anyone done this before?  If so, are
 these tools available as open source?  
 
 Thanks,
 Mark Cohen



Tools for Managing Synonyms, Elevate, etc.

2009-01-27 Thread Cohen, Mark - IST
I'm considering building some tools for our internal non-technical staff
to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt
so software developers don't have to maintain them.  Before my team
starts building these tools, has anyone done this before?  If so, are
these tools available as open source?  

Thanks,
Mark Cohen


Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main

2008-12-17 Thread Kay Kay

Thanks Toby.

Aliter:
Under contrib/javascript/build.xml - dist target - I removed the 
dependency on 'docs' , to circumvent the problem.


But may be - it would be great to get js.jar from the rhino library 
distributed ( if not for license contradictions) to circumvent this.



Toby Cole wrote:
I came across this too earlier, I just deleted the contrib/javascript 
directory.
Of course, if you need javascript library then you'll have to get it 
building.


Sorry, probably not that helpful. :)
Toby.

On 17 Dec 2008, at 17:03, Kay Kay wrote:


I downloaded the latest .tgz and ran

$ ant dist


docs:

  [mkdir] Created dir: 
/opt/src/apache-solr-nightly/contrib/javascript/dist/doc
   [java] Exception in thread main java.lang.NoClassDefFoundError: 
org/mozilla/javascript/tools/shell/Main

   [java] at JsRun.main(Unknown Source)
   [java] Caused by: java.lang.ClassNotFoundException: 
org.mozilla.javascript.tools.shell.Main

   [java] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
   [java] at java.security.AccessController.doPrivileged(Native 
Method)
   [java] at 
java.net.URLClassLoader.findClass(URLClassLoader.java:188)

   [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
   [java] at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)

   [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
   [java] at 
java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)

   [java] ... 1 more

BUILD FAILED
/opt/src/apache-solr-nightly/common-build.xml:335: The following 
error occurred while executing this line:
/opt/src/apache-solr-nightly/common-build.xml:212: The following 
error occurred while executing this line:
/opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java 
returned: 1



and came across the above mentioned error.

The class seems to be from the rhino (mozilla js ) library. Is it 
supposed to be packaged by default / is there a license restriction 
that prevents from being so .




Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com






Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main

2008-12-17 Thread Matthew Runo

I'm using Java 6 and it's compiling for me.

I'm doing..

ant clean
ant dist

and it works just fine. Maybe try an 'ant clean'?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Dec 17, 2008, at 9:17 AM, Toby Cole wrote:

I came across this too earlier, I just deleted the contrib/ 
javascript directory.
Of course, if you need javascript library then you'll have to get it  
building.


Sorry, probably not that helpful. :)
Toby.

On 17 Dec 2008, at 17:03, Kay Kay wrote:


I downloaded the latest .tgz and ran

$ ant dist


docs:

 [mkdir] Created dir: /opt/src/apache-solr-nightly/contrib/ 
javascript/dist/doc
  [java] Exception in thread main java.lang.NoClassDefFoundError:  
org/mozilla/javascript/tools/shell/Main

  [java] at JsRun.main(Unknown Source)
  [java] Caused by: java.lang.ClassNotFoundException:  
org.mozilla.javascript.tools.shell.Main
  [java] at java.net.URLClassLoader$1.run(URLClassLoader.java: 
200)
  [java] at java.security.AccessController.doPrivileged(Native  
Method)
  [java] at  
java.net.URLClassLoader.findClass(URLClassLoader.java:188)

  [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  [java] at sun.misc.Launcher 
$AppClassLoader.loadClass(Launcher.java:301)

  [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
  [java] at  
java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)

  [java] ... 1 more

BUILD FAILED
/opt/src/apache-solr-nightly/common-build.xml:335: The following  
error occurred while executing this line:
/opt/src/apache-solr-nightly/common-build.xml:212: The following  
error occurred while executing this line:
/opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java  
returned: 1



and came across the above mentioned error.

The class seems to be from the rhino (mozilla js ) library. Is it  
supposed to be packaged by default / is there a license restriction  
that prevents from being so .




Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com





Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main

2008-12-17 Thread Kay Kay

I downloaded the latest .tgz and ran

$ ant dist


docs:

   [mkdir] Created dir: 
/opt/src/apache-solr-nightly/contrib/javascript/dist/doc
[java] Exception in thread main java.lang.NoClassDefFoundError: 
org/mozilla/javascript/tools/shell/Main

[java] at JsRun.main(Unknown Source)
[java] Caused by: java.lang.ClassNotFoundException: 
org.mozilla.javascript.tools.shell.Main

[java] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
[java] at java.security.AccessController.doPrivileged(Native 
Method)
[java] at 
java.net.URLClassLoader.findClass(URLClassLoader.java:188)

[java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
[java] at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)

[java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
[java] at 
java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)

[java] ... 1 more

BUILD FAILED
/opt/src/apache-solr-nightly/common-build.xml:335: The following error 
occurred while executing this line:
/opt/src/apache-solr-nightly/common-build.xml:212: The following error 
occurred while executing this line:
/opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java 
returned: 1



and came across the above mentioned error.

The class seems to be from the rhino (mozilla js ) library. Is it 
supposed to be packaged by default / is there a license restriction that 
prevents from being so .




Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main

2008-12-17 Thread Toby Cole
I came across this too earlier, I just deleted the contrib/javascript  
directory.
Of course, if you need javascript library then you'll have to get it  
building.


Sorry, probably not that helpful. :)
Toby.

On 17 Dec 2008, at 17:03, Kay Kay wrote:


I downloaded the latest .tgz and ran

$ ant dist


docs:

  [mkdir] Created dir: /opt/src/apache-solr-nightly/contrib/ 
javascript/dist/doc
   [java] Exception in thread main java.lang.NoClassDefFoundError:  
org/mozilla/javascript/tools/shell/Main

   [java] at JsRun.main(Unknown Source)
   [java] Caused by: java.lang.ClassNotFoundException:  
org.mozilla.javascript.tools.shell.Main
   [java] at java.net.URLClassLoader$1.run(URLClassLoader.java: 
200)
   [java] at java.security.AccessController.doPrivileged(Native  
Method)
   [java] at  
java.net.URLClassLoader.findClass(URLClassLoader.java:188)

   [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
   [java] at sun.misc.Launcher 
$AppClassLoader.loadClass(Launcher.java:301)

   [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
   [java] at  
java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)

   [java] ... 1 more

BUILD FAILED
/opt/src/apache-solr-nightly/common-build.xml:335: The following  
error occurred while executing this line:
/opt/src/apache-solr-nightly/common-build.xml:212: The following  
error occurred while executing this line:
/opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java  
returned: 1



and came across the above mentioned error.

The class seems to be from the rhino (mozilla js ) library. Is it  
supposed to be packaged by default / is there a license restriction  
that prevents from being so .




Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com



Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main

2008-12-17 Thread Shalin Shekhar Mangar
On Wed, Dec 17, 2008 at 10:53 PM, Matthew Runo mr...@zappos.com wrote:

 I'm using Java 6 and it's compiling for me.


I believe rhino is included by default in Java 6

-- 
Regards,
Shalin Shekhar Mangar.


Re: Diagnostic tools

2008-08-05 Thread Yonik Seeley
On Tue, Aug 5, 2008 at 12:43 PM, Kashyap, Raghu
[EMAIL PROTECTED] wrote:
 Are there are tools that are available to view the indexing process? We
 have a cron process which posts XML files to the solr index server.
 However, we are NOT seeing the documents posted correctly and we are
 also NOT getting any errors from the client

You need to send a commit before index changes become visible.

-Yonik


RE: Diagnostic tools

2008-08-05 Thread Kashyap, Raghu
Yes we are sending the commits.

-Raghu

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, August 05, 2008 12:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Diagnostic tools

On Tue, Aug 5, 2008 at 12:43 PM, Kashyap, Raghu
[EMAIL PROTECTED] wrote:
 Are there are tools that are available to view the indexing process?
We
 have a cron process which posts XML files to the solr index server.
 However, we are NOT seeing the documents posted correctly and we are
 also NOT getting any errors from the client

You need to send a commit before index changes become visible.

-Yonik


Re: Diagnostic tools

2008-08-05 Thread Norberto Meijome
On Tue, 5 Aug 2008 11:43:44 -0500
Kashyap, Raghu [EMAIL PROTECTED] wrote:

 Hi,

Hi Kashyap,
please don't hijack topic threads.

http://en.wikipedia.org/wiki/Thread_hijacking

thanks!!
B
_
{Beto|Norberto|Numard} Meijome

Software QA is like cleaning my cat's litter box: Sift out the big chunks. Stir 
in the rest. Hope it doesn't stink.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


RE: Benchmarking tools?

2008-06-30 Thread Nico Heid
Hi,
I did some trivial Tests with Jmeter.
I set up Jmeter to increase the number of threads steadily.
For requests I either usa a random word or combination of words in a
wordlist or some sample date from the test system. (this is described in the
JMeter manual)

In my case the System works fine as long as I don't exceed the max number of
requests per second it can handel. But thats not a big surprise. More
interesting seems the fact, that to a certain degree, after exceeding the
max nr of requests response time seems to rise linear for a little while and
then exponentially. But that might also be the result of my test szenario.

Nico


 -Original Message-
 From: Jacob Singh [mailto:[EMAIL PROTECTED]
 Sent: Sunday, June 29, 2008 6:04 PM
 To: solr-user@lucene.apache.org
 Subject: Benchmarking tools?

 Hi folks,

 Does anyone have any bright ideas on how to benchmark solr?
 Unless someone has something better, here is what I am thinking:

 1. Have a config file where one can specify info like how
 many docs, how large, how many facets, and how many updates /
 searches per minute

 2. Use one of the various client APIs to generate XML files
 for updates using some kind of lorem ipsum text as a base and
 store them in a dir.

 3. Use siege to set the update run at whatever interval is
 specified in the config, sending an update every x seconds
 and removing it from the directory

 4. Generate a list of search queries based upon the facets
 created, and build a urls.txt with all of these search urls

 5. Run the searches through siege

 6. Monitor the output using nagios to see where load kicks in.

 This is not that sophisticated, and feels like it won't
 really pinpoint bottlenecks, but would aproximately tell us
 where a server will start to bail.

 Does anyone have any better ideas?

 Best,
 Jacob Singh





Re: Benchmarking tools?

2008-06-30 Thread Jacob Singh
Hi Nico,

Thanks for the info. Do you have you scripts available for this?

Also, is it configurable to give variable numbers of facets and facet
based searches?  I have a feeling this will be the limiting factor, and
much slower than keyword searches but I could be (and usually am) wrong.

Best,

Jacob

Nico Heid wrote:
 Hi,
 I did some trivial Tests with Jmeter.
 I set up Jmeter to increase the number of threads steadily.
 For requests I either usa a random word or combination of words in a
 wordlist or some sample date from the test system. (this is described in the
 JMeter manual)
 
 In my case the System works fine as long as I don't exceed the max number of
 requests per second it can handel. But thats not a big surprise. More
 interesting seems the fact, that to a certain degree, after exceeding the
 max nr of requests response time seems to rise linear for a little while and
 then exponentially. But that might also be the result of my test szenario.
 
 Nico
 
 
 -Original Message-
 From: Jacob Singh [mailto:[EMAIL PROTECTED]
 Sent: Sunday, June 29, 2008 6:04 PM
 To: solr-user@lucene.apache.org
 Subject: Benchmarking tools?

 Hi folks,

 Does anyone have any bright ideas on how to benchmark solr?
 Unless someone has something better, here is what I am thinking:

 1. Have a config file where one can specify info like how
 many docs, how large, how many facets, and how many updates /
 searches per minute

 2. Use one of the various client APIs to generate XML files
 for updates using some kind of lorem ipsum text as a base and
 store them in a dir.

 3. Use siege to set the update run at whatever interval is
 specified in the config, sending an update every x seconds
 and removing it from the directory

 4. Generate a list of search queries based upon the facets
 created, and build a urls.txt with all of these search urls

 5. Run the searches through siege

 6. Monitor the output using nagios to see where load kicks in.

 This is not that sophisticated, and feels like it won't
 really pinpoint bottlenecks, but would aproximately tell us
 where a server will start to bail.

 Does anyone have any better ideas?

 Best,
 Jacob Singh

 
 



Re: Benchmarking tools?

2008-06-30 Thread Nico Heid

Hi,
I basically followed this:
http://wiki.apache.org/jakarta-jmeter/JMeterFAQ#head-1680863678257fbcb85bd97351860eb0049f19ae

I basically put all my queries in a flat text file. you could either use 
two parameters or put them in one file.
The good point of this is, that each test uses the same queries, so you 
can compare the settings better afterwards.


If you use varying facets, you might just go with 2 text files. If it 
stays the same in one test you can hardcode it into the test case.


I polished the result a little, if you want to take a look: 
http://i31.tinypic.com/28c2blk.jpg , JMeter itself does not plot such 
nice graphs.
(green is the max results delivered, upon 66 active users per second 
the response time increases (orange/yellow, average and median of the 
response times)
(i know the scales and descriptions are missing :-) but you should get 
the picture)
I manually reduced the machines capacity, elsewise solr would server 
more than 12000 requests per second. (the whole index did fit into ram)

I can send you my saved test case if this would help you.

Nico


Jacob Singh wrote:

Hi Nico,

Thanks for the info. Do you have you scripts available for this?

Also, is it configurable to give variable numbers of facets and facet
based searches?  I have a feeling this will be the limiting factor, and
much slower than keyword searches but I could be (and usually am) wrong.

Best,

Jacob

Nico Heid wrote:
  

Hi,
I did some trivial Tests with Jmeter.
I set up Jmeter to increase the number of threads steadily.
For requests I either usa a random word or combination of words in a
wordlist or some sample date from the test system. (this is described in the
JMeter manual)

In my case the System works fine as long as I don't exceed the max number of
requests per second it can handel. But thats not a big surprise. More
interesting seems the fact, that to a certain degree, after exceeding the
max nr of requests response time seems to rise linear for a little while and
then exponentially. But that might also be the result of my test szenario.

Nico




-Original Message-
From: Jacob Singh [mailto:[EMAIL PROTECTED]
Sent: Sunday, June 29, 2008 6:04 PM
To: solr-user@lucene.apache.org
Subject: Benchmarking tools?

Hi folks,

Does anyone have any bright ideas on how to benchmark solr?
Unless someone has something better, here is what I am thinking:

1. Have a config file where one can specify info like how
many docs, how large, how many facets, and how many updates /
searches per minute

2. Use one of the various client APIs to generate XML files
for updates using some kind of lorem ipsum text as a base and
store them in a dir.

3. Use siege to set the update run at whatever interval is
specified in the config, sending an update every x seconds
and removing it from the directory

4. Generate a list of search queries based upon the facets
created, and build a urls.txt with all of these search urls

5. Run the searches through siege

6. Monitor the output using nagios to see where load kicks in.

This is not that sophisticated, and feels like it won't
really pinpoint bottlenecks, but would aproximately tell us
where a server will start to bail.

Does anyone have any better ideas?

Best,
Jacob Singh

  





Re: Benchmarking tools?

2008-06-30 Thread Jacob Singh
nice stuff. Please send me the test case, I'd love to see it.

Thanks,
Jacob
Nico Heid wrote:
 Hi,
 I basically followed this:
 http://wiki.apache.org/jakarta-jmeter/JMeterFAQ#head-1680863678257fbcb85bd97351860eb0049f19ae
 
 
 I basically put all my queries in a flat text file. you could either use
 two parameters or put them in one file.
 The good point of this is, that each test uses the same queries, so you
 can compare the settings better afterwards.
 
 If you use varying facets, you might just go with 2 text files. If it
 stays the same in one test you can hardcode it into the test case.
 
 I polished the result a little, if you want to take a look:
 http://i31.tinypic.com/28c2blk.jpg , JMeter itself does not plot such
 nice graphs.
 (green is the max results delivered, upon 66 active users per second
 the response time increases (orange/yellow, average and median of the
 response times)
 (i know the scales and descriptions are missing :-) but you should get
 the picture)
 I manually reduced the machines capacity, elsewise solr would server
 more than 12000 requests per second. (the whole index did fit into ram)
 I can send you my saved test case if this would help you.
 
 Nico
 
 
 Jacob Singh wrote:
 Hi Nico,

 Thanks for the info. Do you have you scripts available for this?

 Also, is it configurable to give variable numbers of facets and facet
 based searches?  I have a feeling this will be the limiting factor, and
 much slower than keyword searches but I could be (and usually am) wrong.

 Best,

 Jacob

 Nico Heid wrote:
  
 Hi,
 I did some trivial Tests with Jmeter.
 I set up Jmeter to increase the number of threads steadily.
 For requests I either usa a random word or combination of words in a
 wordlist or some sample date from the test system. (this is described
 in the
 JMeter manual)

 In my case the System works fine as long as I don't exceed the max
 number of
 requests per second it can handel. But thats not a big surprise. More
 interesting seems the fact, that to a certain degree, after exceeding
 the
 max nr of requests response time seems to rise linear for a little
 while and
 then exponentially. But that might also be the result of my test
 szenario.

 Nico



 -Original Message-
 From: Jacob Singh [mailto:[EMAIL PROTECTED]
 Sent: Sunday, June 29, 2008 6:04 PM
 To: solr-user@lucene.apache.org
 Subject: Benchmarking tools?

 Hi folks,

 Does anyone have any bright ideas on how to benchmark solr?
 Unless someone has something better, here is what I am thinking:

 1. Have a config file where one can specify info like how
 many docs, how large, how many facets, and how many updates /
 searches per minute

 2. Use one of the various client APIs to generate XML files
 for updates using some kind of lorem ipsum text as a base and
 store them in a dir.

 3. Use siege to set the update run at whatever interval is
 specified in the config, sending an update every x seconds
 and removing it from the directory

 4. Generate a list of search queries based upon the facets
 created, and build a urls.txt with all of these search urls

 5. Run the searches through siege

 6. Monitor the output using nagios to see where load kicks in.

 This is not that sophisticated, and feels like it won't
 really pinpoint bottlenecks, but would aproximately tell us
 where a server will start to bail.

 Does anyone have any better ideas?

 Best,
 Jacob Singh

   
 
 



Re: Benchmarking tools?

2008-06-30 Thread Yugang Hu

Me too. Thanks.

Jacob Singh wrote:

nice stuff. Please send me the test case, I'd love to see it.

Thanks,
Jacob
Nico Heid wrote:
  

Hi,
I basically followed this:
http://wiki.apache.org/jakarta-jmeter/JMeterFAQ#head-1680863678257fbcb85bd97351860eb0049f19ae


I basically put all my queries in a flat text file. you could either use
two parameters or put them in one file.
The good point of this is, that each test uses the same queries, so you
can compare the settings better afterwards.

If you use varying facets, you might just go with 2 text files. If it
stays the same in one test you can hardcode it into the test case.

I polished the result a little, if you want to take a look:
http://i31.tinypic.com/28c2blk.jpg , JMeter itself does not plot such
nice graphs.
(green is the max results delivered, upon 66 active users per second
the response time increases (orange/yellow, average and median of the
response times)
(i know the scales and descriptions are missing :-) but you should get
the picture)
I manually reduced the machines capacity, elsewise solr would server
more than 12000 requests per second. (the whole index did fit into ram)
I can send you my saved test case if this would help you.

Nico


Jacob Singh wrote:


Hi Nico,

Thanks for the info. Do you have you scripts available for this?

Also, is it configurable to give variable numbers of facets and facet
based searches?  I have a feeling this will be the limiting factor, and
much slower than keyword searches but I could be (and usually am) wrong.

Best,

Jacob

Nico Heid wrote:
 
  

Hi,
I did some trivial Tests with Jmeter.
I set up Jmeter to increase the number of threads steadily.
For requests I either usa a random word or combination of words in a
wordlist or some sample date from the test system. (this is described
in the
JMeter manual)

In my case the System works fine as long as I don't exceed the max
number of
requests per second it can handel. But thats not a big surprise. More
interesting seems the fact, that to a certain degree, after exceeding
the
max nr of requests response time seems to rise linear for a little
while and
then exponentially. But that might also be the result of my test
szenario.

Nico


   


-Original Message-
From: Jacob Singh [mailto:[EMAIL PROTECTED]
Sent: Sunday, June 29, 2008 6:04 PM
To: solr-user@lucene.apache.org
Subject: Benchmarking tools?

Hi folks,

Does anyone have any bright ideas on how to benchmark solr?
Unless someone has something better, here is what I am thinking:

1. Have a config file where one can specify info like how
many docs, how large, how many facets, and how many updates /
searches per minute

2. Use one of the various client APIs to generate XML files
for updates using some kind of lorem ipsum text as a base and
store them in a dir.

3. Use siege to set the update run at whatever interval is
specified in the config, sending an update every x seconds
and removing it from the directory

4. Generate a list of search queries based upon the facets
created, and build a urls.txt with all of these search urls

5. Run the searches through siege

6. Monitor the output using nagios to see where load kicks in.

This is not that sophisticated, and feels like it won't
really pinpoint bottlenecks, but would aproximately tell us
where a server will start to bail.

Does anyone have any better ideas?

Best,
Jacob Singh

  
  




  




Re: Fw: Download solr-tools rpm

2007-03-29 Thread Suresh Kannan

Thanks Hoss,

I found them in SRC/ SCRIPTS. But i dont know how to execute those 
snapshooter, snappuller, abc, backup... How I can make one instance of solr 
as master and other as slave. Is it fully depends of rsyncd


-Suresh

- Original Message - 
From: Chris Hostetter [EMAIL PROTECTED]

To: solr-user@lucene.apache.org
Sent: Thursday, March 29, 2007 4:04 AM
Subject: Re: Fw: Download solr-tools rpm




: I need to configure master / slave servers. Hence i check at wiki help
: documents. I found that i need to install solr-tools rpm. But i could
: not able to download the files. Please some help me with solr-tools rpm.

Any refrences to a solr-tools rpm on the wiki are outdated and leftover
from when i ported those wiki pages from CNET ... Apache Solr doesn't
distribute anything as an RPM, you should be abl to find all of those
scripts in the Solr release tgz bundles.

-Hoss






Re: Fw: Download solr-tools rpm

2007-03-29 Thread Chris Hostetter

: I found them in SRC/ SCRIPTS. But i dont know how to execute those
: snapshooter, snappuller, abc, backup... How I can make one instance of solr
: as master and other as slave. Is it fully depends of rsyncd

rsync is in fact at the heart of the replication ... that's really all
those scripts are is some hardlinking followed by some rsyncing.

How to use them (suggested crontab configuration, etc...) is documented
fairly completely on the wiki...

http://wiki.apache.org/solr/CollectionDistribution


:
: -Suresh
:
: - Original Message -
: From: Chris Hostetter [EMAIL PROTECTED]
: To: solr-user@lucene.apache.org
: Sent: Thursday, March 29, 2007 4:04 AM
: Subject: Re: Fw: Download solr-tools rpm
:
:
: 
:  : I need to configure master / slave servers. Hence i check at wiki help
:  : documents. I found that i need to install solr-tools rpm. But i could
:  : not able to download the files. Please some help me with solr-tools rpm.
: 
:  Any refrences to a solr-tools rpm on the wiki are outdated and leftover
:  from when i ported those wiki pages from CNET ... Apache Solr doesn't
:  distribute anything as an RPM, you should be abl to find all of those
:  scripts in the Solr release tgz bundles.
: 
:  -Hoss
: 
: 
:



-Hoss



Re: Fw: Download solr-tools rpm

2007-03-28 Thread Chris Hostetter

: I need to configure master / slave servers. Hence i check at wiki help
: documents. I found that i need to install solr-tools rpm. But i could
: not able to download the files. Please some help me with solr-tools rpm.

Any refrences to a solr-tools rpm on the wiki are outdated and leftover
from when i ported those wiki pages from CNET ... Apache Solr doesn't
distribute anything as an RPM, you should be abl to find all of those
scripts in the Solr release tgz bundles.

-Hoss