Date range problems
Hi All. We're seeing a really interesting problem when searching by date range. We have two fields of type date in our index (both indexed and stored): content_date and created_date. We can run any date-range query we want against content_date and we get the expected results. However, when we run similar queries against created_date, we consistently get 0 results. Now, here's the interesting part -- if we do a plain search without a date range, BUT sort by created_date desc, we get properly sorted results. So, it seems like the index works for sorting but not for searching. Does that make any sense? Anyone have any ideas on how we can diagnose this issue? Here's the relevant block from our schema (before you ask):

<field name="id" type="string" indexed="true" stored="true"/>
<field name="content_date" type="date" indexed="true" stored="true"/>
<field name="media_type" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="location" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="country_code" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="content_source" type="string" indexed="true" stored="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="site_id" type="string" indexed="true" stored="false"/>
<field name="journalist_id" type="string" indexed="true" stored="false"/>
<field name="network" type="string" indexed="true" stored="false"/>
<field name="created_date" type="date" indexed="true" stored="true"/>

TIA,
Dave W.
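One way to isolate a problem like this is to run the identical range query against both fields, so the field name is the only variable. A sketch (host, port, and dates below are placeholders, not from the original post):

```shell
# Build the two A/B queries; only the field name differs. In a real request
# the brackets and spaces in the range would need URL-encoding.
BASE="http://localhost:8983/solr/select"
RANGE="[2007-01-01T00:00:00Z TO 2007-10-15T00:00:00Z]"
echo "${BASE}?q=content_date:${RANGE}&rows=0"
echo "${BASE}?q=created_date:${RANGE}&rows=0"
```

If the first returns hits and the second returns numFound="0", one thing worth checking is the raw stored values of created_date (e.g. whether they are really in the canonical 1995-12-31T23:59:59Z form, and whether they fall inside the queried range at all).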
comment-out a filter?
Hi All. I want to comment-out a filter in my schema.xml, specifically the solr.EnglishPorterFilterFactory filter. I want to know -- will this cause me to have to re-build my index? Or will a restart of SOLR get the job done? Thanks! Dave W
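For reference, commenting a filter out of schema.xml might look like the fragment below (the fieldtype shown is a generic illustration, not the poster's actual schema). Note that if the filter is removed from the index-time analyzer, terms already in the index were stemmed by EnglishPorterFilterFactory, so a restart alone generally leaves old documents and new queries analyzed differently; a rebuild is usually needed for consistent matching.

```xml
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemming disabled; reindex so old (stemmed) terms match the new analysis
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    -->
  </analyzer>
</fieldtype>
```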
RE: Facets and running out of Heap Space
It looks now like I can't use facets the way I was hoping to because the memory requirements are impractical. So, as an alternative I was thinking I could get counts by doing rows=0 and using filter queries. Is there a reason to think that this might perform better? Or, am I simply moving the problem to another step in the process?

DW

-Original Message-
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 09, 2007 10:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_

Mike, how did you calculate that value? I'm trying to tune my caches, and any equations that could be used to determine some balanced settings would be extremely helpful. I'm in a memory-limited environment, so I can't afford to throw a ton of cache at the problem. (I don't want to thread-jack, but I'm also wondering whether anyone has any notes on how to tune cache sizes for the filterCache, queryResultCache and documentCache.)

Thanks,
Stu

-Original Message-
From: Mike Klaas [EMAIL PROTECTED]
Sent: Tuesday, October 9, 2007 9:30pm
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

On 9-Oct-07, at 12:36 PM, David Whalen wrote: (snip)

I'm sure we could stop storing many of these columns, especially if someone told me that would make a big difference.

I don't think that it would make a difference in memory consumption, but storage is certainly not necessary for faceting. Extra stored fields can slow down search if they are large (in terms of bytes), but don't really occupy extra memory, unless they are polluting the doc cache. Does 'text' need to be stored?

what does the LukeRequestHandler tell you about the # of distinct terms in each field that you facet on?

Where would I find that? I could probably estimate that myself on a per-column basis.

it ranges from 4 distinct values for media_type to 30-ish for location to 200-ish for country_code to almost 10,000 for site_id to almost 100,000 for journalist_id.

Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_, so it should be a net win for those (although quite close in space requirements for a 30-ary field on your index size).

-Mike
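The rows=0-plus-filter-query idea above, sketched as raw query URLs (host and values are stand-ins): each request returns no documents, only a numFound count, but it costs one round trip per unique value.

```shell
# One count per facet value: q matches everything, fq restricts the set,
# and rows=0 means only numFound comes back in the response.
BASE="http://localhost:8983/solr/select"
for v in image text audio video; do
  echo "${BASE}?q=*:*&fq=media_type:${v}&rows=0"
done
```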
RE: Facets and running out of Heap Space
According to Yonik I can't use minDf because I'm faceting on a string field. I'm thinking of changing it to a tokenized type so that I can utilize this setting, but then I'll have to rebuild my entire index. Unless there's some way around that?

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 10, 2007 4:56 PM
To: solr-user@lucene.apache.org
Cc: stuhood
Subject: Re: Facets and running out of Heap Space

On 10-Oct-07, at 12:19 PM, David Whalen wrote:

It looks now like I can't use facets the way I was hoping to because the memory requirements are impractical.

I can't remember if this has been mentioned, but upping the HashDocSet size is one way to reduce memory consumption. Whether this will work well depends greatly on the cardinality of your facet sets. facet.enum.cache.minDf set high is another option (it will not generate a bitset for any value whose facet set is smaller than this value). Both options have performance implications.

So, as an alternative I was thinking I could get counts by doing rows=0 and using filter queries. Is there a reason to think that this might perform better? Or, am I simply moving the problem to another step in the process?

Running one query per unique facet value seems impractical, if that is what you are suggesting. Setting minDf to a very high value should always outperform such an approach.

-Mike

DW

-Original Message-
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 09, 2007 10:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_

Mike, how did you calculate that value? I'm trying to tune my caches, and any equations that could be used to determine some balanced settings would be extremely helpful. I'm in a memory-limited environment, so I can't afford to throw a ton of cache at the problem. (I don't want to thread-jack, but I'm also wondering whether anyone has any notes on how to tune cache sizes for the filterCache, queryResultCache and documentCache.)

Thanks,
Stu

-Original Message-
From: Mike Klaas [EMAIL PROTECTED]
Sent: Tuesday, October 9, 2007 9:30pm
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

On 9-Oct-07, at 12:36 PM, David Whalen wrote: (snip)

I'm sure we could stop storing many of these columns, especially if someone told me that would make a big difference.

I don't think that it would make a difference in memory consumption, but storage is certainly not necessary for faceting. Extra stored fields can slow down search if they are large (in terms of bytes), but don't really occupy extra memory, unless they are polluting the doc cache. Does 'text' need to be stored?

what does the LukeRequestHandler tell you about the # of distinct terms in each field that you facet on?

Where would I find that? I could probably estimate that myself on a per-column basis.

it ranges from 4 distinct values for media_type to 30-ish for location to 200-ish for country_code to almost 10,000 for site_id to almost 100,000 for journalist_id.

Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_, so it should be a net win for those (although quite close in space requirements for a 30-ary field on your index size).

-Mike
RE: Facets and running out of Heap Space
I'll see what I can do about that. Truthfully, the most important facet we need is the one on media_type, which has only 4 unique values. The second most important one to us is location, which has about 30 unique values. So, it would seem like we actually need a counter-intuitive solution. That's why I thought Field Queries might be the solution. Is there some reason to avoid setting multiValued to true here? It sounds like it would be the true cure-all.

Thanks again!

dave

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 10, 2007 6:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

On 10-Oct-07, at 2:40 PM, David Whalen wrote:

According to Yonik I can't use minDf because I'm faceting on a string field. I'm thinking of changing it to a tokenized type so that I can utilize this setting, but then I'll have to rebuild my entire index. Unless there's some way around that?

For the fields that matter (many unique values), this is likely to result in a performance regression. It might be better to try storing less unique data. For instance, faceting on the blog_url field, or create_date in your schema would cause problems (they probably have millions of unique values). It would be helpful to know which field is causing the problem. One way would be to do a sorted query on a quiescent index for each field, and see if there are any suspiciously large jumps in memory usage.

-Mike

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 10, 2007 4:56 PM
To: solr-user@lucene.apache.org
Cc: stuhood
Subject: Re: Facets and running out of Heap Space

On 10-Oct-07, at 12:19 PM, David Whalen wrote:

It looks now like I can't use facets the way I was hoping to because the memory requirements are impractical.

I can't remember if this has been mentioned, but upping the HashDocSet size is one way to reduce memory consumption. Whether this will work well depends greatly on the cardinality of your facet sets. facet.enum.cache.minDf set high is another option (it will not generate a bitset for any value whose facet set is smaller than this value). Both options have performance implications.

So, as an alternative I was thinking I could get counts by doing rows=0 and using filter queries. Is there a reason to think that this might perform better? Or, am I simply moving the problem to another step in the process?

Running one query per unique facet value seems impractical, if that is what you are suggesting. Setting minDf to a very high value should always outperform such an approach.

-Mike

DW

-Original Message-
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 09, 2007 10:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_

Mike, how did you calculate that value? I'm trying to tune my caches, and any equations that could be used to determine some balanced settings would be extremely helpful. I'm in a memory-limited environment, so I can't afford to throw a ton of cache at the problem. (I don't want to thread-jack, but I'm also wondering whether anyone has any notes on how to tune cache sizes for the filterCache, queryResultCache and documentCache.)

Thanks,
Stu

-Original Message-
From: Mike Klaas [EMAIL PROTECTED]
Sent: Tuesday, October 9, 2007 9:30pm
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

On 9-Oct-07, at 12:36 PM, David Whalen wrote: (snip)

I'm sure we could stop storing many of these columns, especially if someone told me that would make a big difference.

I don't think that it would make a difference in memory consumption, but storage is certainly not necessary for faceting. Extra stored fields can slow down search if they are large (in terms of bytes), but don't really occupy extra memory, unless they are polluting the doc cache. Does 'text' need to be stored?

what does the LukeRequestHandler tell you about the # of distinct terms in each field that you facet on?

Where would I find that? I could probably estimate that myself on a per-column basis.

it ranges from 4 distinct values for media_type to 30-ish for location to 200-ish for country_code to almost 10,000 for site_id to almost 100,000 for journalist_id.

Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_, so it should be a net win for those (although quite close in space requirements for a 30-ary field on your index size).

-Mike
RE: Availability Issues
Chris: We're using Jetty also, so I get the sense I'm looking at the wrong log file. On that note -- I've read that Jetty isn't the best servlet container to use in these situations; is that your experience?

Dave

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 11:20 PM
To: solr-user
Subject: RE: Availability Issues

: My logs don't look anything like that. They look like HTTP
: requests. Am I looking in the wrong place?

what servlet container are you using? every servlet container handles applications logs differently -- it's especially tricky because even the format can be changed. the examples i gave before are in the default format you get if you use the jetty setup in the solr example (which logs to stdout), but many servlet containers won't include that much detail by default (they typically leave out the classname and method name). there's also typically a setting that controls the verbosity -- so in some configurations only the SEVERE messages are logged and in others the INFO messages are logged ... you're going to want at least the INFO level to debug stuff.

grep all the log files you can find for "Solr home set to" ... that's one of the first messages Solr logs. if you can find that, you'll find the other messages i was talking about.

-Hoss
RE: Availability Issues
All: How can I break up my install onto more than one box? We've hit a learning curve here and we don't understand how best to proceed. Right now we have everything crammed onto one box because we don't know any better.

So, how would you build it if you could? Here are the specs:

a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles per minute
c) we need to have faceted queries

Again, real-world experience is preferred here over book knowledge. We've tried to read the docs and it's only made us more confused.

TIA

Dave W

-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:

Do you see any requests that took a really long time to finish?

The requests that take a long time to finish are just simple queries. And the same queries run at a later time come back much faster. Our logs contain 99% inserts and 1% queries. We are constantly adding documents to the index at a rate of 10,000 per minute, so the logs show mostly that.

Oh, so you are using the same boxes for updating and querying? When you insert, are you using multiple threads? If so, how many? What is the full URL of those slow query requests? Do the slow requests start after a commit?

Start with the thread dump. I bet it's multiple queries piling up around some synchronization points in lucene (sometimes caused by multiple threads generating the same big filter that isn't yet cached).

What would be my next steps after that? I'm not sure I'd understand enough from the dump to make heads-or-tails of it. Can I share that here?

Yes, post it here. Most likely a majority of the threads will be blocked somewhere deep in lucene code, and you will probably need help from people here to figure it out.

-Yonik
Facets and running out of Heap Space
Hi All. I run a faceted query against a very large index on a regular schedule. Every now and then the query throws an out of heap space error, and we're sunk. So, naturally we increased the heap size and things worked well for a while and then the errors would happen again. We've increased the initial heap size to 2.5GB and it's still happening. Is there anything we can do about this? Thanks in advance, Dave W
RE: Facets and running out of Heap Space
Hi Yonik.

According to the doc:

This is only used during the term enumeration method of faceting (facet.field type faceting on multi-valued or full-text fields).

What if I'm faceting on just a plain String field? It's not full-text, and I don't have multiValued set for it.

Dave

-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 09, 2007 12:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

On 10/9/07, David Whalen [EMAIL PROTECTED] wrote:

I run a faceted query against a very large index on a regular schedule. Every now and then the query throws an out of heap space error, and we're sunk. So, naturally we increased the heap size and things worked well for a while and then the errors would happen again. We've increased the initial heap size to 2.5GB and it's still happening. Is there anything we can do about this?

Try facet.enum.cache.minDf param:
http://wiki.apache.org/solr/SimpleFacetParameters

-Yonik
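For anyone trying Yonik's suggestion, the parameter goes straight on the request (values here are illustrative); per the doc quoted above it only applies when the term-enumeration faceting method is used, i.e. multi-valued or tokenized fields:

```shell
# Facet on location, but skip the filterCache for any term that appears
# in fewer than 25 documents (those get counted by enumeration instead).
BASE="http://localhost:8983/solr/select"
echo "${BASE}?q=*:*&rows=0&facet=true&facet.field=location&facet.enum.cache.minDf=25"
```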
RE: Facets and running out of Heap Space
Make sure you have:

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

defined in solrconfig.xml

What's the consequence of me changing the solrconfig.xml file? Doesn't that cause a restart of solr?

for a large index, this can be very slow but the results are valuable.

In what way? I'm still not clear on what this does for me.

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 09, 2007 4:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

what does the LukeRequestHandler tell you about the # of distinct terms in each field that you facet on?

Where would I find that?

check: http://wiki.apache.org/solr/LukeRequestHandler

Make sure you have:

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

defined in solrconfig.xml

for a large index, this can be very slow but the results are valuable.

ryan
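Once that handler is registered (editing solrconfig.xml generally requires restarting or reloading Solr for the change to take effect, but it does not modify the index itself), the distinct-term counts come back from a plain GET. A sketch, with host and field names as placeholders:

```shell
# numTerms controls how many top terms per field are listed; the response
# also reports the number of distinct terms for each requested field.
echo "http://localhost:8983/solr/admin/luke?fl=media_type,location&numTerms=10"
```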
Availability Issues
Hi All. I'm seeing all these threads about availability and I'm wondering why my situation is so different than others'. We're running SOLR 1.2 with a 2.5G heap size. On any given day, the system becomes completely unresponsive. We can't even get /solr/admin/ to come up, much less any select queries. The only thing we can do is kill the SOLR process and re-start it. We are indexing over 25 million documents and we add about as much as we remove daily, so the number remains fairly constant. Again, it seems like other folks are having a much easier time with SOLR than we are. Can anyone help by sharing how you've got it configured? Does anyone have a similar experience? TIA. DW
RE: Availability Issues
Hi Tom.

The logs show nothing but regular activity. We do a tail -f on the logfile and we can read it during the unresponsive period and we don't see any errors. I've attached our schema/config files. They are pretty much out-of-the-box values, except for our index.

Dave

-Original Message-
From: Tom Hill [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 2:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

Hi -

We're definitely not seeing that. What do your logs show? What do your schema/solrconfig look like?

Tom

On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:

Hi All. I'm seeing all these threads about availability and I'm wondering why my situation is so different than others'. We're running SOLR 1.2 with a 2.5G heap size. On any given day, the system becomes completely unresponsive. We can't even get /solr/admin/ to come up, much less any select queries. The only thing we can do is kill the SOLR process and re-start it. We are indexing over 25 million documents and we add about as much as we remove daily, so the number remains fairly constant. Again, it seems like other folks are having a much easier time with SOLR than we are. Can anyone help by sharing how you've got it configured? Does anyone have a similar experience?

TIA.

DW
RE: Availability Issues
Hi Yonik.

What version of Solr are you running?

We're running:
Solr Specification Version: 1.2.2007.08.24.08.06.00
Solr Implementation Version: nightly ${svnversion} - yonik - 2007-08-24 08:06:00
Lucene Specification Version: 2.2.0
Lucene Implementation Version: 2.2.0 548010 - buschmi - 2007-06-16 23:15:56

Is the CPU pegged at 100% when it's unresponsive?

It's a little difficult to be sure. We have a hyper-threaded box and the CPU % we get back is misleading. I think it's safe to say we may spike up to 100% but we don't necessarily stay pegged there.

Have you taken a thread dump to see what is going on?

We can't do it b/c during the unresponsive time we can't access the admin site (/solr/admin) at all. I don't know how to do a thread dump via the command line.

Do you get into a situation where more than one searcher is warming at a time? (there is configuration that can prevent this one from happening).

Forgive me when I say I'm not totally clear on what this question means. The index is constantly getting hit with a myriad of queries, if that's what you meant.

Thanks,
Dave

-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:

We're running SOLR 1.2 with a 2.5G heap size. On any given day, the system becomes completely unresponsive. We can't even get /solr/admin/ to come up, much less any select queries.

What version of Solr are you running? The first step to diagnose something like this is to figure out what is going on... Is the CPU pegged at 100% when it's unresponsive? Have you taken a thread dump to see what is going on? Do you get into a situation where more than one searcher is warming at a time? (there is configuration that can prevent this one from happening).

-Yonik
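For the "thread dump via the command line" question: no access to the admin pages is needed. A sketch, assuming Solr was started with java -jar start.jar (the pgrep pattern is a guess about the launch command):

```shell
# SIGQUIT makes the JVM print a full thread dump to its stdout, which with
# the example Jetty setup is the console/log you are already tailing.
SOLR_PID=$(pgrep -f start.jar || true)
if [ -n "$SOLR_PID" ]; then
  kill -3 "$SOLR_PID"
  echo "thread dump requested from pid $SOLR_PID"
else
  echo "no Solr process found"
fi
# On a Sun JDK 5+, "jstack <pid>" prints the same dump to your terminal.
```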
RE: Availability Issues
Hi Yonik.

Do you see any requests that took a really long time to finish?

The requests that take a long time to finish are just simple queries. And the same queries run at a later time come back much faster. Our logs contain 99% inserts and 1% queries. We are constantly adding documents to the index at a rate of 10,000 per minute, so the logs show mostly that.

Start with the thread dump. I bet it's multiple queries piling up around some synchronization points in lucene (sometimes caused by multiple threads generating the same big filter that isn't yet cached).

What would be my next steps after that? I'm not sure I'd understand enough from the dump to make heads-or-tails of it. Can I share that here?

Dave

-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 3:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:

The logs show nothing but regular activity. We do a tail -f on the logfile and we can read it during the unresponsive period and we don't see any errors.

You don't see log entries for requests until after they complete. When a server becomes unresponsive, try shutting off further traffic to it, and let it finish whatever requests it's working on (assuming that's the issue) so you can see them in the log. Do you see any requests that took a really long time to finish?

-Yonik
RE: Availability Issues
Oh, so you are using the same boxes for updating and querying?

Yep. We have a MySQL database on the box and we query it and POST directly into SOLR via wget in Perl. We then also hit the box for queries. [We'd be very interested in hearing about best practices on how to separate out the data from the index and how to balance them when the inserts outweigh the selects by factors of 50,000:1.]

When you insert, are you using multiple threads? If so, how many?

We're not threading at all. We have a Perl script that does a select statement out of a MySQL database and runs POSTs sequentially into SOLR, one per document. After a batch of 10,000 POSTs, we run a background commit (using waitFlush and waitSearcher).

Again, I'd be very grateful for success stories from people in terms of good server architecture. We are ready and willing to change versions of linux, of the Java container, etc. And we're ready to add more boxes if that'll help. We just need some guidance.

What is the full URL of those slow query requests?

They can be anything. For example:
[08/10/2007:18:51:55 +] GET /solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on HTTP/1.1 200 45799

Do the slow requests start after a commit?

Based on the way the logs read, you could argue that point. The stream of POSTs end in the logs and then subsequent queries take longer to run, but it's hard to be sure there's a direct correlation.

Yes, post it here. Most likely a majority of the threads will be blocked somewhere deep in lucene code, and you will probably need help from people here to figure it out.

Next time it happens I'll shoot it over.

--Dave

-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:

Do you see any requests that took a really long time to finish?

The requests that take a long time to finish are just simple queries. And the same queries run at a later time come back much faster. Our logs contain 99% inserts and 1% queries. We are constantly adding documents to the index at a rate of 10,000 per minute, so the logs show mostly that.

Oh, so you are using the same boxes for updating and querying? When you insert, are you using multiple threads? If so, how many? What is the full URL of those slow query requests? Do the slow requests start after a commit?

Start with the thread dump. I bet it's multiple queries piling up around some synchronization points in lucene (sometimes caused by multiple threads generating the same big filter that isn't yet cached).

What would be my next steps after that? I'm not sure I'd understand enough from the dump to make heads-or-tails of it. Can I share that here?

Yes, post it here. Most likely a majority of the threads will be blocked somewhere deep in lucene code, and you will probably need help from people here to figure it out.

-Yonik
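The add/commit cycle the Perl script performs can be reproduced with plain HTTP, which also makes it easy to point indexing traffic at a different Solr instance than the one serving queries. A sketch (URL and fields are stand-ins, not the actual script):

```shell
# One document per POST, as the Perl loop does, then an explicit commit
# after the batch. The curl commands are echoed here rather than executed.
UPDATE_URL="http://localhost:8983/solr/update"
DOC='<add><doc><field name="id">42</field><field name="title">example</field></doc></add>'
echo "curl '$UPDATE_URL' -H 'Content-type: text/xml' --data-binary '$DOC'"
echo "curl '$UPDATE_URL' -H 'Content-type: text/xml' --data-binary '<commit waitFlush=\"true\" waitSearcher=\"true\"/>'"
```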
RE: Availability Issues
Hi Chris.

My logs don't look anything like that. They look like HTTP requests. Am I looking in the wrong place?

Dave

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 5:02 PM
To: solr-user
Subject: RE: Availability Issues

: Do the slow requests start after a commit?
:
: Based on the way the logs read, you could argue that point.
: The stream of POSTs end in the logs and then subsequent queries
: take longer to run, but it's hard to be sure there's a direct
: correlation.

you would know based on the INFO level messages related to a commit ... you'll see messages that look like this when the commit starts...

Oct 8, 2007 1:56:48 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)

...then you'll see a message like this...

Oct 8, 2007 1:56:48 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush

...if you have autowarming you'll see a bunch of logs about that, and then eventually you'll see a message like this...

Oct 8, 2007 1:56:48 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 299

...the important question is how many of these hangs or really long queries happen in the midst of all that ... how many happen very quickly after it (which may indicate not enough warming)

(NOTE: some of those log messages may look different in your nightly snapshot version, but the main gist should be the same .. i don't remember when exactly the LogUpdateProcessor was added).

-Hoss
RE: Availability Issues
Thanks for letting me know that. Okay, here they are:

BEGIN SCHEMA.XML
===
<?xml version="1.0" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!-- This is the Solr schema file. This file should be named "schema.xml"
 and should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default)
 or located where the classloader for the Solr webapp can find it.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml
-->

<schema name="enr-solr" version="1.1">
  <!-- attribute "name" is the name of this schema and is only used for
       display purposes. Applications should change this to reflect the
       nature of the search collection.
       version="1.1" is Solr's version number for the schema syntax and
       semantics. It should not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are
            multiValued by nature
       1.1: multiValued attribute introduced, false by default -->

  <types>
    <!-- field type definitions. The "name" attribute is just a label to
         be used by field definitions. The "class" attribute and any other
         attributes determine the real behavior of the fieldtype. Class
         names starting with "solr" refer to java classes in the
         org.apache.solr.analysis package.
    -->

    <!-- The StrField type is not analyzed, but indexed/stored verbatim.
       - StrField and TextField support an optional compressThreshold
         which limits compression (if enabled in the derived fields) to
         values which exceed a certain size (in characters).
    -->
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <!-- boolean type: "true" or "false" -->
    <fieldtype name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

    <!-- The optional sortMissingLast and sortMissingFirst attributes are
         currently supported on types that are sorted internally as strings.
       - If sortMissingLast="true", then a sort on this field will cause
         documents without the field to come after documents with the field,
         regardless of the requested sort order (asc or desc).
       - If sortMissingFirst="true", then a sort on this field will cause
         documents without the field to come before documents with the field,
         regardless of the requested sort order.
       - If sortMissingLast="false" and sortMissingFirst="false" (the
         default), then default lucene sorting will be used which places
         docs without the field first in an ascending sort and last in a
         descending sort.
    -->

    <!-- numeric field types that store and index the text value verbatim
         (and hence don't support range queries, since the lexicographic
         ordering isn't equal to the numeric ordering) -->
    <fieldtype name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldtype name="long" class="solr.LongField" omitNorms="true"/>
    <fieldtype name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldtype name="double" class="solr.DoubleField" omitNorms="true"/>

    <!-- Numeric field types that manipulate the value into a string value
         that isn't human-readable in its internal form, but with a
         lexicographic ordering the same as the numeric ordering, so that
         range queries work correctly.
    -->
    <fieldtype name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>

    <!-- The format for this date field is of the form 1995-12-31T23:59:59Z,
         and is a more restricted form of the canonical representation of
         dateTime http://www.w3.org/TR/xmlschema-2/#dateTime
         The trailing "Z" designates UTC time and is mandatory.
         Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
         All other components are mandatory.
         Expressions can also be used to denote calculations that should be
         performed
Selecting Distinct values?
Hi there. Is there a query I can use to select distinct values in an index? I thought I could use a facet, but the facets don't seem to return all the distinct values in the index, only the highest-count ones. Is there another query I can try? Or, can I adjust the facets somehow to make this work? Thanks, DW
RE: Selecting Distinct values?
<grin> Silly me. Thanks! -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Thursday, September 27, 2007 4:46 PM To: solr-user@lucene.apache.org Subject: Re: Selecting Distinct values? On 27-Sep-07, at 12:01 PM, David Whalen wrote: Hi there. Is there a query I can use to select distinct values in an index? I thought I could use a facet, but the facets don't seem to return all the distinct values in the index, only the highest-count ones. Is there another query I can try? Or, can I adjust the facets somehow to make this work? http://wiki.apache.org/solr/SimpleFacetParameters#head-1b281067d007d3fb66f07a3e90e9b1704cbc59a3 cheers, -Mike
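For the archives: the catch here is that facet counts are capped at the top 100 values by default, which is why not every distinct value comes back. A sketch of a request that lifts that cap, per the SimpleFacetParameters page Mike links (the field name "content_source" and the localhost URL are illustrative assumptions, not from the thread):

```python
from urllib.parse import urlencode

# Sketch only: ask the facet component for ALL distinct values of a field
# by removing the default facet.limit of 100. Field name and host are
# assumptions for illustration.
params = {
    "q": "*:*",            # match everything; we only want the facet counts
    "rows": 0,             # return no documents, just the facets
    "facet": "true",
    "facet.field": "content_source",
    "facet.limit": -1,     # -1 = unlimited: every distinct value is returned
}
query = urlencode(params)
print("http://localhost:8983/solr/select?" + query)
```

Each entry in the returned facet list is then one distinct indexed value of the field, with its document count.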
quirks with sorting
Hi All. I'm seeing a weird problem with sorting that I can't figure out. I have a query that uses two fields -- a source column and a date column. I search on the source and I sort by the date descending. What I'm seeing is that depending on the value in the source, the date sort works in reverse. For example, the query: content_source:(mv); content_date desc returns 2007-09-10T09:25:00.000Z in its first row, which is what I expect. BUT, the query: content_source:(thomson); content_date desc returns 2008-08-17T00:00:00.000Z, which is the first date we put into SOLR. So, simply by changing the value in the field, the sort seems to be reversed (or ignored outright). Now, before you ask, I did a sanity-check query to make sure that there is in fact data for that source from today, and there is. Can anyone help shed some light on this? TIA DW
RE: quirks with sorting
<red-faced> You know, I must have looked at that date 10 times and I never noticed the year. Sorry everyone! </red-faced> -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, September 10, 2007 11:23 AM To: solr-user@lucene.apache.org Subject: Re: quirks with sorting On 9/10/07, David Whalen [EMAIL PROTECTED] wrote: I'm seeing a weird problem with sorting that I can't figure out. I have a query that uses two fields -- a source column and a date column. I search on the source and I sort by the date descending. What I'm seeing is that depending on the value in the source, the date sort works in reverse. For example, the query: content_source:(mv); content_date desc returns 2007-09-10T09:25:00.000Z in its first row, which is what I expect. BUT, the query: content_source:(thomson); content_date desc returns 2008-08-17T00:00:00.000Z, which is the first date we put into SOLR. Isn't it the last (highest date), since it's 2008? -Yonik
searching where a value is not null?
Hi all. I'm trying to construct a query that in pseudo-code would read like this: field != '' I'm finding it difficult to write this as a solr query, though. Stuff like: NOT field:() doesn't seem to do the trick. any ideas? dw
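For what it's worth, the usual Lucene/Solr idiom for "field has a value" is an open-ended range query, field:[* TO *]; negating it finds documents where the field is empty. A small sketch (the field name journalist_id is borrowed from the schema earlier in the thread purely as an example):

```python
from urllib.parse import urlencode

# field:[* TO *] is an open-ended range query: it matches every document
# that has ANY value in the field, i.e. "field is not null".
# To find docs where the field IS missing, negate it -- and pair the
# negation with a base query like *:*, since a purely negative query
# on its own may return nothing on older Solr versions.
has_value = urlencode({"q": "journalist_id:[* TO *]"})
no_value  = urlencode({"q": "*:* -journalist_id:[* TO *]"})
print(has_value)
print(no_value)
```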
Effects of changing schema?
Hi All. I'm unclear on whether changing the schema.xml file automatically causes a reindex or not. If I'm adding a field to the schema (and removing some unused ones), does solr do the reindex? Or do I have to kick it off myself? Ideally, we'd like to avoid a reindex... Thanks! DW
Problem with stemming
Hi All. We're running into a problem with stemming that I can't figure out. For example, searching for the word transit (whether in quotes or not) returns documents with the word transition in them. How do I disable this? We want our engine to be as literal as possible. If a user mis-types a word, that's too bad for them TIA DW
RE: Problem with stemming
Yonik: I only raised the question to the group after I had looked in the schema.xml. There are a lot of comments in that file, but they make no sense to me. I'd appreciate some specific help on what to do... DW -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, August 13, 2007 3:28 PM To: solr-user@lucene.apache.org Subject: Re: Problem with stemming On 8/13/07, David Whalen [EMAIL PROTECTED] wrote: Hi All. We're running into a problem with stemming that I can't figure out. For example, searching for the word transit (whether in quotes or not) returns documents with the word transition in them. How do I disable this? We want our engine to be as literal as possible. If a user mis-types a word, that's too bad for them Use a different field-type for those fields that you want exact matching for (and then re-index). Read through schema.xml if you haven't... there are quite a few comments in there. You may want a field type with just a whitespace tokenizer followed by a lowercase filter. -Yonik
RE: Problem with stemming
Thanks, guys. I'm sure that by the time I get the book and learn all about Lucene the CEO of my company will have insisted we find another search engine. But the book will look great on my coffee table -Original Message- From: Lance Norskog [mailto:[EMAIL PROTECTED] Sent: Monday, August 13, 2007 4:37 PM To: solr-user@lucene.apache.org Subject: RE: Problem with stemming (Oops, try again.) You need this book: http://www.amazon.com/Lucene-Action-Erik-Hatcher/dp/1932394281/ref=pd_bbs_sr_1/103-4871137-7111056?ie=UTF8&s=books&qid=1187037246&sr=8-1 Lucene in Action by Erik Hatcher and Otis Gospodnetic. It does not cover Solr really, but you will understand what Lucene does and how it works. Until then you will not really get anywhere. Cheers, Lance -Original Message- From: David Whalen [mailto:[EMAIL PROTECTED] Sent: Monday, August 13, 2007 1:00 PM To: solr-user@lucene.apache.org Subject: RE: Problem with stemming Yonik: I only raised the question to the group after I had looked in the schema.xml. There are a lot of comments in that file, but they make no sense to me. I'd appreciate some specific help on what to do... DW -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, August 13, 2007 3:28 PM To: solr-user@lucene.apache.org Subject: Re: Problem with stemming On 8/13/07, David Whalen [EMAIL PROTECTED] wrote: Hi All. We're running into a problem with stemming that I can't figure out. For example, searching for the word transit (whether in quotes or not) returns documents with the word transition in them. How do I disable this? We want our engine to be as literal as possible. If a user mis-types a word, that's too bad for them Use a different field-type for those fields that you want exact matching for (and then re-index). Read through schema.xml if you haven't... there are quite a few comments in there. You may want a field type with just a whitespace tokenizer followed by a lowercase filter. -Yonik
RE: Problem with stemming
So I shut it off by removing these tags from my schema.xml file? Seems like it's this Porter thing that's messing me up. -Original Message- From: Tom Mastre [mailto:[EMAIL PROTECTED] Sent: Monday, August 13, 2007 5:19 PM To: solr-user@lucene.apache.org Subject: Re: Problem with stemming Go here http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28stemming%29#head-88cc86e4432b359030cffdb32d095062b843d4f5 Look for this solr.PorterStemFilterFactory On 8/13/07 1:50 PM, David Whalen [EMAIL PROTECTED] wrote: Thanks, guys. I'm sure that by the time I get the book and learn all about Lucene the CEO of my company will have insisted we find another search engine. But the book will look great on my coffee table -Original Message- From: Lance Norskog [mailto:[EMAIL PROTECTED] Sent: Monday, August 13, 2007 4:37 PM To: solr-user@lucene.apache.org Subject: RE: Problem with stemming (Oops, try again.) You need this book: http://www.amazon.com/Lucene-Action-Erik-Hatcher/dp/1932394281/ref=pd_bbs_sr_1/103-4871137-7111056?ie=UTF8&s=books&qid=1187037246&sr=8-1 Lucene in Action by Erik Hatcher and Otis Gospodnetic. It does not cover Solr really, but you will understand what Lucene does and how it works. Until then you will not really get anywhere. Cheers, Lance -Original Message- From: David Whalen [mailto:[EMAIL PROTECTED] Sent: Monday, August 13, 2007 1:00 PM To: solr-user@lucene.apache.org Subject: RE: Problem with stemming Yonik: I only raised the question to the group after I had looked in the schema.xml. There are a lot of comments in that file, but they make no sense to me. I'd appreciate some specific help on what to do... DW -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, August 13, 2007 3:28 PM To: solr-user@lucene.apache.org Subject: Re: Problem with stemming On 8/13/07, David Whalen [EMAIL PROTECTED] wrote: Hi All. We're running into a problem with stemming that I can't figure out.
For example, searching for the word transit (whether in quotes or not) returns documents with the word transition in them. How do I disable this? We want our engine to be as literal as possible. If a user mis-types a word, that's too bad for them Use a different field-type for those fields that you want exact matching for (and then re-index). Read through schema.xml if you haven't... there are quite a few comments in there. You may want a field type with just a whitespace tokenizer followed by a lowercase filter. -Yonik Thomas M. Mastre Manager, Homeland Security Digital Library Center for Homeland Defense and Security The Nation's Homeland Security Educator 1 University Circle DKL, Room 112 Monterey, Ca. 93943 Phone: 831.656.1095, Cell:831.238.1451 Fax:831.656.2619 email: [EMAIL PROTECTED] http://www.hsdl.org
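A field type along the lines Yonik suggests (whitespace tokenizer plus lowercase filter, with the stemming filter left out) might look like the sketch below in schema.xml. The type name text_exact is invented for illustration, and any field switched to it must be re-indexed:

```xml
<fieldtype name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no solr.EnglishPorterFilterFactory / solr.PorterStemFilterFactory here,
         so "transit" will no longer match "transition" -->
  </analyzer>
</fieldtype>
```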
RE: Any clever ideas to inject into solr? Without http?
What we're looking for is a way to inject *without* using curl, or wget, or any other http-based communication. We'd like for the HTTP daemon to only handle search requests, not indexing requests on top of them. Plus, I have to believe there's a faster way to get documents into solr/lucene than using curl _ david whalen senior applications developer eNR Services, Inc. [EMAIL PROTECTED] 203-849-7240 -Original Message- From: Clay Webster [mailto:[EMAIL PROTECTED] Sent: Thursday, August 09, 2007 11:43 AM To: solr-user@lucene.apache.org Subject: Re: Any clever ideas to inject into solr? Without http? Condensing the loader into a single executable sounds right if you have performance problems. ;-) You could also try adding multiple docs in a single post if you notice your problems are with tcp setup time, though if you're doing localhost connections that should be minimal. If you're already local to the solr server, you might check out the CSV slurper. http://wiki.apache.org/solr/UpdateCSV It's a little specialized. And then there's of course the question of are you doing full re-indexing or incremental indexing of changes? --cw On 8/9/07, Kevin Holmes [EMAIL PROTECTED] wrote: I inherited an existing (working) solr indexing script that runs like this: Python script queries the mysql DB then calls bash script Bash script performs a curl POST submit to solr We're injecting about 1000 records / minute (constantly), frequently pushing the edge of our CPU / RAM limitations. I'm in the process of building a Perl script to use DBI and lwp::simple::post that will perform this all from a single script (instead of 3). Two specific questions 1: Does anyone have a clever (or better) way to perform this process efficiently? 2: Is there a way to inject into solr without using POST / curl / http? Admittedly, I'm no solr expert - I'm starting from someone else's setup, trying to reverse-engineer my way out. Any input would be greatly appreciated.
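Picking up Clay's "multiple docs in a single post" point: whatever language the loader ends up in, batching removes most of the per-record HTTP overhead. A hedged sketch (field names and batch contents are invented) that builds one XML add message carrying many documents, to be POSTed to /solr/update once instead of once per record:

```python
import xml.etree.ElementTree as ET

# Build a single <add> message containing many <doc> elements -- the XML
# update format Solr accepts on /solr/update. One POST per batch instead
# of one POST per record. Field names below are examples only.
def build_add_xml(records):
    add = ET.Element("add")
    for rec in records:
        doc = ET.SubElement(add, "doc")
        for name, value in rec.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

payload = build_add_xml([
    {"id": "1", "title": "first doc"},
    {"id": "2", "title": "second doc"},
])
# POST `payload` with Content-Type: text/xml to http://localhost:8983/solr/update,
# then send a <commit/> when the batch run is finished.
```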
RE: Solr 1.1 HTTP server stops responding
Hi All. I'm still hoping to get some insight into how I can solve this issue. If Jetty is the problem I'll happily get rid of it, but I'd feel better if I could do some tests first to be sure I'm solving the problem. Has anyone else had this problem in the past? Thanks, DW -Original Message- From: David Whalen [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:49 AM To: solr-user@lucene.apache.org Subject: RE: Solr 1.1 HTTP server stops responding We're using Jetty. I don't know what version though. To my knowledge, Solr is the only thing running inside it. Yes, we cannot get to the admin pages either. Nothing on port 8983 responds. So maybe it's actually Jetty that's messing me up? How can I make sure of that? Thanks for the help! DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:40 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Solr runs as a webapp (think .war file) inside a servlet container (e.g. Tomcat, Jetty, Resin...). It could be that the servlet container itself has a bug that prevents it from responding properly after a while. If you have other webapps in the same container, do they still respond? Can you get to *any* of Solr's pages (e.g. admin page)? Anything in container or Solr logs? Otis -- Lucene Consulting - http://lucene-consulting.com/ - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 4:21:18 PM Subject: RE: Solr 1.1 HTTP server stops responding Hi Otis. I'm filling-in for the guy that installed the software for us (now he's long gone), so I'm just getting familiar with all of this. Can you elaborate on what you mean? DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:01 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Hi David, Have you ruled out your servlet container as the source of this bug?
Otis - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 3:06:42 PM Subject: Solr 1.1 HTTP server stops responding Hi All. We're running Solr 1.1 and we're seeing intermittent cases where Solr stops responding to HTTP requests. It seems like the listener on port 8983 just doesn't respond. We stop and restart Solr and everything works fine for a few hours, and then the problem returns. We can't seem to point to any single factor that would lead to this problem, and I'm hoping to get some hints on how to diagnose it. Here's what I can tell you now, and I can provide more info by request: 1) The query load (via /solr/select) isn't that high. Maybe 20 or 30 requests per minute tops. 2) The insert load (via /solr/update) is very high. We commit almost 500,000 documents per day. We also trim out the same number however, so the net number of documents should stay around 20 million. 3) We do see Out of Memory errors sometimes, especially when making facet queries (which we do most of the time). We think solr is great, and we want to keep using it, but the downtime makes the product (and us) look bad, so we need to solve this soon. Thanks in advance for your help! DW No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.5.476 / Virus Database: 269.10.22/922 - Release Date: 7/27/2007 6:08 AM
Please help! Solr 1.1 HTTP server stops responding
Guys: Can anyone help me? Things are getting serious at my company and heads are going to roll. I need to figure out why solr just suddenly stops responding without any warning. DW -Original Message- From: David Whalen [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:49 AM To: solr-user@lucene.apache.org Subject: RE: Solr 1.1 HTTP server stops responding We're using Jetty. I don't know what version though. To my knowledge, Solr is the only thing running inside it. Yes, we cannot get to the admin pages either. Nothing on port 8983 responds. So maybe it's actually Jetty that's messing me up? How can I make sure of that? Thanks for the help! DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:40 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Solr runs as a webapp (think .war file) inside a servlet container (e.g. Tomcat, Jetty, Resin...). It could be that the servlet container itself has a bug that prevents it from responding properly after a while. If you have other webapps in the same container, do they still respond? Can you get to *any* of Solr's pages (e.g. admin page)? Anything in container or Solr logs? Otis -- Lucene Consulting - http://lucene-consulting.com/ - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 4:21:18 PM Subject: RE: Solr 1.1 HTTP server stops responding Hi Otis. I'm filling-in for the guy that installed the software for us (now he's long gone), so I'm just getting familiar with all of this. Can you elaborate on what you mean? DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:01 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Hi David, Have you ruled out your servlet container as the source of this bug?
Otis - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 3:06:42 PM Subject: Solr 1.1 HTTP server stops responding Hi All. We're running Solr 1.1 and we're seeing intermittent cases where Solr stops responding to HTTP requests. It seems like the listener on port 8983 just doesn't respond. We stop and restart Solr and everything works fine for a few hours, and then the problem returns. We can't seem to point to any single factor that would lead to this problem, and I'm hoping to get some hints on how to diagnose it. Here's what I can tell you now, and I can provide more info by request: 1) The query load (via /solr/select) isn't that high. Maybe 20 or 30 requests per minute tops. 2) The insert load (via /solr/update) is very high. We commit almost 500,000 documents per day. We also trim out the same number however, so the net number of documents should stay around 20 million. 3) We do see Out of Memory errors sometimes, especially when making facet queries (which we do most of the time). We think solr is great, and we want to keep using it, but the downtime makes the product (and us) look bad, so we need to solve this soon. Thanks in advance for your help! DW
RE: Please help! Solr 1.1 HTTP server stops responding
Hi Yonik! I'm glad to finally get to talk to you. We're all very impressed with solr and when it's running it's really great. We increased the heap size to 1500M and that didn't seem to help. In fact, the crashes seem to occur more now than ever. We're constantly restarting solr just to get a response. I don't know enough to know where the log files are to answer your question (again, I'm filling in for the guy that set us up with all this). Can I ask for your patience so we can figure this out? Thanks! Dave W -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, July 30, 2007 2:23 PM To: solr-user@lucene.apache.org Subject: Re: Please help! Solr 1.1 HTTP server stops responding It may be related to the out-of-memory errors you were seeing. severe errors like that should never be ignored. Do you see any other warning or severe errors in your logs? -Yonik On 7/30/07, David Whalen [EMAIL PROTECTED] wrote: Guys: Can anyone help me? Things are getting serious at my company and heads are going to roll. I need to figure out why solr just suddenly stops responding without any warning. DW -Original Message- From: David Whalen [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:49 AM To: solr-user@lucene.apache.org Subject: RE: Solr 1.1 HTTP server stops responding We're using Jetty. I don't know what version though. To my knowledge, Solr is the only thing running inside it. Yes, we cannot get to the admin pages either. Nothing on port 8983 responds. So maybe it's actually Jetty that's messing me up? How can I make sure of that? Thanks for the help! DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:40 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Solr runs as a webapp (think .war file) inside a servlet container (e.g. Tomcat, Jetty, Resin...). 
It could be that the servlet container itself has a bug that prevents it from responding properly after a while. If you have other webapps in the same container, do they still respond? Can you get to *any* of Solr's pages (e.g. admin page)? Anything in container or Solr logs? Otis -- Lucene Consulting - http://lucene-consulting.com/ - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 4:21:18 PM Subject: RE: Solr 1.1 HTTP server stops responding Hi Otis. I'm filling-in for the guy that installed the software for us (now he's long gone), so I'm just getting familiar with all of this. Can you elaborate on what you mean? DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:01 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Hi David, Have you ruled out your servlet container as the source of this bug? Otis - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 3:06:42 PM Subject: Solr 1.1 HTTP server stops responding Hi All. We're running Solr 1.1 and we're seeing intermittent cases where Solr stops responding to HTTP requests. It seems like the listener on port 8983 just doesn't respond. We stop and restart Solr and everything works fine for a few hours, and then the problem returns. We can't seem to point to any single factor that would lead to this problem, and I'm hoping to get some hints on how to diagnose it. Here's what I can tell you now, and I can provide more info by request: 1) The query load (via /solr/select) isn't that high. Maybe 20 or 30 requests per minute tops. 2) The insert load (via /solr/update) is very high. We commit almost 500,000 documents per day. We also trim out the same number however, so the net number of documents should stay around 20 million.
3) We do see Out of Memory errors sometimes, especially when making facet queries (which we do most of the time). We think solr is great, and we want to keep using it, but the downtime makes the product (and us) look bad, so we need to solve this soon. Thanks in advance for your help! DW
RE: Please help! Solr 1.1 HTTP server stops responding
Yonik: If that's not the problem, you could decrease memory usage due to faceting by upgrading to Solr 1.2 and using facet.enum.cache.minDf Is it hard to upgrade from 1.1 to 1.2? We were considering making that change if it wouldn't cost us a lot of downtime. can you help me understand what using facet.enum.cache.minDf means? Is that a setting in the config file? Dave W -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, July 30, 2007 3:29 PM To: solr-user@lucene.apache.org Subject: Re: Please help! Solr 1.1 HTTP server stops responding Grep for PERFORMANCE in the logs to make sure that you aren't running into a scenario where more than one searcher is warming in the background. If that's not the problem, you could decrease memory usage due to faceting by upgrading to Solr 1.2 and using facet.enum.cache.minDf -Yonik On 7/30/07, Kevin Holmes [EMAIL PROTECTED] wrote: Just got this: Jul 30, 2007 3:02:14 PM org.apache.solr.core.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space Jul 30, 2007 3:02:30 PM org.apache.solr.core.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space Kevin Holmes eNR Services, Inc. 20 Glover Ave. 2nd Floor Norwalk, CT. 06851 203-849-7248 [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, July 30, 2007 2:55 PM To: solr-user@lucene.apache.org Subject: Re: Please help! Solr 1.1 HTTP server stops responding On 7/30/07, David Whalen [EMAIL PROTECTED] wrote: We increased the heap size to 1500M and that didn't seem to help. In fact, the crashes seem to occur more now than ever. We're constantly restarting solr just to get a response. I don't know enough to know where the log files are to answer your question Me neither ;-) Solr's example app that uses Jetty just has logging going to stdout (the console) to make it clear and visible to new users when an error happens. 
Hopefully you've configured Jetty to log to files, or at least redirected Jetty's stdout/stderr to a file. You need to look around and try and find those log files. If you find them, one thing to look for would be WARNING in the log files. Another thing to look for would be Exception or Memory So maybe it's actually Jetty that's messing me up? How can I make sure of that? Perhaps point your browser at http://localhost:8983/ and see if you get any response at all. -Yonik
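Regarding Yonik's facet.enum.cache.minDf pointer earlier in the thread: it is a request parameter (it can also be set as a handler default in solrconfig.xml). A sketch of it on a facet query; the field name and the threshold of 25 are illustrative guesses, not tuned recommendations:

```python
from urllib.parse import urlencode

# facet.enum.cache.minDf (Solr 1.2+) tells enum-based faceting not to cache
# a filter for any term matching fewer than N documents, trading some speed
# for a much smaller filterCache footprint. Field name and threshold are
# assumptions for illustration.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "location",
    "facet.enum.cache.minDf": 25,  # only cache filters for terms with df >= 25
}
query = urlencode(params)
print("http://localhost:8983/solr/select?" + query)
```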
faceting on multiple columns
Hi All. I am using facets to help me build an ajax-driven tree for search results. When the search is first run, all I need to do is show the counts per facet, for example:

search results for fred
+--A (102)
+--B (234)
+--C (721)
+--D (512)

sounds simple, but I also need to break-down the results from D by a different index in lucene:

search results for fred
+--A (102)
+--B (234)
+--C (721)
+--D (512)
   +--D1 (19)
   +--D2 (34)
   +--D3 (45)

what I have been doing in my solr querystring looks like this:

rows=0&facet=true&facet.limit=-1&facet.field=field1&facet.field=field2

Unfortunately we're seeing really bad performance and we're constantly running out of heap space on this type of query. So, my question is, would breaking this into two calls perform better? That is:

rows=0&facet=true&facet.limit=-1&facet.field=field1

and then

rows=0&facet=true&facet.limit=-1&facet.field=field2

? It seems to me that two calls would have more overhead than one, but it might lessen the impact on the heap space on my server. Anyone work enough with facets to throw in their two cents? Thanks! Dave W.
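To make the two options concrete, here is a sketch of both request forms (field1/field2 are the placeholders from the message). Whether splitting actually helps is uncertain: the caches that dominate facet memory are shared across requests, so two calls mainly reduce the peak working set of any single request rather than the total cost.

```python
from urllib.parse import urlencode

# One request faceting on both fields vs. two requests with one field each.
# field1/field2 are placeholders, as in the message above.
base = {"q": "fred", "rows": 0, "facet": "true", "facet.limit": -1}

one_call = urlencode({**base, "facet.field": ["field1", "field2"]}, doseq=True)
call_a   = urlencode({**base, "facet.field": "field1"})
call_b   = urlencode({**base, "facet.field": "field2"})
```

Note that repeating facet.field in a single request (the doseq form) is the documented way to facet on several fields at once.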
RE: Solr 1.1 HTTP server stops responding
Hi Otis. I'm filling-in for the guy that installed the software for us (now he's long gone), so I'm just getting familiar with all of this. Can you elaborate on what you mean? DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:01 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Hi David, Have you ruled out your servlet container as the source of this bug? Otis - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 3:06:42 PM Subject: Solr 1.1 HTTP server stops responding Hi All. We're running Solr 1.1 and we're seeing intermittent cases where Solr stops responding to HTTP requests. It seems like the listener on port 8983 just doesn't respond. We stop and restart Solr and everything works fine for a few hours, and then the problem returns. We can't seem to point to any single factor that would lead to this problem, and I'm hoping to get some hints on how to diagnose it. Here's what I can tell you now, and I can provide more info by request: 1) The query load (via /solr/select) isn't that high. Maybe 20 or 30 requests per minute tops. 2) The insert load (via /solr/update) is very high. We commit almost 500,000 documents per day. We also trim out the same number however, so the net number of documents should stay around 20 million. 3) We do see Out of Memory errors sometimes, especially when making facet queries (which we do most of the time). We think solr is great, and we want to keep using it, but the downtime makes the product (and us) look bad, so we need to solve this soon. Thanks in advance for your help! DW
RE: Solr 1.1 HTTP server stops responding
We're using Jetty. I don't know what version though. To my knowledge, Solr is the only thing running inside it. Yes, we cannot get to the admin pages either. Nothing on port 8983 responds. So maybe it's actually Jetty that's messing me up? How can I make sure of that? Thanks for the help! DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:40 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Solr runs as a webapp (think .war file) inside a servlet container (e.g. Tomcat, Jetty, Resin...). It could be that the servlet container itself has a bug that prevents it from responding properly after a while. If you have other webapps in the same container, do they still respond? Can you get to *any* of Solr's pages (e.g. admin page)? Anything in container or Solr logs? Otis -- Lucene Consulting - http://lucene-consulting.com/ - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 4:21:18 PM Subject: RE: Solr 1.1 HTTP server stops responding Hi Otis. I'm filling-in for the guy that installed the software for us (now he's long gone), so I'm just getting familiar with all of this. Can you elaborate on what you mean? DW -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, July 27, 2007 10:01 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.1 HTTP server stops responding Hi David, Have you ruled out your servlet container as the source of this bug? Otis - Original Message From: David Whalen [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, July 27, 2007 3:06:42 PM Subject: Solr 1.1 HTTP server stops responding Hi All. We're running Solr 1.1 and we're seeing intermittent cases where Solr stops responding to HTTP requests. It seems like the listener on port 8983 just doesn't respond. We stop and restart Solr and everything works fine for a few hours, and then the problem returns.
We can't seem to point to any single factor that would lead to this problem, and I'm hoping to get some hints on how to diagnose it. Here's what I can tell you now, and I can provide more info by request: 1) The query load (via /solr/select) isn't that high. Maybe 20 or 30 requests per minute tops. 2) The insert load (via /solr/update) is very high. We commit almost 500,000 documents per day. We also trim out the same number however, so the net number of documents should stay around 20 million. 3) We do see Out of Memory errors sometimes, especially when making facet queries (which we do most of the time). We think solr is great, and we want to keep using it, but the downtime makes the product (and us) look bad, so we need to solve this soon. Thanks in advance for your help! DW