deadlock in solrj?
Hello! I'm using solrj 1.4.0 with java 1.6. On two occasions when indexing ~18000 documents we got the following problem (trace from jconsole):

Name: pool-1-thread-1
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@11e464a
Total blocked: 25  Total waited: 1

Stack trace:
  sun.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
  java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
  org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:196)
  org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)

This is the code block that's used for indexing:

public UpdateResponse indexDocuments(Collection<SolrInputDocument> docs, int commitWithin) {
    UpdateResponse updated = null;
    if (docs.isEmpty()) {
        return null;
    }
    try {
        UpdateRequest req = new UpdateRequest();
        req.setCommitWithin(commitWithin);
        req.add(docs);
        updated = req.process(solr);
    } catch (SolrServerException e) {
        logger.error("Error while indexing documents [" + docs + "]", e);
    } catch (IOException e) {
        logger.error("IOException while indexing documents [" + docs + "]", e);
    }
    return updated;
}

The commitWithin used in the application is 1. If I'm not wrong it's a deadlock. Is this a known issue?

With regards
Michal Stefanczak
Re: Best way to check Solr index for completeness
How soon do you need to know? Couldn't you just regenerate the index using some kind of 'nice' factor so as not to use too much processor/disk/etc.?

Dennis Gearon

Signature Warning: EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded'. Laugh at http://www.yert.com/film.php

--- On Tue, 9/28/10, dshvadskiy dshvads...@gmail.com wrote:

That will certainly work for most recent updates, but I need to compare the entire index.
Dmitriy

Luke Crouch wrote:
Is there a 1:1 ratio of db records to solr documents? If so, couldn't you simply select the most recently updated record from the db and check to make sure the corresponding solr doc has the same timestamp?
-L

On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy dshvads...@gmail.com wrote:
Hello,
What would be the best way to check a Solr index against the original system (database) to make sure the index is up to date? I can use Solr fields like id and timestamp to check against the appropriate fields in the database. Our index currently contains over 2 million documents across several cores. Pulling all documents from the Solr index via search (1000 docs at a time) is very slow. Is there a better way to do it?
Thanks,
Dmitriy

--
View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1598733.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to check Solr index for completeness
How long does it take to get 1000 docs? Why not ensure this while indexing? I think besides your suggestion or the suggestion of Luke there is no other way...

Regards,
Peter.

On Tue, Sep 28, 2010, Dmitriy Shvadskiy wrote:
Hello, What would be the best way to check a Solr index against the original system (database) to make sure the index is up to date? ...

--
http://jetwick.com twitter search prototype
Re: deadlock in solrj?
This sounds like https://issues.apache.org/jira/browse/SOLR-1711. It is a known issue in Solr 1.4.0, which is apparently fixed in Solr 1.4.1. We also encountered it when indexing large numbers of documents with SolrJ, and are therefore in the process of upgrading to 1.4.1.

-- Avi

On Wed, Sep 29, 2010 at 8:14 AM, Michal Stefanczak michal.stefanc...@nhst.no wrote:
Hello! I'm using solrj 1.4.0 with java 1.6. On two occasions when indexing ~18000 documents we got the following problem: the update thread parked in StreamingUpdateSolrServer.request(...) ...
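For anyone curious about the mechanism behind SOLR-1711: StreamingUpdateSolrServer hands updates to a bounded LinkedBlockingQueue, and once the queue is full and the runner threads have died, put() parks the calling thread forever - the WAITING state in the jconsole trace. A minimal stdlib sketch (plain Java, not SolrJ; the names here are illustrative only) of that blocking behaviour:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: a bounded queue with no consumer, standing in for the
// update queue inside StreamingUpdateSolrServer once its runners have died.
public class QueueBlockDemo {
    public static boolean tryEnqueue() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>(1);
        try {
            queue.put("update-1"); // fills the queue to capacity
            // A plain put() here would park this thread indefinitely, exactly
            // like the WAITING pool-1-thread-1 in the trace. A timed offer()
            // shows the queue is stuck without hanging us:
            return queue.offer("update-2", 100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("second enqueue succeeded: " + tryEnqueue()); // prints false
    }
}
```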
Missing facet values for zero counts
Hello list,

I am implementing a directory using Solr. The user is able to search with a free-text query or with 2 filters (provided as pick-lists); one of these is country. A directory entry only has one country. I am using Solr facets for country, and I use the facet counts generated initially by a *:* search to generate my pick-list.

This is working fairly well, but there are a couple of issues I am facing. Specifically, the countries pick-list does not contain ALL possible countries; it only contains those that have been indexed against a document. I have looked at facet.missing but I cannot see how this will work: if no documents have a country of Sweden, then how would Solr know to generate a missing total of zero for Sweden? It's never heard of it.

I feel I am missing something. Is there a way to tell Solr all possible countries, rather than relying on counts generated from the index? The countries in question reside in a database table belonging to our application.

Thanks, Allistair
Re: Missing facet values for zero counts
Hi Allistair,

On Wed, 2010-09-29 at 15:37 +0200, Allistair Crossley wrote:
Is there a way by which you tell Solr all possible countries rather than relying on counts generated from the index? ...

I don't think you are missing anything. Instead, you've described it very well: how should Solr know of something that never made it into the index?

Why not just state in the interface that for all missing countries (and deduce that from the facets and the list retrieved from the database) there are no hits? You can list those countries separately (or even add them to the facets after processing Solr's result).

If you do want to have them in the index, you'd have to add them by adding empty documents. But you might get into trouble with required fields etc., and you will change the statistics of the fields.

Chantal
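The merging approach can be sketched with plain collections. This is a hypothetical helper (the names and the shape of the inputs are assumptions): allCountries comes from the database table, facetCounts from parsing Solr's facet response.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: zero-fill the pick-list so every country from the
// database appears, even when Solr returned no facet bucket for it.
public class FacetMerge {
    public static Map<String, Integer> withZeroCounts(List<String> allCountries,
                                                      Map<String, Integer> facetCounts) {
        Map<String, Integer> merged = new LinkedHashMap<String, Integer>();
        for (String country : allCountries) {
            Integer count = facetCounts.get(country);
            merged.put(country, count == null ? 0 : count);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> facets = new LinkedHashMap<String, Integer>();
        facets.put("Norway", 42);
        System.out.println(withZeroCounts(java.util.Arrays.asList("Norway", "Sweden"), facets));
        // prints {Norway=42, Sweden=0}
    }
}
```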
Re: Best way to check Solr index for completeness
Using TermsComponent is an interesting suggestion. However, my understanding is that it will work only for unique terms, for example comparing a database primary key with the Solr id field. A variation of that is to calculate some kind of unique record hash and store it in the index. Then retrieve the id and hash via TermsComponent and compare them with the hash calculated on the database record.

--
View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1602597.html
Sent from the Solr - User mailing list archive at Nabble.com.
How to set up multiple indexes?
I installed Solr according to the tutorial. My schema.xml and solrconfig.xml are in ~/apache-solr-1.4.1/example/solr/conf. Everything so far is just like in the tutorial.

But I want to set up a 2nd index (separate from the main index) just for the purpose of auto-complete. I understand that I need to set up multicore for this, but I'm not sure how to do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am still pretty confused.

- where do I put the 2nd index?
- do I need separate schema.xml and solrconfig.xml for the 2nd index? Where do I put them?
- how do I tell solr which index I want a document to go to?
- how do I tell solr which index I want to query against?
- any step-by-step instructions on setting up multicore?

Thanks.
Andy
Re: Best way to check Solr index for completeness
Regenerating the index is a slow operation due to limitations of the source systems. We run several complex SQL statements to generate 1 Solr document. A full reindex takes about 24 hours.

--
View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1602610.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to set up multiple indexes?
Hi Andy! I configured this a few days ago, and found a good resource: http://wiki.apache.org/solr/MultipleIndexes

That page has links that will give you the instructions for setting up Tomcat, Jetty and Resin. I used the Tomcat ones the other day, and it gave me everything I needed to get it up and running. You basically just need to create a new directory to contain the second instance, then create a context file for it in the TOMCAT_HOME/conf/Catalina/localhost directory.

Good luck!
-- Chris

On Wed, Sep 29, 2010 at 10:41 AM, Andy angelf...@yahoo.com wrote:
I installed Solr according to the tutorial. My schema.xml and solrconfig.xml are in ~/apache-solr-1.4.1/example/solr/conf ...
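For reference, a context file of the kind described above usually just points Tomcat at the war and sets the solr/home JNDI variable. A sketch, with placeholder paths (adjust to your install):

```xml
<!-- e.g. TOMCAT_HOME/conf/Catalina/localhost/solr2.xml; paths are placeholders -->
<Context docBase="/opt/solr/apache-solr-1.4.1.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/instance2" override="true"/>
</Context>
```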
Re: How to set up multiple indexes?
Check http://doc.ez.no/Extensions/eZ-Find/2.2/Advanced-Configuration/Using-multi-core-features

It's for eZ-Find, but it's the basic setup for multiple cores in any environment. We have cores designed like so:

solr/sfx/
solr/forum/
solr/mail/
solr/news/
solr/tracker/

Each of those core directories has its own conf/ with schema.xml and solrconfig.xml. Then solr/solr.xml looks like:

<cores adminPath="/admin/cores">
  <core name="sfx" instanceDir="sfx" />
  <core name="tracker" instanceDir="tracker" />
  ...
</cores>

After that you add the core name into the url for all requests to the core:

http:///solr/sfx/select?...
http:///solr/sfx/update...
http:///solr/tracker/select?...
http:///solr/tracker/update...

On Wed, Sep 29, 2010 at 9:41 AM, Andy angelf...@yahoo.com wrote:
I installed Solr according to the tutorial. My schema.xml and solrconfig.xml are in ~/apache-solr-1.4.1/example/solr/conf ...
Re: Queries, Functions, and Params
On Tue, Sep 28, 2010 at 6:08 PM, Robert Thayer robert.tha...@bankserv.com wrote:
On the http://wiki.apache.org/solr/FunctionQuery page, the following query function is listed:

q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0

When run against the default solr instance, the server returns error (400): undefined field $v1. Any way to remedy this? Using version: 3.1-2010-09-28_05-53-44

The wiki page indicates this is a 4.0 feature - so you need a recent 4.0-dev build to try it out.

-Yonik
http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
Swap on large memory multi-core multi-cpu NUMA
In a recent blog entry (The MySQL "swap insanity" problem and the effects of the NUMA architecture, http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/), Jeremy Cole describes a particular but common problem with large-memory installations of MySQL on multi-core, multi-CPU, 64-bit NUMA machines, where debilitating swapping of large amounts of memory occurs even when there is no (direct) indication of a need to swap. Without getting into the details (it involves how Linux assigns memory to the different nodes; each multi-core CPU is viewed as a 'node' in the Linux NUMA view), the offered partial solution is to start MySQL using the numactl[1] program, like:

numactl --interleave=all mysqld

I was wondering if any of the Solr people have used this when starting up Apache (or whatever servlet engine you use for your Solr) to reduce unnecessary swap. You probably want to be monitoring the NUMA memory hit statistics found here, with and without numactl, while testing this:

/sys/devices/system/node/node*/numastat

Note that numactl has a number of other interesting and useful features. One that I have used is --cpubind, which restricts the set of CPUs that an application can run on. There are times when this can improve performance, such as when you have 2 demanding applications running: by assigning one to half of the CPUs and the other to the other half, you _can_ get improved performance due to better locality, cache hits, etc. It takes some tuning and experimentation. YMMV

-Glen
http://zzzoot.blogspot.com/
[1] http://linuxmanpages.com/man8/numactl.8.php
Re: Best way to check Solr index for completeness
Actually, retrieving 1000 docs via search isn't that bad. It turned out it takes under 1 sec. I still like the idea of using TermsComponent and will use it in the future if the number of docs in the index grows. Thanks for all the suggestions.

Dmitriy

--
View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1603108.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to check Solr index for completeness
Think about what fields you need to return. For this, you probably only need the id. That could be a lot faster than the default set of fields.

wunder

On Sep 29, 2010, at 9:04 AM, dshvadskiy wrote:
Actually, retrieving 1000 docs via search isn't that bad. It turned out it takes under 1 sec. ...
RE: Is Solr right for my business situation ?
Some questions.

1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but what's the direction for handling multiple table structures? Would it be like one big huge XML, wherein those three tables (assuming it's three) show up as three different tag-trees, nullable? My source provides me a single flat file per table (tab delimited). Do you think having multiple indexes could be a solution for this case, or do I really need to spend effort denormalizing the data?
2. Further, loading into Solr could use some perf tuning. Any tips? Best practices?
3. Also, is there a way to specify an XSLT at the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to the response automatically?
4. And the last question for the day :) - there was one post saying that the spatial support in Solr is really basic and is going to be improved in the next versions. Can you help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.

--raghav..

-----Original Message-----
From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
Sent: Tuesday, September 28, 2010 11:45 AM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

Thanks for the responses people.

@Grant
1. Can you show me some direction on that - loading data from an incoming stream? Do I need some third-party tools, or do I need to build something myself?
4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more like a static one (the data is already there). The SQL statements I mentioned are daily updates coming in. The good thing is that the history is not there, so the overall volume is not growing, but I need to apply the update statements. One workaround I had in mind (though not so great performance-wise) is to apply the updates to a copy of the rdbms, and then feed the rdbms extract to Solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions would yield something.

Looks like I'm close to my solution.. :)

--raghav

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, September 28, 2010 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Is Solr right for my business situation ?

Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

When do you need to deploy? As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next.

It will be in 3.x, the next release.

The existing spatial search has some serious problems and is deprecated. Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change. There is not any support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields.

There are now group-by capabilities in trunk as well, which may or may not help.

wunder

On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:

I am sure these kinds of questions keep coming to you guys, but I want to raise the same question in a different context: my own business situation. I am very very new to solr, and though I have tried to read through the documentation, I am nowhere near completing the whole read. The need is like this - we have a huge rdbms database/table. A single table perhaps houses 100+ million rows. Though oracle is doing a fine job of handling the insertion and updation of data, the querying is where our main concerns lie. Since we
Re: Best way to check Solr index for completeness
Yep, I was thinking of this on a uniqueKey field. I was assuming that there was a PK in the database that you were mapping to the uniqueKey field; if that's not so, then it's more of a problem. But you'd have problems anyway if you *don't* have a uniqueKey when it comes time to update any records, so it might be worth going back around and putting one in...

Erick

On Wed, Sep 29, 2010 at 10:40 AM, dshvadskiy dshvads...@gmail.com wrote:
Using TermsComponent is an interesting suggestion. However, my understanding is that it will work only for unique terms ...
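Once both id lists are in memory, the comparison itself is just set arithmetic - assuming a uniqueKey as described above, with dbIds pulled from the primary-key column and solrIds pulled via TermsComponent or a search that returns only the id field. A sketch with hypothetical names:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical completeness check: compare the full id sets from the
// database and from the Solr index and report the difference each way.
public class IndexDiff {
    // ids present in the database but absent from the index
    public static Set<String> missingFromIndex(Set<String> dbIds, Set<String> solrIds) {
        Set<String> missing = new HashSet<String>(dbIds);
        missing.removeAll(solrIds);
        return missing;
    }

    // ids still in the index although deleted from the database
    public static Set<String> staleInIndex(Set<String> dbIds, Set<String> solrIds) {
        Set<String> stale = new HashSet<String>(solrIds);
        stale.removeAll(dbIds);
        return stale;
    }
}
```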
Re: Is Solr right for my business situation ?
If at all possible, denormalize the data. Anytime you find yourself trying to make Solr behave like a database, the probability is high that you're mis-using Solr or the DB.

Best
Erick

On Wed, Sep 29, 2010 at 12:40 PM, Sharma, Raghvendra sraghven...@corelogic.com wrote:
Some questions. 1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but what's the direction for handling multiple table structures? ...
Re: Missing facet values for zero counts
I don't understand why you would want to show Sweden if it isn't in the index - what will your UI do if the user selects Sweden?

However, one way to handle this would be to make a second document type. Have a field called type or some such, and make the new document type be 'dummy' or 'system' or something like that. You can put documents in there with fields for any pick-lists you want to facet on, and include all possible values from your database. Do your facets on either just these docs or all docs; either way should work. However, on your search queries always include fq=-type:system, i.e. exclude all documents of type system from all your searches.

Messy, but should do what you want.

--
View this message in context: http://lucene.472066.n3.nabble.com/Missing-facet-values-for-zero-counts-tp1602276p1603893.html
Sent from the Solr - User mailing list archive at Nabble.com.
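The placeholder idea above would look something like this in Solr's XML update format - one document per pick-list value, with hypothetical field names (id, type and country must of course exist in your schema):

```xml
<add>
  <doc>
    <field name="id">system-country-sweden</field>
    <field name="type">system</field>
    <field name="country">Sweden</field>
  </doc>
</add>
```

Real searches would then always carry fq=-type:system so these documents never show up as hits.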
RE: Queries, Functions, and Params
Yes, just after sending the email I reread the wiki and noticed the 4.0 requirement. I will try that, thanks.

From: ysee...@gmail.com on behalf of Yonik Seeley
Sent: Wed 9/29/2010 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Queries, Functions, and Params

On Tue, Sep 28, 2010 at 6:08 PM, Robert Thayer robert.tha...@bankserv.com wrote:
On the http://wiki.apache.org/solr/FunctionQuery page, the following query function is listed: q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0 ...

The wiki page indicates this is a 4.0 feature - so you need a recent 4.0-dev build to try it out.
-Yonik
http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
Re: Missing facet values for zero counts
Hi,

For us this is a usability concern. Either you don't show Sweden in a pick-list called Country, and some users go away thinking you don't *ever* support Sweden (not true); or you allow a user to execute an empty-result search, but at least they know you do support Sweden.

It is, we believe, undesirable for a pick-list to change from day to day as the index changes - we have a category pick-list that acts the same way. One day a user could see Productions, the next day nothing. Regular users would see this as odd. We believe usability dictates that we show all possible values and add a zero after them; that prevents the user executing searches, but at least they see the possibilities. The best of both worlds, we hope.

I have solved this using the earlier suggestion of merging a database list query with the Solr facet counts. I like your idea though - good thinking - but the way I've done it is working great also :)

Thanks and best wishes,
Allistair

On 29 Sep 2010, at 14:08, kenf_nc wrote:
I don't understand why you would want to show Sweden if it isn't in the index - what will your UI do if the user selects Sweden? ...
Issues with SolrJ and IndexReader reopening (again)
I saw there had been a previous discussion on commit failing for EmbeddedSolrServer here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg28236.html - but it was never resolved.

I have an embedded solr server, and it does not seem to pick up changes in the index after a commit through SolrJ. Looking at the logs, I can see a new searcher was opened:

20100929.141930/162 INFO [pool-1-thread-1] [] core.SolrCore - [] Registered new searcher searc...@8611b5c main
20100929.141930/162 INFO [pool-1-thread-1] [] search.SolrIndexSearcher - Closing searc...@5ff78541 main

I'm using a single searcher and autowarming sizes of 0 to make sure no invalid entries get transferred over to the new searcher; I even set the httpCaching max-age=0 (I know, pointless, but I believe it is technically off then). Am I missing a form of caching or a configuration that will make sure this new searcher is pure, or at least after some time will be purified once invalid results expire?

Thanks,
Tony
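For reference, the httpCaching bit I mentioned is configured like this (Solr 1.4 syntax) - though as I understand it, it only affects the HTTP layer, which the embedded server bypasses entirely, so it is probably not the cause here:

```xml
<requestDispatcher handleSelect="true">
  <!-- never304: Solr will not emit cache headers or answer 304 Not Modified -->
  <httpCaching never304="true" />
</requestDispatcher>
```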
Solr rate limiting / DoS attacks
Hi, I'm curious as to what approaches one would take to defend against users attacking a Solr service, especially if exposed to the internet as opposed to an intranet. I'm fairly new to Solr, is there anything built in? Is there anything in place to prevent the search engine from getting overwhelmed by a particular user or group of users, submitting loads of time-consuming queries as some form of a DoS attack? Additionally, is there a way of rate-limiting it so that only a certain number of queries per user/per hour can be submitted, etc? (for example, to prevent programmatic access to the search engine as opposed to a human user) Thanks, Ian
Re: Solr rate limiting / DoS attacks
This kind of thing is not limited to Solr and you normally wouldn't solve it in software - it's more a network concern. I'd be looking at a web server solution such as Apache mod_evasive combined with a good firewall for more conventional DOS attacks. Just hide your Solr install behind the firewall and communicate with it locally from your web application or whatever. Rate limiting sounds like something Solr should or could provide but I don't know the answer to that. Cheers On Sep 29, 2010, at 2:52 PM, Ian Upright wrote: Hi, I'm curious as to what approaches one would take to defend against users attacking a Solr service, especially if exposed to the internet as opposed to an intranet. I'm fairly new to Solr, is there anything built in? Is there anything in place to prevent the search engine from getting overwhelmed by a particular user or group of users, submitting loads of time-consuming queries as some form of a DoS attack? Additionally, is there a way of rate-limiting it so that only a certain number of queries per user/per hour can be submitted, etc? (for example, to prevent programmatic access to the search engine as opposed to a human user) Thanks, Ian
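[Editorial note: as a hedged illustration of the mod_evasive suggestion, a typical Apache configuration fragment looks like the following. The threshold numbers are arbitrary examples, not recommendations.]

```apache
<IfModule mod_evasive20.c>
    DOSHashTableSize    3097
    DOSPageCount        10     # max requests for the same page per page interval
    DOSSiteCount        100    # max total requests per client per site interval
    DOSPageInterval     1      # page interval, in seconds
    DOSSiteInterval     1      # site interval, in seconds
    DOSBlockingPeriod   60     # seconds a blocked client stays blocked
</IfModule>
```

Clients exceeding the thresholds receive 403 responses for the blocking period; combine this with keeping Solr itself unreachable from the internet, as described above.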
How to Index Pure Text into Separate Fields?
Hi, I am using xpath to index different parts of the html pages into different fields. Now, I have some pure text documents that have no HTML, so I can't use xpath. How do I index this pure text into different fields of the index? How do I make nutch/solr understand that these different parts belong to different fields? Maybe I can use existing content in the fields in my index? Thanks.
Re: Data Import Handler Rich Format Documents
: What's a GA release? http://en.wikipedia.org/wiki/Software_release_life_cycle#General_availability -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: Dismax Request handler and Solrconfig.xml
: In Solrconfig.xml, default request handler is set to standard. I am : planning to change that to use dismax as the request handler but when I : set default=true for dismax - Solr does not return any results - I get : results only when I comment out <str name="defType">dismax</str>. you need to elaborate on what you mean by does not return any results ... doesn't return results for what exactly? what do your requests look like? (ie full URLs with all params) what do you expect to get back? what URLs are you using when you don't use defType=dismax? what do you get back then? not setting defType means you are getting the standard LuceneQParser instead of the DismaxQParser, which means the qf param is being ignored and the defaultSearchField is being used instead. are the terms you are searching for in your default search field but not in your title or pagedescription field? Please note these guidelines http://wiki.apache.org/solr/UsingMailingLists#Information_useful_for_searching_problems -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
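[Editorial note: for reference, a minimal Solr 1.4-style dismax handler configuration looks roughly like this. The qf field names are taken from the message above; treat the whole fragment as a sketch to adapt, not a drop-in config.]

```xml
<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- dismax searches these fields (with optional boosts)
         instead of the schema's defaultSearchField -->
    <str name="qf">title^2.0 pagedescription</str>
  </lst>
</requestHandler>
```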
terms / stemming?
Hi I issue a request like the following, in order to get a list of search-terms in a particular field: http://localhost:8983/solr/terms?terms.limit=-1terms.fl=bodytext But some of the terms which are returned are not quite the same as those which were indexed (or which are returned in a search). For example, my request above might return a term like famili when the indexed term was familie. Could this have something to do with stemming? If so, how do I ensure that I get the same search-terms from my terms request, as those which were indexed? Thanks, Peter
Re: terms / stemming?
Make sure your index and query analyzers are identical, and pay special attention if you're using any of the http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming analyzers - many of them have a number of configurable attributes that could cause differences. -L On Wed, Sep 29, 2010 at 4:42 PM, Peter A. Kirk p...@alpha-solutions.dk wrote: Hi I issue a request like the following, in order to get a list of search-terms in a particular field: http://localhost:8983/solr/terms?terms.limit=-1terms.fl=bodytext But some of the terms which are returned are not quite the same as those which were indexed (or which are returned in a search). For example, my request above might return a term like famili when the indexed term was familie. Could this have something to do with stemming? If so, how do I ensure that I get the same search-terms from my terms request, as those which were indexed? Thanks, Peter
Re: How to Index Pure Text into Separate Fields?
Can you provide a few more details? You mention xpath, which leads me to believe that you are using DIH, is that true? How are you getting your documents to index? Parts of a filesystem? Because it's possible to do many things. If you're using DIH against a filesystem, you could use two fileDataSources, one that works only on files with a particular extension (xml, say) and another that processes .txt files. But that said, if you're trying to index just the text of a Word document, you have to parse it quite differently than a plain text file, take a look at Tika. All of which may not help you at all, because I'm guessing... So I think a more complete problem statement would help us help you. Best Erick On Wed, Sep 29, 2010 at 3:56 PM, Savannah Beckett savannah_becket...@yahoo.com wrote: Hi, I am using xpath to index different parts of the html pages into different fields. Now, I have some pure text documents that have no HTML, so I can't use xpath. How do I index this pure text into different fields of the index? How do I make nutch/solr understand that these different parts belong to different fields? Maybe I can use existing content in the fields in my index? Thanks.
Re: terms / stemming?
Yes, this is almost certainly stemming. Take a look at solr/admin, [schema browser], then click on Home > fields > (your field here). Then the index and query details link shows you exactly what's happening. You can also get some joy from the admin [analysis] page. That takes input and shows you exactly what transformations occur given your schema. Both of these are well worth taking an hour to understand; it'll save you hours and hours of head-scratching. You could use copyField to copy your bodytext to a field that doesn't stem, then query the copy for the terms. HTH Erick On Wed, Sep 29, 2010 at 5:42 PM, Peter A. Kirk p...@alpha-solutions.dk wrote: Hi I issue a request like the following, in order to get a list of search-terms in a particular field: http://localhost:8983/solr/terms?terms.limit=-1terms.fl=bodytext But some of the terms which are returned are not quite the same as those which were indexed (or which are returned in a search). For example, my request above might return a term like famili when the indexed term was familie. Could this have something to do with stemming? If so, how do I ensure that I get the same search-terms from my terms request, as those which were indexed? Thanks, Peter
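[Editorial note: a sketch of the copyField suggestion. The field name bodytext_exact is made up; text_ws is the whitespace-tokenized, unstemmed type shipped in the Solr 1.4 example schema, which keeps individual terms without running a stemmer.]

```xml
<!-- schema.xml: keep an unstemmed copy of bodytext for the terms component -->
<field name="bodytext_exact" type="text_ws" indexed="true" stored="false"/>
<copyField source="bodytext" dest="bodytext_exact"/>
```

Then point the terms request at the copy, e.g. terms.fl=bodytext_exact, to get the terms as they appeared in the source text.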
Re: How to Index Pure Text into Separate Fields?
No, I am using xpath for html, this is not the question. I am indexing pure text in addition to the html that I was indexing. Pure text like a TXT file or Microsoft Word doc. So, with no xpath for TXT, how do I index a TXT file into different fields in my index, the way I use xpath to index html into different fields in my index? My question is referring to pure text like .txt files and Microsoft Word, not html. I am completely fine with html. Thanks. From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, September 29, 2010 2:59:26 PM Subject: Re: How to Index Pure Text into Separate Fields? Can you provide a few more details? You mention xpath, which leads me to believe that you are using DIH, is that true? How are you getting your documents to index? Parts of a filesystem? Because it's possible to do many things. If you're using DIH against a filesystem, you could use two fileDataSources, one that works only on files with a particular extension (xml, say) and another that processes .txt files. But that said, if you're trying to index just the text of a Word document, you have to parse it quite differently than a plain text file, take a look at Tika. All of which may not help you at all, because I'm guessing... So I think a more complete problem statement would help us help you. Best Erick On Wed, Sep 29, 2010 at 3:56 PM, Savannah Beckett savannah_becket...@yahoo.com wrote: Hi, I am using xpath to index different parts of the html pages into different fields. Now, I have some pure text documents that has no html. So I can't use xpath. How do I index these pure text into different fields of the index? How do I make nutch/solr understand these different parts belong to different fields? Maybe I can use existing content in the fields in my index? Thanks.
Memory usage
My server has 128GB of ram, the index is 22GB large. It seems the memory consumption goes up on every query and the garbage collector will never free up as much memory as I expect it to. The memory consumption looks like a curve, it eventually levels off but the old gen is always 60 or 70GB. I have tried adjusting the cache settings but it doesn't seem to make any difference. Is there something I'm doing wrong or is this expected behavior? Here is a screenshot of what I see in jconsole after running for a few minutes: http://i51.tinypic.com/2qntca1.png Here is a 24 hour period of the same data taken from a custom jmx monitor: http://i51.tinypic.com/2vcu9u8.png The server performs pretty much as good at the beginning of this cycle as it does at the end so all of this memory accumulation seems to not be doing anything useful. I am running the 1.4 war but I was having this problem with 1.3 also. Tomcat 6.0.18, Java 1.6.0. I haven't gone as far as doing any memory profiling or java debugging because I'm inexperienced, but that will be the next thing I try. Any help would be appreciated. Thanks, -Jeff
DataImportHandler dynamic fields clarification
Looking for some clarification on DIH to make sure I am interpreting this correctly. I have a wide DB table, 100 columns. I'd rather not have to add 100 values in schema.xml and data-config.xml. I was under the impression that if the column name matched a dynamicField name, it would be added. I am not finding this to be the case; it only works when the column name is explicitly listed as a static field. Example: 100-column table, columns named 'COLUMN_1, COLUMN_2 ... COLUMN_100'. If I add something like: <field name="column_60" type="string" indexed="true" stored="true"/> to schema.xml, and don't reference the column in a data-config entity/field tag, it gets imported, as expected. However, if I use: <dynamicField name="column_*" type="string" indexed="true" stored="true"/> it does not get imported into Solr; I would expect it would. Is this the expected behavior? -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1606159.html Sent from the Solr - User mailing list archive at Nabble.com.
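[Editorial note: if the dynamicField route doesn't pick the columns up, the explicit fallback is to map each column in data-config.xml. A sketch; the entity name, query, and column names are illustrative only, and note the case difference between the DB column and the Solr field, which can itself be the culprit.]

```xml
<entity name="wide_table" query="SELECT * FROM wide_table">
  <!-- explicit per-column mapping: DB column on the left,
       Solr field on the right; repeat for each column needed -->
  <field column="COLUMN_60" name="column_60"/>
</entity>
```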
Re: Solr with example Jetty and score problem
Can anybody help with this? Many thanks 2010/9/29 Floyd Wu floyd...@gmail.com Hi there, I have a problem. The situation is, when I issue a query to a single instance, Solr responds with XML like the following; as you can see, the score is normal (<float name="score">...</float>)
===
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">23</int>
    <lst name="params">
      <str name="fl">_l_title,score</str>
      <str name="start">0</str>
      <str name="q">_l_unique_key:12</str>
      <str name="hl.fl">*</str>
      <str name="hl">true</str>
      <str name="rows">999</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="1.9808292">
    <doc>
      <float name="score">1.9808292</float>
      <str name="_l_title">GTest</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="12">
      <arr name="_l_unique_key">
        <str><em>12</em></str>
      </arr>
    </lst>
  </lst>
</response>
===
But when I issue the query with shards (two instances), the response XML is like the following; as you can see, the score has been moved into an arr element of the doc
===
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">64</int>
    <lst name="params">
      <str name="shards">localhost:8983/solr/core0,172.16.6.35:8983/solr</str>
      <str name="fl">_l_title,score</str>
      <str name="start">0</str>
      <str name="q">_l_unique_key:12</str>
      <str name="hl.fl">*</str>
      <str name="hl">true</str>
      <str name="rows">999</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="1.9808292">
    <doc>
      <str name="_l_title">Gtest</str>
      <arr name="score">
        <float name="score">1.9808292</float>
      </arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="12">
      <arr name="_l_unique_key">
        <str><em>12</em></str>
      </arr>
    </lst>
  </lst>
</response>
===
My schema.xml looks like the following
<field name="_l_unique_key" type="string" indexed="true" stored="true" required="true" omitNorms="true"/>
<field name="_l_read_permission" type="string" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
<field name="_l_title" type="text" indexed="true" stored="true" omitNorms="false" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="_l_summary" type="text" indexed="true" stored="true" omitNorms="false" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="_l_body" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true" omitNorms="false"/>
<dynamicField name="*" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true" omitNorms="false"/>
</fields>
<uniqueKey>_l_unique_key</uniqueKey>
<defaultSearchField>_l_body</defaultSearchField>
I don't really know what happened. Is it my schema, or is this the behavior of Solr? Please help with this.
Re: How to Index Pure Text into Separate Fields?
Simple text .txt files and MS Office .doc files are very, very different beasts. You can do simple .txt files with some more lines in your DataImportHandler script. With DOC files it is easiest to use the extracting request handler, /update/extract. This is on the wiki. If you want to do this inside the DataImportHandler, you need to use 3.x or the trunk. And it has bugs. On Wed, Sep 29, 2010 at 3:55 PM, Savannah Beckett savannah_becket...@yahoo.com wrote: No, I am using xpath for html, this is not the question. I am indexing pure text in addition to html that I was indexing. Pure text like TXT file or Microsoft Word doc. So, no xpath for TXT, how do I index TXT file into different fields in my index like the way I use xpath to index html into different fields in my index? My question is referring to pure TXT like .txt file and microsoft word, not html. I am completely fine with html. Thanks. From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, September 29, 2010 2:59:26 PM Subject: Re: How to Index Pure Text into Separate Fields? Can you provide a few more details? You mention xpath, which leads me to believe that you are using DIH, is that true? How are you getting your documents to index? Parts of a filesystem? Because it's possible to do many things. If you're using DIH against a filesystem, you could use two fileDataSources, one that works only on files with a particular extension (xml, say) and another that processes .txt files. But that said, if you're trying to index just the text of a Word document, you have to parse it quite differently than a plain text file, take a look at Tika. All of which may not help you at all, because I'm guessing... So I think a more complete problem statement would help us help you. Best Erick On Wed, Sep 29, 2010 at 3:56 PM, Savannah Beckett savannah_becket...@yahoo.com wrote: Hi, I am using xpath to index different parts of the html pages into different fields.
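[Editorial note: for the .doc case, a typical request against the ExtractingRequestHandler looks like the following. The document id and file name are made up; literal.* params set plain field values alongside the extracted content, and fmap.* can remap Tika's output fields. Treat it as a sketch against a local example setup.]

```text
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
     -F "myfile=@report.doc"
```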
Now, I have some pure text documents that have no html. So I can't use xpath. How do I index this pure text into different fields of the index? How do I make nutch/solr understand these different parts belong to different fields? Maybe I can use existing content in the fields in my index? Thanks. -- Lance Norskog goks...@gmail.com
Re: Swap on large memory multi-core multi-cpu NUMA
This would be a Java VM option, not something Solr or other apps can know about. Using this or procset seems like a great way to handle it. On Wed, Sep 29, 2010 at 8:46 AM, Glen Newton glen.new...@gmail.com wrote: In a recent blog entry (The MySQL “swap insanity” problem and the effects of the NUMA architecture http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/), Jeremy Cole describes a particular but common problem with large memory installations of MySql on multi-core multi-cpu 64bit NUMA machines, where debilitating swapping of large amounts of memory occurs even when there is no (direct) indication of a need to swap. Without getting into the details (it involves how Linux assigns memory to the different nodes (each multi-core CPU is viewed as a 'node' in the Linux NUMA view)), the offered partial solution is to start MySql using the numactl[1] program, like: numactl --interleave all mysql I was wondering if any of the SOLR people have used this when starting up Apache (or whatever servlet engine you use for your SOLR) to reduce unnecessary swap. You probably want to be monitoring the NUMA memory hit statistics found here, with and without the numactl, while testing this: /sys/devices/system/node/node*/numastat -- Note that numactl has a number of other interesting and useful features. One that I have used is the --cpubind which restricts the number of CPUs that an application can run on. There are times when this can improve performance, such as when you have 2 demanding applications running: by assigning one to half of the CPUs and the other to the other half of the CPUs, you _can_ have improved performance due to better locality, cache hits, etc. It takes some tuning and experimentation. YMWV -Glen http://zzzoot.blogspot.com/ [1]http://linuxmanpages.com/man8/numactl.8.php -- - -- Lance Norskog goks...@gmail.com
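[Editorial note: applied to the Jetty example distribution, the numactl invocation described above would look something like this. The heap size and start.jar path are illustrative assumptions; adjust for your servlet container.]

```text
# interleave the JVM's memory allocations across all NUMA nodes
numactl --interleave=all java -Xmx4g -jar start.jar
```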
Re: Memory usage
How many documents are there? How many unique words are in a text field? Both of these numbers can have a non-linear effect on the amount of space used. But, usually a 22Gb index (on disk) might need 6-12G of ram total. There is something odd going on here. Lance On Wed, Sep 29, 2010 at 4:34 PM, Jeff Moss jm...@heavyobjects.com wrote: My server has 128GB of ram, the index is 22GB large. It seems the memory consumption goes up on every query and the garbage collector will never free up as much memory as I expect it to. The memory consumption looks like a curve, it eventually levels off but the old gen is always 60 or 70GB. I have tried adjusting the cache settings but it doesn't seem to make any difference. Is there something I'm doing wrong or is this expected behavior? Here is a screenshot of what I see in jconsole after running for a few minutes: http://i51.tinypic.com/2qntca1.png Here is a 24 hour period of the same data taken from a custom jmx monitor: http://i51.tinypic.com/2vcu9u8.png The server performs pretty much as good at the beginning of this cycle as it does at the end so all of this memory accumulation seems to not be doing anything useful. I am running the 1.4 war but I was having this problem with 1.3 also. Tomcat 6.0.18, Java 1.6.0. I haven't gone as far as doing any memory profiling or java debugging because I'm inexperienced, but that will be the next thing I try. Any help would be appreciated. Thanks, -Jeff -- Lance Norskog goks...@gmail.com
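[Editorial note: following Lance's sizing estimate, one concrete thing to try before profiling is capping the heap well below 128GB, so the old gen cannot balloon and the OS can use the remaining RAM as disk cache for the index. Illustrative numbers only, e.g. in Tomcat's JAVA_OPTS:]

```text
-Xms4g -Xmx8g -XX:+UseConcMarkSweepGC
```

With a 128GB -Xmx-less or oversized heap, the JVM has little incentive to collect aggressively, which matches the ever-growing old gen in the jconsole graphs.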
Re: Is Solr right for my business situation ?
Some of these are big questions- try them in different emails. On Wed, Sep 29, 2010 at 9:40 AM, Sharma, Raghvendra sraghven...@corelogic.com wrote: Some questions. 1. I have about 3-5 tables. Now designing schema.xml for a single table looks ok, but whats the direction for handling multiple table structures is something I am not sure about. Would it be like a big huge xml, wherein those three tables (assuming its three) would show up as three different tag-trees, nullable. My source provides me a single flat file per table (tab delimited). Do you think having multiple indexes could be a solution for this case ?? or do I really need to spend effort in denormalizing the data ? 2. Further, loading into solr can use some perf tuning.. any tips ? best practices ? 3. Also, is there a way to specify a xslt at the server side, and make it default, i.e. whenever a response is returned, that xslt is applied to the response automatically... 4. And last question for the day - :) there was one post saying that the spatial support is really basic in solr and is going to be improved in next versions... Can you ppl help me get a definitive yes or no on spatial support... in the current form, does it work on not ? I would store lat and long, and would need to make them searchable... --raghav.. -Original Message- From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com] Sent: Tuesday, September 28, 2010 11:45 AM To: solr-user@lucene.apache.org Subject: RE: Is Solr right for my business situation ? Thanks for the responses people. @Grant 1. can you show me some direction on that.. loading data from an incoming stream.. do I need some third party tools, or need to build something myself... 4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more like static one (data is already there). The sql statements I mentioned are daily updates coming. 
The good thing is that the history is not there, so the overall volume is not growing, but I need to apply the update statements. One workaround I had in mind (though not so great performance-wise) is to apply the updates to a copy of the rdbms, and then feed the rdbms extract to solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions would yield something. @All - Some more questions guys. 1. I have about 3-5 tables. Now designing schema.xml for a single table looks ok, but what's the direction for handling multiple table structures is something I am not sure about. Would it be like a big huge xml, wherein those three tables (assuming its three) would show up as three different tag-trees, nullable. My source provides me a single flat file per table (tab delimited). 2. Further, loading into solr can use some perf tuning.. any tips ? best practices ? 3. Also, is there a way to specify a xslt at the server side, and make it default, i.e. whenever a response is returned, that xslt is applied to the response automatically... 4. And last question for the day - :) there was one post saying that the spatial support is really basic in solr and is going to be improved in next versions... Can you ppl help me get a definitive yes or no on spatial support... in the current form, does it work or not? I would store lat and long, and would need to make them searchable... Looks like I'm close to my solution.. :) --raghav -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, September 28, 2010 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Is Solr right for my business situation ? Inline. On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote: When do you need to deploy? As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next. It will be in 3.x, the next release. The existing spatial search has some serious problems and is deprecated.
Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change. There is not any support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields. There are now group-by capabilities in trunk as well, which may or may not help. wunder On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote: I am sure these kinds of questions keep coming to you guys, but I want to raise the same question in a different context...my own business situation. I am very very new to solr and though I have tried to read through the documentation, I am nowhere near completing the whole read. The need is like this - We have a huge rdbms
Re: Why the query performance is so different for queries?
How much ram does the JVM have? Wildcard queries are slow. Starting with '*' are even slower. If you want all values try field:[* TO *]. This is a range query and lets you pick a range of values- this picks everything. The *:* is not a wildcard. It is a magic syntax for all documents and does not cause a search. 2010/9/28 newsam new...@zju.edu.cn: Hi guys, I have posted a thread The search response time is too long. The SOLR searcher instance is deployed with Tomcat 5.5.21. . The index file is 8.2G. The doc num is 6110745. DELL Server has Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ and 6G RAM. In SOLR back-end, query=key:* costs almost 60s while query=*:* only needs 500ms. Another case is query=product_name_title:*, which costs 7s. I am confused about the query performance. Do you have any suggestions? btw, the cache setting is as follows: filterCache: 256, 256, 0 queryResultCache: 1024, 512, 128 documentCache: 16384, 4096, n/a Thanks. -- Lance Norskog goks...@gmail.com
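[Editorial note: summarizing the three query forms discussed above side by side.]

```text
q=key:*          # wildcard query: enumerates every term in 'key'; slow
q=key:[* TO *]   # open-ended range query: matches all docs that have a value in 'key'
q=*:*            # special 'match all documents' syntax: no term enumeration; fast
```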
Re: Why the query performance is so different for queries?
Thanks for your reply. Our box is win server 2003 (32bits) and 6G RAM totally. Large heap (2G) may not be helpful for JVM in 32bits box. Therefore we set JAVA_OPTIONS to -Xms521m -Xmx1400m. Is my understanding right? Thanks. From: Lance Norskog goks...@gmail.com Reply-To: solr-user@lucene.apache.org To: solr-user@lucene.apache.org, newsam new...@zju.edu.cn Subject: Re: Why the query performance is so different for queries? Date: Wed, 29 Sep 2010 20:13:20 -0700 How much ram does the JVM have? Wildcard queries are slow. Starting with '*' are even slower. If you want all values try field:[* TO *]. This is a range query and lets you pick a range of values- this picks everything. The *:* is not a wildcard. It is a magic syntax for all documents and does not cause a search. 2010/9/28 newsam : Hi guys, I have posted a thread The search response time is too long. The SOLR searcher instance is deployed with Tomcat 5.5.21. . The index file is 8.2G. The doc num is 6110745. DELL Server has Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ and 6G RAM. In SOLR back-end, query=key:* costs almost 60s while query=*:* only needs 500ms. Another case is query=product_name_title:*, which costs 7s. I am confused about the query performance. Do you have any suggestions? btw, the cache setting is as follows: filterCache: 256, 256, 0 queryResultCache: 1024, 512, 128 documentCache: 16384, 4096, n/a Thanks. -- Lance Norskog goks...@gmail.com
Re: Why the query performance is so different for queries?
Stop running 32-bit operating systems. You'll never get good performance with a toy like that. --wunder On Sep 29, 2010, at 8:18 PM, newsam wrote: Thanks for your reply. Our box is win server 2003 (32bits) and 6G RAM totally. Large heap (2G) may not be helpful for JVM in 32bits box. Therefore we set JAVA_OPTIONS to -Xms521m -Xmx1400m. Is my understanding right? Thanks. From: Lance Norskog goks...@gmail.com Reply-To: solr-user@lucene.apache.org To: solr-user@lucene.apache.org, newsam new...@zju.edu.cn Subject: Re: Why the query performance is so different for queries? Date: Wed, 29 Sep 2010 20:13:20 -0700 How much ram does the JVM have? Wildcard queries are slow. Starting with '*' are even slower. If you want all values try field:[* TO *]. This is a range query and lets you pick a range of values- this picks everything. The *:* is not a wildcard. It is a magic syntax for all documents and does not cause a search. 2010/9/28 newsam : Hi guys, I have posted a thread The search response time is too long. The SOLR searcher instance is deployed with Tomcat 5.5.21. . The index file is 8.2G. The doc num is 6110745. DELL Server has Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ and 6G RAM. In SOLR back-end, query=key:* costs almost 60s while query=*:* only needs 500ms. Another case is query=product_name_title:*, which costs 7s. I am confused about the query performance. Do you have any suggestions? btw, the cache setting is as follows: filterCache: 256, 256, 0 queryResultCache: 1024, 512, 128 documentCache: 16384, 4096, n/a Thanks. -- Lance Norskog goks...@gmail.com -- Walter Underwood Venture ASM, Troop 14, Palo Alto
Where is the lock file?
Hello, We were testing nutch configurations and apparently we got heavy handed with our approach to stopping things. Now when nutch starts indexing solr, we are seeing these messages:
org.apache.solr.common.SolrException: Lock obtain timed out: SingleInstanceLock: write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1140)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
I've looked through the configuration file. I can see where it defines the lock type and I can see the unlock configuration. But I don't see where it specifies the lock file. Where is it? What is its name? Also, to speed up nutch, we changed the configuration to start several map tasks at once. Is nutch trying to kick off several solr sessions at once, and is that causing messages like the above? Should we just change the lock to simple? Thanks, Steve Cohen
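[Editorial note: the lock file is the write.lock named in the exception. With the single lockType (the SingleInstanceLock in the trace above) the lock is held in JVM memory rather than as a file on disk; with simple or native it is a write.lock file in the index data directory, e.g. solr/data/index/write.lock. The related solrconfig.xml settings in a 1.x-style config look roughly like this sketch:]

```xml
<mainIndex>
  <!-- one of: single, native, simple, none -->
  <lockType>simple</lockType>
  <!-- remove a stale lock left behind by a hard kill, at startup -->
  <unlockOnStartup>true</unlockOnStartup>
</mainIndex>
```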