Re: Retrieve time of last optimize
I don't think there is anything low-level in Lucene that will specifically output something like lastOptimized() to you, since it can be set up a few ways. Your best bet is probably adding a postOptimize hook and dumping it to a log / file / monitor / etc., probably something like:

<listener event="postOptimize" class="solr.RunExecutableListener">
  <str name="exe">lastOptimize.sh</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>

Or writing to a file and reading it back into the admin if you need to display it there. More @ http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section

- Jon

On Apr 22, 2010, at 11:16 AM, Shawn Heisey wrote:

On 4/21/2010 1:24 PM, Shawn Heisey wrote:
Is it possible to issue some kind of query to a Solr core that will return the last time the index was optimized? Every day, one of my shards should get optimized, so I would like my monitoring system to tell me when the newest optimize date is more than 24 hours ago. I could not find a way to get this. The /admin/cores page has a lot of other useful information, but not that particular piece.

I have found some other useful information on the stats.jsp page, like the number of segments, the size of the index on disk, and so on. I still have not been able to locate the last optimize date, which would simply be the timestamp on the earliest disk segment.

Thanks,
Shawn
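If you go the postOptimize-hook route, the monitoring side could be a small check like the following sketch. The stamp-file path is an assumption (whatever your hook script touches after each optimize), not something Solr provides:

```python
import os
import time

# Hypothetical path touched by the postOptimize hook script after every
# optimize; adjust to wherever your lastOptimize.sh actually writes.
STAMP_FILE = "/var/solr/last_optimize.stamp"
MAX_AGE_SECONDS = 24 * 3600  # alert when the newest optimize is over a day old

def last_optimize_age(path):
    """Seconds elapsed since the stamp file was last touched."""
    return time.time() - os.path.getmtime(path)

def check(path=STAMP_FILE, max_age=MAX_AGE_SECONDS):
    """Return (ok, message) in a Nagios-style shape for a monitoring system."""
    if not os.path.exists(path):
        return False, "no optimize recorded yet"
    hours = last_optimize_age(path) / 3600.0
    ok = hours * 3600.0 <= max_age
    return ok, "last optimize %.1f hours ago" % hours
```

A monitoring system would call `check()` once a day per shard and alert on a False result.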
Re: Solr full-import not working as expected
Saratv, is there any unique ID (defined in your schema.xml) that may be duplicated?

- Mitch

saratv wrote:
I am trying to use DIH (where the database has around 93k rows, from different tables), and when I ran a full import a few times, only 91k documents were indexed (not sure why, or which documents were not indexed). Is there a way to find what went wrong, as I am unable to see any errors in the log files? Also, is there a way to fix the problem and get all of those 93k docs? (I also checked the database and saw there are no duplicates.) Please respond if anyone has seen similar behaviour. Appreciate your input.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-full-import-not-working-as-expected-tp744937p745102.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to prevent this search lockup (apparently caused during big segment merges)?
I don't know much about how Solr does its locking, so I'm guessing below:

It looks like one thread is doing a commit, by closing the writer, and is likely holding a lock that prevents other (add/delete) ops from running. Probably this lock is held because the writer is in the process of being closed, and on close, the writer waits for running merges to complete, so it can take a very long time if a large merge is running. And then your while loop keeps using up another of the 200 threads, blocking on the add/delete request.

I think Solr could, instead, call IndexWriter.finishMerges without holding the lock, and then perhaps IndexWriter.close(false), which would be fast (ie, it aborts any running merges, for the race condition where another merge just started after finishMerges and before close). Alternatively, Solr could call IndexWriter.commit, not IndexWriter.close, and not hold the lock that prevents add/deletes (but maybe there are other reasons why the IW must be closed?).

Maybe Solr should also have a way to restrict the max # threads to be used for pending add/delete ops, so that there are always threads free in the app server's pool for searching? Or... maybe you could drastically increase the timeout on your client-side HTTP connections? Or, is there some way to check how many threads are tied up in Solr and block your add/delete requests when this gets too large...?

Mike

On Thu, Apr 22, 2010 at 6:28 PM, Chris Harris rygu...@gmail.com wrote:
I'm running Solr 1.4+ under Tomcat 6, with indexing and searching requests simultaneously hitting the same Solr machine. Sometimes Solr, Tomcat, and my (C#) indexing process conspire to render search inoperable. So far I've only noticed this while big segment merges (i.e. merges that take multiple minutes) are taking place. Let me explain the situation as best as I understand it.
My indexer has a main loop that looks roughly like this:

while true:
    try:
        submit a new add or delete request to Solr via HTTP
    catch timeoutException:
        sleep a few seconds

When things are going wrong (i.e., when a large segment merge is happening), this loop is problematic:

* When the indexer's request hits Solr, the corresponding thread in Tomcat blocks. (It looks to me like the thread is destined to block until the entire merge is complete. I'll paste in what the Java stack traces look like at the end of the message in case they help diagnose things.)
* Because the Solr thread stays blocked for so long, eventually the indexer hits a timeoutException. (That is, it gives up on Solr.)
* Hitting the timeout exception doesn't cause the corresponding Tomcat thread to die or unblock. Therefore, each time through the loop, another Solr-handling thread inside Tomcat enters a blocked state.
* Eventually so many threads (maxThreads, whose Tomcat default is 200) are blocked that Tomcat starts rejecting all new Solr HTTP requests -- including those coming in from the web tier.
* Users are unable to search.

The problem might self-correct once the merge is complete, but that could be quite a while. What are my options for changing Solr settings or changing my indexing process to avoid this lockup scenario? Do you agree that the segment merge is helping cause the lockup? Do adds and deletes really need to block on segment merges?

Partial thread dumps follow, showing example add and delete threads that are blocked. Also the active Lucene merge thread, and the thread that kicked off the merge.
[doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock() to return]
http-1800-200 daemon prio=6 tid=0x0a58cc00 nid=0x1028 waiting on condition [0x0f9ae000..0x0f9afa90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for 0x00016d801ae0 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown Source)
        at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320)
        at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
        at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at
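Two of the client-side mitigations Mike suggests above (bounding the number of in-flight add/delete requests, and raising the client timeout instead of timing out and immediately retrying) could be sketched roughly like this. The class, URL handling, and parameter values are illustrative assumptions, not the poster's actual indexer:

```python
import threading
import urllib.request

# Sketch: cap in-flight add/delete requests so a stalled commit cannot eat
# the whole servlet thread pool, and use a generous timeout rather than the
# timeout-and-retry loop that piled up blocked Tomcat threads.
class ThrottledIndexer:
    def __init__(self, max_in_flight=4, timeout_seconds=600):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self.timeout_seconds = timeout_seconds

    def post(self, url, body):
        # Blocks locally once max_in_flight requests are pending, instead of
        # opening yet another connection that will just park inside Tomcat.
        with self._slots:
            req = urllib.request.Request(
                url, data=body, headers={"Content-Type": "text/xml"})
            return urllib.request.urlopen(req, timeout=self.timeout_seconds)
```

With this shape, a long merge makes the indexer fall behind rather than exhausting maxThreads and locking out searchers.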
Questions on autocommit and optimize operations
Hi Solr gurus, We are thinking about optimizing our production master/slave Solr setup, and just wanted to poll the group on the following questions:

1. Currently we are using the autocommit feature with a setting of 50 docs and 5 mins. Now the requirement is to reduce this time, so we are analyzing a setup using the time-based autocommit feature, with a time to autocommit of *1 min*. Can anyone think of any disadvantages this change can have on the index? Is it possible that the autocommit process itself takes more than 1 min?

2. We want to trace the average time it takes to perform a commit operation. Right now in production we have Lucid Solr 1.4 on master/slaves but we are still using the old script-based replication method. We will be moving to the new Java-based replication soon, hence we want to focus on autocommit and the time it takes to commit the data. So, how do we trace back the logs of autocommit? Does autocommit execute the commit script present under the bin folder?

3. What should be the optimum schedule for optimizing the data? After going through some posts like http://www.mail-archive.com/solr-user@lucene.apache.org/msg10920.html, it makes sense to optimize the data infrequently. How do we configure this in 1.4? Currently we optimize using the optimize script twice a day. Also, can there be a situation where the optimize conflicts with a commit operation? If yes, how do we avoid that kind of situation?

Many Thanks
Regards
Dipti Khullar
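For reference, a 1-minute time-based autocommit as discussed in question 1 might look like this in solrconfig.xml. The maxDocs value is carried over from the current setup described in the question; dropping it would make the trigger purely time-based:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>50</maxDocs>      <!-- current doc-count trigger from the question -->
    <maxTime>60000</maxTime>   <!-- 1 minute, in milliseconds -->
  </autoCommit>
</updateHandler>
```

Whichever threshold is hit first triggers the commit, so keeping both means bursts of more than 50 docs will still commit before the minute is up.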
Problem with pdf, upgrading Cell
Hello, I configured a Solr server to be able to extract data from various documents, including pdfs. Unfortunately, the data extraction fails on several pdfs. I have read around here that this may be due to the old Tika library being used? I looked around and saw that the svn had a newer version, so I checked out the trunk and built it using ant dist and ant example. I then set up my schema in the newly built server, and inserted the library from the newly built Cell into the lib directory (in Solr's home). However, now all I get is a blank response... The indexing works, but it doesn't extract anything; only the literal values that I pass in are indexed. Any help would be greatly appreciated!! :) Thank you. Marc Ghorayeb
Re: Problem with pdf, upgrading Cell
Marc, got anything in your logs?

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
RE: Problem with pdf, upgrading Cell
I'm launching it with the start.jar utility, and there doesn't seem to be anything weird in the console when I upload a pdf. Is there a way to output the console to a log file? The only log file that gets updated is one in the logs directory, and it seems to only show the input/output of the web requests (GETs and POSTs). For example:

127.0.0.1 - - [23/Apr/2010:13:06:47 +] GET /solr/core0/admin/luke?show=schema&wt=json HTTP/1.1 200 21690
127.0.0.1 - - [23/Apr/2010:13:06:47 +] GET /solr/core0/admin/luke?wt=json HTTP/1.1 200 780
127.0.0.1 - - [23/Apr/2010:13:06:57 +] POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1 200 41
127.0.0.1 - - [23/Apr/2010:13:06:58 +] POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdf&literal.title=mysql-proxy-en.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1 200 44
127.0.0.1 - - [23/Apr/2010:13:06:59 +] POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdf&literal.title=python-cheat-sheet-v1.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1 200 44
127.0.0.1 - - [23/Apr/2010:13:07:00 +] POST /solr/core0/update HTTP/1.1 200 41
127.0.0.1 - - [23/Apr/2010:13:07:00 +] POST /solr/core0/update HTTP/1.1 200 41
127.0.0.1 - - [23/Apr/2010:13:07:05 +] GET /solr/core0/admin/schema.jsp HTTP/1.1 200 26395
127.0.0.1 - - [23/Apr/2010:13:07:05 +] GET /solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1 304 0

I don't think that's going to help much :)
RE: Problem with pdf, upgrading Cell
Seems like I'm not the only one with this no-extraction problem: http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html

Apparently he tried the same thing, building from the trunk and indexing a pdf, and no extraction occurred... Strange.

Marc G.
Comparing two queries
We want to let a user register interest in information, based on a query he has defined himself. For example, he types in a query, presses a save button, and provides his email, and the system will then email him a daily digest.

As part of this, it would be nice to be able to tell the user that the same / a similar query is already being monitored by another user, as the users will likely have the same interests. I would therefore like to evaluate whether two queries will return (almost) the same set of results. But how can I compare two queries to determine if they will return (almost) the same set of results?

Thanks,
Gert.

Please help Logica to respect the environment by not printing this email. This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
Multiple query searches in one request
Hi there, Is it possible to do a search more than once in a single request, where only the filter query changes, so that the response contains the three different search results? We want a page which shows a clustered view of 5 of each of the three types (images, news articles, editorial articles), ordered by their score. One possibility is doing three separate Solr search requests, but it's not really a neat solution. Another could be making a custom request handler; could that solve this issue? Could you give me some pointers on how to implement one? thanks
--
View this message in context: http://lucene.472066.n3.nabble.com/Multiple-query-searches-in-one-request-tp745827p745827.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Comparing two queries
Hello Gert, I think you'd have to apply custom heuristics that involve looking at the top N hits for each query and measuring the % overlap.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
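That heuristic could be sketched like this. N (how many result ids you fetch per saved query) and the similarity threshold are tuning assumptions, not values from the thread:

```python
# Sketch of the top-N overlap heuristic: run each saved query, collect the
# top N unique-key values, then compare the two id lists.
def result_overlap(ids_a, ids_b):
    """Jaccard overlap between two top-N result-id lists."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0  # two empty result sets count as identical
    return len(a & b) / float(len(a | b))

def queries_similar(ids_a, ids_b, threshold=0.8):
    """True when two queries look 'almost the same' by result overlap."""
    return result_overlap(ids_a, ids_b) >= threshold
```

On save, you would compare the new query's top-N ids against those of already-monitored queries and surface any that score above the threshold.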
What hardware do I need ?
Hi, I'm working with Solr 1.4. My schema has about 50 fields. I'm using full-text search on short strings (~30-100 terms) and faceted search. My index will have 100,000 documents. The number of requests per second will be low; let's say between 0 and 1000 because of auto-complete. Is a standard server (3 GHz proc, 4 GB RAM) running the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough? Do I need more hardware? Thanks in advance, Xavier S.
Re: Problem with pdf, upgrading Cell
Marc, These are your request logs. You want to look at your Solr logs.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Merging Solr Cores Urgent
Hi, I have a question about merging Solr cores. The wiki documentation says that the merged core must exist prior to calling the merge command, so I created the merged core and pointed it at some data dir. However, even after merging the cores, it still points to the old data dir. Shouldn't the merge command create a new data/index, or at least the contents of the merged index? Ankit
--
View this message in context: http://lucene.472066.n3.nabble.com/Merging-Solr-Cores-Urgent-tp745938p745938.html
Sent from the Solr - User mailing list archive at Nabble.com.
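For reference, the merge command the wiki describes is the CoreAdmin mergeindexes action (available from Solr 1.4); a rough sketch of the URL it takes, where the host, core name, and index paths are all placeholders:

```python
import urllib.parse

# Sketch of building the CoreAdmin "mergeindexes" URL. The given indexes are
# merged INTO the target core's existing data dir (the command does not
# create a new one), and a commit on the target core is needed before the
# merged documents become visible.
def mergeindexes_url(host, target_core, index_dirs):
    """Build the CoreAdmin URL that merges index_dirs into target_core."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    return "%s/solr/admin/cores?%s" % (host, urllib.parse.urlencode(params))
```

That the merge lands in the target core's existing data dir would explain why the core still points at the old dir after merging.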
Re: Multiple query searches in one request
Hi, Yes, a custom SearchComponent will do this. We've done stuff like this before and actually have this sort of functionality in some Sematext products - it works well if you don't mind writing and adding another SearchComponent to your chain.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
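Until such a component is written, the three-request fallback mentioned in the question can be sketched like this. The field name "type", its values, and the Solr URL are assumptions about the poster's schema:

```python
import urllib.parse

# Sketch of the client-side fallback: three requests that differ only in fq,
# each asking for the top 5 documents of one content type by score.
SOLR_SELECT = "http://localhost:8983/solr/select"  # placeholder host/core

def build_params(q, doc_type, rows=5):
    """Query string for one per-type request."""
    return urllib.parse.urlencode({
        "q": q,
        "fq": "type:%s" % doc_type,   # assumed single-valued "type" field
        "rows": rows,
        "sort": "score desc",
        "wt": "json",
    })

def clustered_view_urls(q):
    """One URL per content type; fetch each and assemble the page client-side."""
    return ["%s?%s" % (SOLR_SELECT, build_params(q, t))
            for t in ("image", "news", "editorial")]
```

Because fq is a cached filter, the three requests share the main query's work to some extent, even though they are separate round trips.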
Re: What hardware do I need ?
Xavier, 0-1000 QPS is a pretty wide range. Plus, it depends on how good your auto-complete is, which depends on the types of queries it issues, among other things. 100K short docs is small, so that will all fit in RAM nicely, assuming those other processes leave enough RAM for the OS to cache the index. That said, you do need more than one box if you want your auto-complete to be more fault tolerant.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
RE: Problem with pdf, upgrading Cell
Yes, the only log I can actually get is the one in the command console from Windows, and there are no errors there ... Here are the last lines when I upload a PDF to the update/extract URL:

Apr 23, 2010 5:47:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
Apr 23, 2010 5:47:03 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={event=firstSearcher&q=static+firstSearcher+warming+query+from+solrconfig.xml} hits=0 status=0 QTime=0
Apr 23, 2010 5:47:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Apr 23, 2010 5:47:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
Apr 23, 2010 5:47:03 PM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
Apr 23, 2010 5:47:03 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Apr 23, 2010 5:47:03 PM org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener newSearcher
INFO: Loading spell index for spellchecker: default
2010-04-23 17:47:03.530:INFO::Opened E:\users\M1B\search\solr-new\example\logs\2010_04_23.request.log
2010-04-23 17:47:03.546:INFO::Started socketconnec...@0.0.0.0:8983
Apr 23, 2010 5:47:03 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@259a8416 main
Apr 23, 2010 5:47:11 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 297
Apr 23, 2010 5:47:11 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={extractOnly=true&literal.url=http://www.3ds.com/lucidworks-solr-refguide-1.4.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\lucidworks-solr-refguide-1.4.pdf&literal.type=document&literal.appKey=media&literal.title=lucidworks-solr-refguide-1.4.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17} status=0 QTime=297
Apr 23, 2010 5:47:12 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Apr 23, 2010 5:47:12 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={extractOnly=true&literal.url=http://www.3ds.com/mysql-proxy-en.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\mysql-proxy-en.pdf&literal.type=document&literal.appKey=media&literal.title=mysql-proxy-en.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17} status=0 QTime=0
Apr 23, 2010 5:47:13 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Apr 23, 2010 5:47:13 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={extractOnly=true&literal.url=http://www.3ds.com/python-cheat-sheet-v1.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\python-cheat-sheet-v1.pdf&literal.type=document&literal.appKey=media&literal.title=python-cheat-sheet-v1.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17} status=0 QTime=0
Apr 23, 2010 5:47:14 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening searc...@2efeecca main
Apr 23, 2010 5:47:14 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@2efeecca main from searc...@259a8416 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@2efeecca main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@2efeecca main from searc...@259a8416 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for
Re: What hardware do I need ?
On 23/04/2010 17:08, Otis Gospodnetic wrote: Xavier, 0-1000 QPS is a pretty wide range. Plus, it depends on how good your auto-complete is, which depends on types of queries it issues, among other things. 100K short docs is small, so that will all fit in RAM nicely, assuming those other processes leave enough RAM for the OS to cache the index. That said, you do need more than 1 box if you want your auto-complete more fault tolerant. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Xavier Schepler xavier.schep...@sciences-po.fr To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 11:01:24 AM Subject: What hardware do I need ? Hi, I'm working with Solr 1.4. My schema has about 50 fields. I'm using full text search in short strings (~ 30-100 terms) and facetted search. My index will have 100 000 documents. The number of requests per second will be low. Let's say between 0 and 1000 because of auto-complete. Is a standard server (3ghz proc, 4gb ram) with the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough ??? Do I need more hardware ? Thanks in advance, Xavier S. Well, my auto-complete is built on the facet prefix search component. I think that 100-700 requests per second is probably a better approximation.
Re: Problem with pdf, upgrading Cell
On Fri, Apr 23, 2010 at 5:48 PM, Marc Ghorayeb dekay...@hotmail.com wrote: Yes, the only log i can actually get is the one in the command console from windows and there are no errors there ... Here are the last lines when i upload a pdf to the update/extract url: snip I am pretty sure it is Tika itself that does not manage to convert your PDF. I'm not using Solr Cell but Tika from a command line, and it is only with very recent Tika builds that PDF extraction works in most cases. So I suggest building Tika from svn yourself, and if the command-line extraction works, integrating it back with Solr. See http://wiki.apache.org/solr/ExtractingRequestHandler for instructions (the committer section) hth Paul
SolrJ + BasicAuth
Uggg I just got bit hard by this on a Tomcat project ... https://issues.apache.org/jira/browse/SOLR-1238 Is there any way to get access to that RequestEntity w/o patching? Also, are there security implications w/ using the repeatable payloads? Thanks. - Jon
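For reference while SOLR-1238 is open: preemptive basic auth (what the commons-httpclient preemptive-authentication route ends up sending) is nothing more than an `Authorization` header carrying base64 of `user:password`, computed before any 401 challenge arrives. A minimal illustration — the credentials are placeholders, and this is not SolrJ API, just the wire-level header:

```python
import base64

def basic_auth_header(user, password):
    """Build the HTTP Basic 'Authorization' header value up front, so the
    request can succeed without the extra 401-challenge round trip."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

header = basic_auth_header("solr", "secret")  # placeholder user/password
```

Sending this header on the first request is what "preemptive" means; the repeatable-payload question in the issue is about being able to resend the body if the server challenges anyway.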
Re: Comparing two queries
Or, use facet.query to get the overlap. Here's how: ?q=query1&facet=on&facet.query=query2 You'll get the hit count from query #1 in the results, and the overlapping count to query #2 in the facet query response. Erik - http://www.lucidimagination.com On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote: Hello Gert, I think you'd have to apply custom heuristics that involves looking at top N hits for each query and looking at the % overlap. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Villemos, Gert gert.ville...@logica.com To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 10:20:54 AM Subject: Comparing two queries We want to support that a user can register for interest in information, based on a query he has defined himself. For example that he type in a query, press a save button, provides his email and the system will now email him with a daily digest. As part of this, it would be nice to be able to tell the user that the same / a similar query are already being monitored by another user, as the users will likely have the same interests. I would therefore like to evaluate whether two queries will return (almost) the same set of results. But how can I compare two queries to determine if they will return (almost) the same set of results? Thanks, Gert. Please help Logica to respect the environment by not printing this email / Pour contribuer comme Logica au respect de l'environnement, merci de ne pas imprimer ce mail / Bitte drucken Sie diese Nachricht nicht aus und helfen Sie so Logica dabei, die Umwelt zu schützen. / Por favor ajude a Logica a respeitar o ambiente nao imprimindo este correio electronico. This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party.
If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
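Once the counts from the facet.query trick are back (numFound for query #1, the facet count for query #2, plus a mirrored request), turning them into a similarity number is plain set arithmetic. An illustrative client-side sketch, not part of Solr itself:

```python
def jaccard(hits_a, hits_b, overlap):
    """Jaccard similarity of two result sets from their sizes and their
    intersection (the facet.query count): |A ∩ B| / |A ∪ B|."""
    union = hits_a + hits_b - overlap
    return overlap / union if union else 0.0
```

Here hits_a is numFound of q=query1, overlap is the facet.query=query2 count from the same response, and hits_b comes from a second request with the roles swapped.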
Solr does not honor facet.mincount and field.facet.mincount
Hi All, I am trying to restrict facets in the solr response, by setting facet.mincount = 1, which does not work, as the request and response are shown below:

REQUEST:
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9

RESPONSE:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <arr name="facet.field">
        <str>Instrument</str>
        <str>Location</str>
      </arr>
      <str name="facet.minCount">9</str>
      <str name="version">2.2</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="188" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="Instrument"/>
      <lst name="Location">
        <int name="Camden, New Jersey [unconfirmed]">118</int>
        <int name="Philadelphia, Pennsylvania [unconfirmed]">7</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
  </lst>
</response>

As we can see from the response, the Instrument facet, which has zero distinct values, is included in the response. Also the facet value Philadelphia, Pennsylvania [unconfirmed], which has a count less than mincount (9), is included in the response. I also tried Instrument.facet.mincount=1 but still I see the Instrument facet in the response. Please let me know if my understanding of mincount is different than what it is intended to do, OR if I am doing something which is not correct. Regards, Umesh -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-does-not-honor-facet-mincount-and-field-facet-mincount-tp746499p746499.html Sent from the Solr - User mailing list archive at Nabble.com.
Tomcat vs. WebSphere
Does anyone know of any advantages/disadvantages to running SOLR on WebSphere versus Tomcat? Thanks, Ken
Re: Tomcat vs. WebSphere
I've never used WebSphere, but I always got the impression that people have more issues with it than with simpler solutions. Personally, I would suggest Jetty. I've used it dozens of times and never had issues with it. It's small, simple, and fast. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Ken Lane (kenlane) kenl...@cisco.com To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 3:53:07 PM Subject: Tomcat vs. WebSphere Does anyone know of any advantages/disadvantages to running SOLR on WebSphere versus Tomcat? Thanks, Ken
Re: What hardware do I need ?
Xavier, 100-700 QPS is still high. I'm guessing your 1 box won't handle that without sweating a lot (read: slow queries). Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Xavier Schepler xavier.schep...@sciences-po.fr To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 11:53:23 AM Subject: Re: What hardware do I need ? On 23/04/2010 17:08, Otis Gospodnetic wrote: Xavier, 0-1000 QPS is a pretty wide range. Plus, it depends on how good your auto-complete is, which depends on types of queries it issues, among other things. 100K short docs is small, so that will all fit in RAM nicely, assuming those other processes leave enough RAM for the OS to cache the index. That said, you do need more than 1 box if you want your auto-complete more fault tolerant. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Xavier Schepler xavier.schep...@sciences-po.fr To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 11:01:24 AM Subject: What hardware do I need ? Hi, I'm working with Solr 1.4. My schema has about 50 fields. I'm using full text search in short strings (~ 30-100 terms) and facetted search. My index will have 100 000 documents. The number of requests per second will be low. Let's say between 0 and 1000 because of auto-complete. Is a standard server (3ghz proc, 4gb ram) with the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough ??? Do I need more hardware ? Thanks in advance, Xavier S. Well, my auto-complete is built on the facet prefix search component. I think that 100-700 requests per second is probably a better approximation.
Re: Best way to prevent this search lockup (apparently caused during big segment merges)?
Chris, It looks like Mike already offered several solutions though I don't know what Solr does without looking at the code. But I'm curious: * how big is your index? and do you know how large the segments being merged are? * do you batch docs or do you make use of Streaming SolrServer? I'm curious, because I've never encountered this problem before... Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Chris Harris rygu...@gmail.com To: solr-user@lucene.apache.org Sent: Thu, April 22, 2010 6:28:29 PM Subject: Best way to prevent this search lockup (apparently caused during big segment merges)? I'm running Solr 1.4+ under Tomcat 6, with indexing and searching requests simultaneously hitting the same Solr machine. Sometimes Solr, Tomcat, and my (C#) indexing process conspire to render search inoperable. So far I've only noticed this while big segment merges (i.e. merges that take multiple minutes) are taking place. Let me explain the situation as best as I understand it. My indexer has a main loop that looks roughly like this:

while true:
    try:
        submit a new add or delete request to Solr via HTTP
    catch timeoutException:
        sleep a few seconds

When things are going wrong (i.e., when a large segment merge is happening), this loop is problematic: * When the indexer's request hits Solr, then the corresponding thread in Tomcat blocks. (It looks to me like the thread is destined to block until the entire merge is complete. I'll paste in what the Java stack traces look like at the end of the message if they can help diagnose things.) * Because the Solr thread stays blocked for so long, eventually the indexer hits a timeoutException. (That is, it gives up on Solr.) * Hitting the timeout exception doesn't cause the corresponding Tomcat thread to die or unblock. Therefore, each time through the loop, another Solr-handling thread inside Tomcat enters a blocked state.
* Eventually so many threads (maxThreads, whose Tomcat default is 200) are blocked that Tomcat starts rejecting all new Solr HTTP requests -- including those coming in from the web tier. * Users are unable to search. The problem might self-correct once the merge is complete, but that could be quite a while. What are my options for changing Solr settings or changing my indexing process to avoid this lockup scenario? Do you agree that the segment merge is helping cause the lockup? Do adds and deletes really need to block on segment merges? Partial thread dumps follow, showing example add and delete threads that are blocked. Also the active Lucene Merge Thread, and the thread that kicked off the merge.

[doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock() to return]

http-1800-200 daemon prio=6 tid=0x0a58cc00 nid=0x1028 waiting on condition [0x0f9ae000..0x0f9afa90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for 0x00016d801ae0 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown Source)
        at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320)
        at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
        at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at
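One client-side mitigation for the loop Chris describes is to stop resubmitting while Solr is wedged: since every timed-out request leaves another blocked Tomcat thread behind, backing off exponentially and eventually giving up keeps the thread pool from filling during a long merge. A hedged sketch — the `submit` callable is a placeholder for the real HTTP add/delete, and the retry and delay numbers are arbitrary:

```python
import time

def submit_with_backoff(submit, doc, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Retry a Solr add/delete on timeout with exponential backoff instead of
    hammering the server in a tight loop: each timed-out request leaves a
    blocked container thread behind, so rapid resubmission exhausts maxThreads."""
    for attempt in range(max_retries):
        try:
            return submit(doc)
        except TimeoutError:
            sleep(base_delay * (2 ** attempt))  # wait longer while the merge runs
    raise RuntimeError("giving up: Solr still unresponsive (merge in progress?)")
```

Capping retries (or pausing the whole indexer once a timeout is seen) bounds the number of blocked Tomcat threads the indexer can create, leaving headroom for search traffic.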
Re: Collapse problem
: basically, we are running query with field collapsing (Solr 1.4 with : patch 236). The responses tells us that there are about 2700 documents : matching our query. However, I can not get passed the 431th document. : From this point on, the response will not contain any document. isn't that how collapse is supposed to work? a total of 2700 match, but it collapses away many of them according to some criteria, so you only paginate through 431? -Hoss
Re: Tomcat vs. WebSphere
We have run SOLR in weblogic without problems. The only change we see is some spurious extra logging info which we don't see in the case of tomcat. Anyone have an idea of how to control that? Thanks Shantanu On 4/23/10 12:53 PM, Ken Lane (kenlane) kenl...@cisco.com wrote: Does anyone know of any advantages/disadvantages to running SOLR on WebSphere versus Tomcat? Thanks, Ken
RE: Comparing two queries
I was thinking along the lines of:

1. Retrieve the top result for one query.
2. Take the resulting document and evaluate the score that it would get in another query.
3. If the scores are similar, then the queries most likely overlap.

I guess that if I had two simple query strings archive crash and archiving failure then I could:

1. Use the query ?q=archive crash&rows=1 which will return me one result (if any).
2. Read the score of the returned document.
3. Read the unique identifier field value, let's say it has field name 'URI' and value '50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'.
4. Use the query ?q=archiving failure&qf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55&rows=1
5. Read the score of the returned document (the document will be the same as returned under 1, the score will be different, evaluated based on the second query).
6. Evaluate how similar the scores are.

My question about this approach is: is the score calculated in 4 affected by the subquery, whose role is solely to select a specific result? I'm using dismax, by the way. Should I use the standard handler instead? Would it make a difference? Thanks, Gert. From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Fri 4/23/2010 8:08 PM To: solr-user@lucene.apache.org Subject: Re: Comparing two queries Or, use facet.query to get the overlap. Here's how: ?q=query1&facet=on&facet.query=query2 You'll get the hit count from query #1 in the results, and the overlapping count to query #2 in the facet query response. Erik - http://www.lucidimagination.com On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote: Hello Gert, I think you'd have to apply custom heuristics that involves looking at top N hits for each query and looking at the % overlap.
Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Villemos, Gert gert.ville...@logica.com To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 10:20:54 AM Subject: Comparing two queries We want to support that a user can register for interest in information, based on a query he has defined himself. For example that he type in a query, press a save button, provides his email and the system will now email him with a daily digest. As part of this, it would be nice to be able to tell the user that the same / a similar query are already being monitored by another user, as the users will likely have the same interests. I would therefore like to evaluate whether two queries will return (almost) the same set of results. But how can I compare two queries to determine if they will return (almost) the same set of results? Thanks, Gert.
Re: Comparing two queries
Gert, In your second query example you used qf= Did you mean fq= ? If so, the answer is no - filter queries don't affect the score. I haven't tried your approach, but intuitively feel that looking at % overlap may work better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Villemos, Gert gert.ville...@logica.com To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 5:08:04 PM Subject: RE: Comparing two queries I was thinking along the lines of:

1. Retrieve the top result for one query.
2. Take the resulting document and evaluate the score that it would get in another query.
3. If the scores are similar, then the queries most likely overlap.

I guess that if I had two simple query strings archive crash and archiving failure then I could:

1. Use the query ?q=archive crash&rows=1 which will return me one result (if any).
2. Read the score of the returned document.
3. Read the unique identifier field value, let's say it has field name 'URI' and value '50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'.
4. Use the query ?q=archiving failure&qf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55&rows=1
5. Read the score of the returned document (the document will be the same as returned under 1, the score will be different, evaluated based on the second query).
6. Evaluate how similar the scores are.

My question about this approach is: is the score calculated in 4 affected by the subquery, whose role is solely to select a specific result? I'm using dismax, by the way. Should I use the standard handler instead? Would it make a difference? Thanks, Gert. From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Fri 4/23/2010 8:08 PM To: solr-user@lucene.apache.org Subject: Re: Comparing two queries Or, use facet.query to get the overlap. Here's how: ?q=query1&facet=on&facet.query=query2 You'll get the hit count from query #1 in the results, and the overlapping count to query #2 in the facet query response. Erik - http://www.lucidimagination.com On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote: Hello Gert, I think you'd have to apply custom heuristics that involves looking at top N hits for each query and looking at the % overlap. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Villemos, Gert gert.ville...@logica.com To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 10:20:54 AM Subject: Comparing two queries We want to support that a user can register for interest in information, based on a query he has defined himself. For example that he type in a query, press a save button, provides his email and the system will now email him with a daily digest. As part of this, it would be nice to be able to tell the user that the same / a similar query are already being monitored by another user, as the users will likely have the same interests. I would therefore like to evaluate whether two queries will return (almost) the same set of results. But how can I compare two queries to determine if they will return (almost) the same set of results? Thanks, Gert.
RE: Comparing two queries
Yes, your solution is much simpler, providing the result through a single query. I didnt understand it the first time I read it. I guess you would need to run it backwards as well to really evaluate the relevance, i.e. First q=query1facet=onfacet.query=query2 Then q=query2facet=onfacet.query=query1 Query 1 may return 100.000 hits with 500 overlapping with query 2. This would indicate no relevance. Query 2 may return 1.000 documents with 500 overlaping with 1. This would indicate relevance. I will test it out the next days and let you know how it works for us. Regardsm Gert. From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Fri 4/23/2010 11:24 PM To: solr-user@lucene.apache.org Subject: Re: Comparing two queries Gert, In your second query example you used qf= Did you mean fq= ? If so, the answer is no - filter queries don't affect the score. I haven't tried your approach, but intuitively feel that looking at % overlap may work better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Villemos, Gert gert.ville...@logica.com To: solr-user@lucene.apache.org; solr-user@lucene.apache.org Sent: Fri, April 23, 2010 5:08:04 PM Subject: RE: Comparing two queries I was thinking along the lines 1. Retrieve the top result for one query. 2. Take the resulting document and evaluate the score that it would get in another query. 3. If the scores are similar, then the queries most likely overlap. I guess that if I had two simple query strings archive crash and query archiving failure then I could: 1. Use the query ?q=archive crashrows=1 which will return me one result (if any). 2. Read the score of the returned document. 3. Read the unique identifier field value, lets say it has field name 'URI' and value 50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'. 4. Use the query ?q=archiving failureqf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55rows=1 5. 
Read the score of the returned document (the document will be the same as returned under 1, the score will be different, evaluated based on the second query). 6. Evaluate how similar the scores are. My question this approach is; is the score calculated in 4 affected by the subquery, whoes role is solely to select a specific result? I'm using the dismax by the way. Should I use the standard handler instead? Would it make a difference? Thanks, Gert. From: Erik Hatcher [mailto: ymailto=mailto:erik.hatc...@gmail.com; href=mailto:erik.hatc...@gmail.com;erik.hatc...@gmail.com] Sent: Fri 4/23/2010 8:08 PM To: href=mailto:solr-user@lucene.apache.org;solr-user@lucene.apache.org Subject: Re: Comparing two queries Or, use facet.query to get the overlap. Here's ? q=query1facet=onfacet.query=query2 You'll get the hit count from query #1 in the results, and the overlapping count to query #2 in the facet query response. Erik - http://www.lucidimagination.com http://www.lucidimagination.com/ href=http://www.lucidimagination.com/; target=_blank http://www.lucidimagination.com/ On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote: Hello Gert, I think you'd have to apply custom heuristics that involves looking at top N hits for each query and looking at the % overlap. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Villemos, Gert ymailto=mailto:gert.ville...@logica.com; href=mailto:gert.ville...@logica.com;gert.ville...@logica.com To: href=mailto:solr-user@lucene.apache.org;solr-user@lucene.apache.org Sent: Fri, April 23, 2010 10:20:54 AM Subject: Comparing two queries We want to support that a user can register for interest in information, based on a query he has defined himself. For example that he type in a query, press a save button, provides his email and the system will now email him with a daily digest. 
As part of this, it would be nice to be able to tell the user that the same or a similar query is already being monitored by another user, as those users will likely have the same interests. I would therefore like to evaluate whether two queries will return (almost) the same set of results. But how can I compare two queries to determine this?

Thanks, Gert.
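Erik's facet.query trick can be run in both directions, as Gert suggests, and the two overlap fractions combined into one similarity score. Below is a minimal client-side sketch; the Solr URL, the wt=json response format, and the way the two fractions are combined are assumptions for illustration, not a prescribed recipe.

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr/select"  # hypothetical Solr instance


def hits_and_overlap(q1, q2):
    """Run q1 with q2 as a facet.query; return (numFound of q1, docs matching both)."""
    params = urllib.parse.urlencode({
        "q": q1, "rows": 0, "facet": "on", "facet.query": q2, "wt": "json",
    })
    with urllib.request.urlopen(SOLR + "?" + params) as resp:
        data = json.load(resp)
    return (data["response"]["numFound"],
            data["facet_counts"]["facet_queries"][q2])


def bidirectional_overlap(h1, o1, h2, o2):
    """Smaller of the two overlap fractions; near 1.0 means near-identical result sets."""
    if h1 == 0 or h2 == 0:
        return 0.0
    return min(o1 / h1, o2 / h2)


# Gert's asymmetric example: 100.000 hits overlapping 500 with a 1.000-hit
# query is NOT similar, even though half of the second query is covered.
score = bidirectional_overlap(100000, 500, 1000, 500)
```

Taking the minimum of the two fractions is what makes the comparison symmetric: both queries must be mostly covered by the overlap before they count as similar.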
Re: Solr full-import not working as expected
Unfortunately you haven't answered my question, saratv. The important question is: why did your DIH configuration not import those rows? Without any schema information or configuration details of your DIH, no one will be able to help you. Just for the future: if something doesn't work, present detailed information to get fast, good-quality help.

Regards - Mitch

saratv wrote: is there a way, from a Java program, that I can find the missing rows in the database and import those rows into Solr as docs... or at least is there a way to find which rows are missing?

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-full-import-not-working-as-expected-tp744937p746927.html Sent from the Solr - User mailing list archive at Nabble.com.
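For finding which rows never made it into the index, one common approach (not DIH-specific) is to page through all unique keys in Solr (e.g. q=*:*&fl=id) and diff them against the database's primary keys. A minimal sketch of the diff step, assuming a unique key field named "id"; how you fetch the two id lists is up to your database driver and Solr client:

```python
def missing_ids(db_ids, indexed_ids):
    """Return database ids that are absent from the Solr index, in db order."""
    indexed = set(indexed_ids)  # set lookup keeps the diff O(n)
    return [i for i in db_ids if i not in indexed]


# e.g. 93k ids from the database vs 91k ids from the index:
# the result tells you exactly which ~2k rows to re-import.
```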
Re: Solr does not honor facet.mincount and field.facet.mincount
Umesh_ wrote:

Hi All, I am trying to restrict facets in the Solr response by setting facet.mincount=1, which does not work. The request and response are shown below:

REQUEST:
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9

RESPONSE:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <arr name="facet.field">
        <str>Instrument</str>
        <str>Location</str>
      </arr>
      <str name="facet.minCount">9</str>
      <str name="version">2.2</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="188" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="Instrument"/>
      <lst name="Location">
        <int name="Camden, New Jersey [unconfirmed]">118</int>
        <int name="Philadelphia, Pennsylvania [unconfirmed]">7</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
  </lst>
</response>

As we can see from the response, the Instrument facet, which has zero distinct values, is included in the response. Also the facet "Philadelphia, Pennsylvania [unconfirmed]", whose count (7) is less than the mincount (9), is included in the response.

The emptiness of the Instrument field in the response shows that Solr couldn't facet data (9 or more docs) on that field. Regarding the Location field, the result is weird. Can you show us the data and the field type of the field, so we can reproduce the problem?

I also tried Instrument.facet.mincount=1, but I still see the Instrument facet in the response.

A per-field parameter needs the f. prefix. It should be f.Instrument.facet.mincount=1.

Koji
--
http://www.rondhuit.com/en/
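Putting Koji's correction together with the original request, a URL along these lines (host and field names taken from Umesh's example) would apply the per-field minimum count to Instrument:

```
http://localhost:8983/solr/select/?q=*%3A*&rows=0&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=1&f.Instrument.facet.mincount=1
```

The plain facet.mincount=1 sets the default for all facet fields; the f.Instrument.facet.mincount=1 form shows the per-field override syntax.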
Boost function on *:*
Is it possible to use a boost function across the whole index / with an empty search term? I'm guessing the next question would be "Why would you want to do that?" Well, we have a bunch of custom business metrics included in each document (a product). I would like to show only the best products (based on our metrics and some boost functions) in the absence of a search term. Is this possible? -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-function-on-tp747131p747131.html Sent from the Solr - User mailing list archive at Nabble.com.
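One option with the standard request handler is a pure function query via the _val_ hook, which matches every document and scores it entirely by the function. A sketch along these lines, where popularity and margin are hypothetical stand-ins for your own metric fields:

```
?q=_val_:"product(popularity,margin)"&fl=name,score
```

Since every document matches, the default sort by score effectively ranks the whole index by your business metric, with no user-entered search term needed.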
mix cased search terms
Hello list, first time posting here. I am trying to find an answer to a strange search behaviour we're seeing in our VuFind application. In order to eliminate any VuFind-related variables, I have used the vanilla Solr example schema to try our problematic search. I posted this XML, a slightly modified version of monitor.xml, to the example schema for testing:

<add><doc>
  <field name="id">1</field>
  <field name="name">In pursuit of the PhD</field>
  <field name="manu">Dell, Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">monitor</field>
  <field name="features">In pursuit of the PhD</field>
  <field name="includes">In pursuit of the PhD</field>
  <field name="weight">401.6</field>
  <field name="price">2199</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
</doc></add>

Then I ran a query in the admin interface with debug on and got no match: features:PhD

The debug info shows:

<str name="rawquerystring">features:PhD</str>
<str name="querystring">features:PhD</str>
<str name="parsedquery">PhraseQuery(features:"ph d")</str>
<str name="parsedquery_toString">features:"ph d"</str>

But in the analysis tool, it shows a match for the split "ph d" given the query term PhD. If I set the splitOnCaseChange="0" option in the WordDelimiterFilter, then a match is found as expected. I appreciate any insight on this problem. Thanks in advance. Tuan
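For reference, the change Tuan describes would look roughly like this in the example schema's text field type (abbreviated here; the exact attribute set varies between Solr versions, and the same setting should normally be applied to both the index-time and query-time analyzers so they stay in sync):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnCaseChange="0" keeps mixed-case terms like "PhD" as one token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```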