Any way to get top 'n' queries searched from Solr?
Hi, I need to know the top (most frequently searched, with their frequencies) 'n' (say 100) search queries that users tried. Does Solr keep this information and can it return it, or what other options do I have here? Thanks, Praveen
Re: ubuntu lucid package
On Thu, 29 Apr 2010 19:54:49 -0700 (PDT) Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Pablo, Ubuntu Lucid is *brand* new :) try: find / -name \*solr\* or locate solr.war [...] Also, the standard Debian/Ubuntu way of finding out what files a package installed is: dpkg -L pkg_name Regards, Gora
AW: Slow Date-Range Queries
For now I need them. I will, however, most likely (as suggested by Ahmet Arslan) create another boolean field to get rid of them, simply because I am switching to Solr 1.4 frange queries. On the topic of frange queries, is there a way to simulate the date range wildcards here? They don't seem to be working for the frange. Do you really need the *:* stuff in the date range subqueries? That may add to the execution time.
Re: Any way to get top 'n' queries searched from Solr?
As far as I'm aware, this information isn't stored intrinsically in Solr. We had a similar requirement whereby we needed to keep track of which searches had been performed by particular users. This was more of a security audit requirement than generic searching, but the solution was to audit (in a SearchComponent) all users' search activity. This auditing can then be written back to the index (or, preferably, a separate index), which can then be searched in the normal way. You could adopt the same strategy for your requirement. If you want to see how we did this, have a look at SOLR-1872. Thanks, Peter On Fri, Apr 30, 2010 at 7:14 AM, Praveen Agrawal pkal...@gmail.com wrote: [...]
Re: ubuntu lucid package
http://localhost:8080/solr/admin/ gives me the Solr admin. Thanks. On Fri, Apr 30, 2010 at 10:24 AM, Gora Mohanty g...@srijan.in wrote: [...]
RE: Problem with pdf, upgrading Cell
Mark, did you manage to get it to work? I did try the latest Tika (0.7) command line and successfully parsed an earlier problematic PDF. Then I replaced the Tika-related jars in the Solr 1.4 contrib/extraction/lib folder with the new ones. Now it doesn't throw any exception, but there is no content extraction, only metadata! It now doesn't even extract content from PDFs which it handled earlier (v0.4). Strange... -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-pdf-upgrading-Cell-tp745557p767447.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Problem with pdf, upgrading Cell
I observed the same issue too, with the Tika 0.7 jars. It now fails to extract content from documents of any type. It works with Tika 0.5, though. Thanks, Sandhya -----Original Message----- From: pk [mailto:pkal...@gmail.com] Sent: Friday, April 30, 2010 3:17 PM To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell [...]
Re: Any way to get top 'n' queries searched from Solr?
Peter, It seems that your solution (SOLR-1872) requires authentication too (with users tracked via your UUID), but my users will be the general public using browsers, and I can't force any such auth restrictions. Also, you didn't mention whether you are already persisting the audit data, or whether I would need to extend it to work for my problem. My requirement is simple: to know the top n query strings with their frequencies. Thanks though.
Re: Any way to get top 'n' queries searched from Solr?
The simplest way is to send the query string to your Solr client *and* to your custom query-fetcher, which could be any database you like. Doing so, you can count how often each query was sent, etc. *And* you can make them searchable by exporting those datasets to another Solr core. Why an extra DB? Because if a crash occurs, Solr gives you no guarantees. Keep in mind that Solr is only an index/search server, not a real database. This is probably the easiest way to implement such a feature, I think. Good luck. - Mitch
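The counting side of this suggestion can be sketched roughly as follows. This is a minimal, hypothetical helper for your client layer, not any Solr API; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tally every raw query string as it passes through
// your client layer, then ask for the top-n most frequent entries.
public class QueryTally {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Record one occurrence of a query string.
    public void record(String q) {
        Integer c = counts.get(q);
        counts.put(q, c == null ? 1 : c + 1);
    }

    // The n most frequent query strings, most frequent first.
    public List<Map.Entry<String, Integer>> topN(int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a,
                               Map.Entry<String, Integer> b) {
                return b.getValue() - a.getValue();
            }
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }
}
```

In a real deployment you would persist the counts to the external DB (or export them to a separate Solr core) rather than keep them only in memory.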
Re: Any way to get top 'n' queries searched from Solr?
Hi, why don't you just create a servlet filter in the Solr context? That way you can grab the user's q parameter and persist it. On 4/30/10, pk pkal...@gmail.com wrote: [...] -- Abdelhamid ABID Software Engineer - J2EE / WEB
Re: Any way to get top 'n' queries searched from Solr?
Thanks, Mitch. I have an application fronting Solr for updating/searching etc., and I'll make use of that to store this info. Thanks to all for the suggestions. On Fri, Apr 30, 2010 at 3:43 PM, MitchK mitc...@web.de wrote: [...]
Re: Any way to get top 'n' queries searched from Solr?
Yes, you're right, SOLR-1872 is for security authorization, and part of that is auditing what users are searching. The reference to it was to show you how your requirement can be accomplished. To have just the auditing and not the security, you'd need to create your own SearchComponent and extract out just the auditing bits, or remove the security bits. This shouldn't be too difficult to do. Or you can simply see how it's done and create your own SearchComponent using the same technique. Your SearchComponent should work out a lot simpler than the one in SOLR-1872. As for audit persistence in SOLR-1872, audit events are written to a log file, which happens to be monitored by an external file monitor that can feed new log entries to other sources (e.g. another index, an external log repository, etc.). It's done this way to keep any external audit routing/delivery separate from the webapp (it's not part of solr.war's remit to do audit routing). For your requirement, you'll probably want to write audited searches directly into a Solr index, either the same one that is being searched or a different one (a different one is better, so your public users don't have access to your search stats). You can use any of the available /update mechanisms to accomplish this. Thanks, Peter On Fri, Apr 30, 2010 at 11:08 AM, pk pkal...@gmail.com wrote: [...]
Re: Problem with pdf, upgrading Cell
Can you share the PDF it is failing on? FWIW, PDFs are notoriously hard to extract. They come in all shapes and flavors, and I've seen many a commercial extractor fail on them too. Have you tried using either Tika standalone or PDFBox standalone? Does the file work there? On Apr 26, 2010, at 8:35 AM, Marc Ghorayeb wrote: Okay, I've been digging a little bit through the Java code from the SVN, and it seems the load function inside the ExtractingDocumentLoader class does not receive the ContentStream (it is set to null...). Maybe I should send this to the developer mailing list? Marc From: dekay...@hotmail.com To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Date: Fri, 23 Apr 2010 16:03:28 +0200 Seems like I'm not the only one with this no-extraction problem: http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html Apparently he tried the same thing, building from the trunk and indexing a PDF, and no extraction occurred... Strange. Marc G. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: Indexing metadata in solr using ContentStreamUpdateRequest
What does your schema look like? On Apr 30, 2010, at 3:47 AM, Sandhya Agarwal wrote: Hello, I am using ContentStreamUpdateRequest to index binary documents. At the time of indexing the content, I want to be able to index some additional metadata as well. I believe this metadata must be provided prefixed with *literal*. For instance, I have a field named "field1" defined in schema.xml, and to index a document with a value for this field, I would provide "literal.field1" = value. However, this does not seem to be working, and the field defined in schema.xml, *field1*, does not have any data indexed. How can I get this working? Thanks in advance. Thanks, Sandhya -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: ubuntu lucid package
Am 30.04.2010 um 09:24 schrieb Gora Mohanty: Also, the standard Debian/Ubuntu way of finding out what files a package installed is: dpkg -l pkg_name Regards, Gora You might try: # dpkg -L solr-common /. /etc /etc/solr /etc/solr/web.xml /etc/solr/conf /etc/solr/conf/admin-extra.html /etc/solr/conf/elevate.xml /etc/solr/conf/mapping-ISOLatin1Accent.txt /etc/solr/conf/protwords.txt /etc/solr/conf/schema.xml /etc/solr/conf/scripts.conf /etc/solr/conf/solrconfig.xml /etc/solr/conf/spellings.txt /etc/solr/conf/stopwords.txt /etc/solr/conf/synonyms.txt /etc/solr/conf/xslt /etc/solr/conf/xslt/example.xsl /etc/solr/conf/xslt/example_atom.xsl /etc/solr/conf/xslt/example_rss.xsl /etc/solr/conf/xslt/luke.xsl /usr /usr/share /usr/share/solr /usr/share/solr/WEB-INF /usr/share/solr/WEB-INF/lib /usr/share/solr/WEB-INF/lib/apache-solr-core-1.4.0.jar /usr/share/solr/WEB-INF/lib/apache-solr-dataimporthandler-1.4.0.jar /usr/share/solr/WEB-INF/lib/apache-solr-solrj-1.4.0.jar /usr/share/solr/WEB-INF/weblogic.xml /usr/share/solr/scripts /usr/share/solr/scripts/abc /usr/share/solr/scripts/abo /usr/share/solr/scripts/backup /usr/share/solr/scripts/backupcleaner /usr/share/solr/scripts/commit /usr/share/solr/scripts/optimize /usr/share/solr/scripts/readercycle /usr/share/solr/scripts/rsyncd-disable /usr/share/solr/scripts/rsyncd-enable /usr/share/solr/scripts/rsyncd-start /usr/share/solr/scripts/rsyncd-stop /usr/share/solr/scripts/scripts-util /usr/share/solr/scripts/snapcleaner /usr/share/solr/scripts/snapinstaller /usr/share/solr/scripts/snappuller /usr/share/solr/scripts/snappuller-disable /usr/share/solr/scripts/snappuller-enable /usr/share/solr/scripts/snapshooter /usr/share/solr/admin /usr/share/solr/admin/_info.jsp /usr/share/solr/admin/action.jsp /usr/share/solr/admin/analysis.jsp /usr/share/solr/admin/analysis.xsl /usr/share/solr/admin/distributiondump.jsp /usr/share/solr/admin/favicon.ico /usr/share/solr/admin/form.jsp /usr/share/solr/admin/get-file.jsp 
/usr/share/solr/admin/get-properties.jsp /usr/share/solr/admin/header.jsp /usr/share/solr/admin/index.jsp /usr/share/solr/admin/jquery-1.2.3.min.js /usr/share/solr/admin/meta.xsl /usr/share/solr/admin/ping.jsp /usr/share/solr/admin/ping.xsl /usr/share/solr/admin/raw-schema.jsp /usr/share/solr/admin/registry.jsp /usr/share/solr/admin/registry.xsl /usr/share/solr/admin/replication /usr/share/solr/admin/replication/header.jsp /usr/share/solr/admin/replication/index.jsp /usr/share/solr/admin/schema.jsp /usr/share/solr/admin/solr-admin.css /usr/share/solr/admin/solr_small.png /usr/share/solr/admin/stats.jsp /usr/share/solr/admin/stats.xsl /usr/share/solr/admin/tabular.xsl /usr/share/solr/admin/threaddump.jsp /usr/share/solr/admin/threaddump.xsl /usr/share/solr/admin/debug.jsp /usr/share/solr/admin/dataimport.jsp /usr/share/solr/favicon.ico /usr/share/solr/index.jsp /usr/share/doc /usr/share/doc/solr-common /usr/share/doc/solr-common/changelog.Debian.gz /usr/share/doc/solr-common/README.Debian /usr/share/doc/solr-common/TODO.Debian /usr/share/doc/solr-common/copyright /usr/share/doc/solr-common/changelog.gz /usr/share/doc/solr-common/NOTICE.txt.gz /usr/share/doc/solr-common/README.txt.gz /var /var/lib /var/lib/solr /var/lib/solr/data /usr/share/solr/WEB-INF/lib/xml-apis.jar /usr/share/solr/WEB-INF/lib/xml-apis-ext.jar /usr/share/solr/WEB-INF/lib/slf4j-jdk14.jar /usr/share/solr/WEB-INF/lib/slf4j-api.jar /usr/share/solr/WEB-INF/lib/lucene-spellchecker.jar /usr/share/solr/WEB-INF/lib/lucene-snowball.jar /usr/share/solr/WEB-INF/lib/lucene-queries.jar /usr/share/solr/WEB-INF/lib/lucene-highlighter.jar /usr/share/solr/WEB-INF/lib/lucene-core.jar /usr/share/solr/WEB-INF/lib/lucene-analyzers.jar /usr/share/solr/WEB-INF/lib/jetty-util.jar /usr/share/solr/WEB-INF/lib/jetty.jar /usr/share/solr/WEB-INF/lib/commons-io.jar /usr/share/solr/WEB-INF/lib/commons-httpclient.jar /usr/share/solr/WEB-INF/lib/commons-fileupload.jar /usr/share/solr/WEB-INF/lib/commons-csv.jar 
/usr/share/solr/WEB-INF/lib/commons-codec.jar /usr/share/solr/WEB-INF/web.xml /usr/share/solr/conf If I reckon correctly, some parts of Apache Solr will not work with the Ubuntu Lucid distribution. http://solr.dkd.local/update/extract throws an error: The server encountered an internal error (lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249) at ...). Maybe someone from Ubuntu reading this list can confirm this. Olivier -- Olivier Dobberkau d.k.d Internet Service GmbH Kaiserstraße 73 60329 Frankfurt/Main mail: olivier.dobber...@dkd.de web: http://www.dkd.de
RE: Indexing metadata in solr using ContentStreamUpdateRequest
Thanks, Grant. I resolved this issue by doing the following: for each of my own metadata fields, I also had to define the mapping between the Tika field and the Solr field, either in solrconfig.xml or when submitting the request for indexing. I also had to make sure that lowernames=false, since the field names defined in my schema.xml are in mixed case or upper case. That solved the issue for me. Thanks, Sandhya -----Original Message----- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Friday, April 30, 2010 4:15 PM To: solr-user@lucene.apache.org Subject: Re: Indexing metadata in solr using ContentStreamUpdateRequest [...]
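For reference, a request to the extraction handler combining these pieces might look roughly like this. The field names, host, and port here are made up for illustration; adjust them to your own schema and setup:

```
http://localhost:8983/solr/update/extract
    ?literal.myField=someValue    (index a literal value into the Solr field myField)
    &fmap.Author=docAuthor        (map Tika's "Author" metadata field to the Solr field docAuthor)
    &lowernames=false             (keep mixed-case Solr field names as-is)
```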
Re: Problem with pdf, upgrading Cell
I did try the standalone version of Tika 0.7, and it extracted the PDF content successfully. Then I replaced the Tika-related jars in contrib/extraction/lib of the Solr 1.4 distribution with their newer versions, and now it doesn't extract content from ANY pdf. Earlier (0.4) it was throwing an exception for a few PDFs, but now there is no content and no exception. On Fri, Apr 30, 2010 at 4:14 PM, Grant Ingersoll gsing...@apache.org wrote: [...]
Solr date representation
Don't know if this counts as a bug report or not - it's certainly a corner case, but it's just bitten me. http://wiki.apache.org/solr/IndexingDates suggests that the canonical form of a date is a string like 1995-12-31T23:59:59Z and says that this is a restricted form of the canonical representation of dateTime from XML Schema. The latter explicitly says '0001' is the lexical representation of the year 1 of the Common Era (1 CE, sometimes written AD 1 or 1 AD). However, if I put a document into Solr (1.4 release) with a datetime field of 0001-01-01T00:00:00Z, then on retrieving that document I get back the value 1-01-01T00:00:00Z (i.e. no leading zeroes) - which tripped up my date-parsing routines. Leading zeroes seem to be universally dropped - all dates before 1000 AD seem to have the equivalent problem. Is this a bug in the code, or a bug in the documentation? Toby -- http://timetric.com 2nd Floor, White Bear Yard, 144a Clerkenwell Road, London EC1R 5DF twitter: @timetric, @tow21 | skype: tobyohwhite
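Until the underlying behaviour is fixed, one client-side workaround is to re-pad the year before handing the string to a strict ISO-8601 parser. A minimal, hypothetical sketch (it assumes a positive year, i.e. the string starts directly with the year digits, as in the values Solr returns):

```java
// Hypothetical client-side workaround: restore the leading zeroes dropped
// from the year, so "1-01-01T00:00:00Z" becomes "0001-01-01T00:00:00Z"
// before it reaches a strict date parser. Assumes a positive year.
public class SolrDateFix {
    public static String padYear(String solrDate) {
        int dash = solrDate.indexOf('-'); // position of the year/month separator
        StringBuilder sb = new StringBuilder();
        for (int i = dash; i < 4; i++) {
            sb.append('0'); // one '0' per missing year digit
        }
        return sb.append(solrDate).toString();
    }
}
```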
Re: Problem with pdf, upgrading Cell
Hi, nope, I didn't get it to work... Just like you, the command line version of Tika extracts the content correctly, but once included in Solr, no content is extracted. What I have tried until now: updating the Tika libraries inside the Solr 1.4 public version (no luck there); downloading the latest SVN version, compiling it, and starting from a simple schema (still no luck); getting other versions compiled on Hudson (nightly builds) and testing them too (still no extraction). I sent a mail to the developers mailing list but they told me I should just mail here. I hope some developer reads this, because it's quite an important feature of Solr and somehow it got broken between the 1.4 release and the latest version on the SVN. Marc
Re: Elevation of part match
Gert, could you provide the solrconfig and schema specifications you have made? If the wiki really means what it says, the behaviour you want should be possible. But that's only a guess. Btw: the standard field definition for the elevation component in the example directory is string. That means there is no tokenization, and accordingly a partial match is not possible. Hope that helps - Mitch
Re: ubuntu lucid package
What parts don't work for you? If there are bugs in the package, it would be great if you could report them so the package can be improved. On Fri, Apr 30, 2010 at 1:50 PM, Olivier Dobberkau olivier.dobber...@dkd.de wrote: [...] http://solr.dkd.local/update/extract throws an error: The server encountered an internal error (lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249) [...]
RE: benefits of float vs. string
When using numerical types you can do range queries (e.g. all values of myfield between 3 and 10), as well as a lot of other interesting mathematical functions that would not be possible with a string type. Thanks for the info Yonik, -Kallin Nagelberg -----Original Message----- From: Dennis Gearon [mailto:gear...@sbcglobal.net] Sent: Friday, April 30, 2010 1:27 AM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Subject: Re: benefits of float vs. string Please explain a range query? tia :-) Dennis Gearon --- On Thu, 4/29/10, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Apr 28, 2010 at 11:22 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: Does anyone have an idea about the performance benefits of searching across floats compared to strings? I have one multi-valued field that contains about 3000 distinct IDs across 5 million documents. I am going to be doing a lot of queries like q=id:102 OR id:303 OR id:305, etc. Right now it is a String, but I am going to switch to a float, as intuitively it ought to be easier to filter a number than a string. There won't be any difference in search speed for term queries as you show above. If you don't need to do sorting or range queries on that field, I'd leave it as a String. -Yonik Apache Lucene Eurocon 2010 18-21 May 2010 | Prague
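For the record, range queries in Solr/Lucene query syntax look like this (the field names are just examples). Note that on a string field the comparison is lexicographic, so "10" sorts before "9" - which is exactly why a numeric field type is needed for sensible ranges:

```
q=myfield:[3 TO 10]    inclusive range: 3 <= myfield <= 10
q=myfield:{3 TO 10}    exclusive range: 3 <  myfield <  10
q=myfield:[50 TO *]    open-ended range: myfield >= 50
```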
prefixing with dismax
Hey, I've been using the dismax query parser so that I can pass a user-created search string directly to Solr. Now I'm getting the requirement that something like 'Bo' must match 'Bob', or 'Bob Jo' must match 'Bob Jones'. I can't think of a way to make this happen with dismax, though it's pretty simple with standard syntax: I guess I would just split on whitespace and create ANDed terms like 'myfield:token*'. This doesn't feel like a great approach though, since I'm losing all of the escaping magic of dismax. Does anyone have any cleaner solutions to this sort of problem? I imagine it's quite common. Thanks, Kallin Nagelberg
RE: Elevation of part match
The elevate.xml example says: <!-- If this file is found in the config directory, it will only be loaded once at startup. If it is found in Solr's data directory, it will be re-loaded every commit. --> Did you do a restart?
Re: Problem with pdf, upgrading Cell
Praveen and Marc, can you share the PDF (feel free to email my private email) that fails in Solr? Thanks, Grant On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote: [...] -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
How is DeletionPolicy supposed to work?
Hi folks, In moving to 1.4, it was unclear to me how deletionPolicy was supposed to work. I commit/optimize on a build server, then replicate to multiple search servers. I don't need anything fancy for a deletion policy: save one copy, and replicate one copy. But when I used no policy, sometimes the index would be twice the normal size. In an effort to eliminate that, I put in the explicit deletion policy below. But it STILL sometimes creates an index of double the size. This is causing space problems on some of my replicated servers. Can someone please explain what configuration I should apply to never save any extra commits or optimized commits, so that my index and all replicated copies of it will have the size of 1 index, rather than 2 indexes? A summary of the theory behind that would be most welcome too. Thanks! -Jim The deletion policy stanza from mainIndex in solrconfig.xml:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- The number of commit points to be kept -->
  <str name="maxCommitsToKeep">0</str>
  <!-- The number of optimized commit points to be kept -->
  <str name="maxOptimizedCommitsToKeep">1</str>
  <!--
    Delete all commit points once they have reached the given age.
    Supports DateMathParser syntax, e.g.
    <str name="maxCommitAge">30MINUTES</str>
    <str name="maxCommitAge">1DAY</str>
  -->
</deletionPolicy>
Re: Trouble with parenthesis
Pure negatives in Lucene syntax don't match anything (Solr currently only fixes this for you when it's a pure negative at the top level, not embedded). Try changing (NOT periodicite:annuel) to (*:* NOT periodicite:annuel). But the second version below, where you just removed the parens, will be more efficient. -Yonik

Apache Lucene Eurocon 2010 18-21 May 2010 | Prague

On Fri, Apr 30, 2010 at 1:49 AM, mailing-list gboyr...@andevsol.com wrote:

Hi everybody, We have a problem with parentheses in a Lucene/Solr request (Solr 1.4):
- {!lucene q.op=AND} (ville:Moscou -periodicite:annuel) gives 254 documents, with parsedquery +ville:Moscou -periodicite:annuel in debug mode. That's correct.
- {!lucene q.op=AND} (ville:Moscou AND NOT periodicite:annuel) gives the same results.
- {!lucene q.op=AND} (ville:Moscou AND (NOT periodicite:annuel)) gives 0 documents, with parsedquery +ville:Moscou +(-periodicite:annuel).
The two fields are standard string fields in the Solr schema. Is this an issue, or the standard behavior of the Solr query parser? Best regards, Gilbert Boyreau
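To illustrate the rewrite suggested above, here is a minimal Java sketch. The class and helper names are my own for illustration, not Lucene/Solr API: a subquery made only of negative clauses matches nothing, so prepending "*:*" gives the NOT something to subtract from.

```java
// Sketch only: crude string-level rewrite of a pure-negative subquery.
// A real implementation would parse the clauses rather than inspect prefixes.
public class PureNegativeRewrite {
    static String rewrite(String subquery) {
        String q = subquery.trim();
        // Treat a subquery that starts with NOT or "-" as pure-negative
        // and give it an explicit match-all clause to subtract from.
        if (q.startsWith("NOT ") || q.startsWith("-")) {
            return "*:* " + q;
        }
        return q;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("NOT periodicite:annuel")); // *:* NOT periodicite:annuel
        System.out.println(rewrite("ville:Moscou"));           // ville:Moscou
    }
}
```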
RE: Elevation of of part match
Yes, I restarted. To make sure, I just did it again. Same result: "archive" elevates, "packet archive" doesn't. G.

From: MitchK [mailto:mitc...@web.de] Sent: Fri 4/30/2010 5:02 PM To: solr-user@lucene.apache.org Subject: RE: Elevation of of part match

The elevate.xml example says:

<!-- If this file is found in the config directory, it will only be loaded once at startup. If it is found in Solr's data directory, it will be re-loaded every commit. -->

Did you do a restart?
--
View this message in context: http://lucene.472066.n3.nabble.com/Elevation-of-of-part-match-tp767139p768120.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How is DeletionPolicy supposed to work?
Simply use what the default was in the example solrconfig.xml; there is no need to modify it unless you are doing something advanced. In the config below, you show maxOptimizedCommitsToKeep=1, which will increase index size by always keeping around one optimized commit point. -Yonik

Apache Lucene Eurocon 2010 18-21 May 2010 | Prague

On Fri, Apr 30, 2010 at 11:40 AM, Paleo Tek paleo...@gmail.com wrote: Hi folks, In moving to 1.4, it was unclear to me how deletionPolicy was supposed to work. [...]
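For reference, the deletion policy in the 1.4 example solrconfig.xml looks roughly like the fragment below (reconstructed from memory, so double-check against your own copy): keep one commit point, and reserve no extra optimized commits.

```xml
<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- Keep only the most recent commit point -->
  <str name="maxCommitsToKeep">1</str>
  <!-- Do not hold on to any additional optimized commit points -->
  <str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
```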
Re: Problem with pdf, upgrading Cell
Grant, You can try any of the sample PDFs that come in the /docs folder of the Solr 1.4 distribution. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf', etc. Only metadata (i.e. stream_size, content_type), apart from my own literals, is indexed; the content is missing.

On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll gsing...@apache.org wrote: Praveen and Marc, Can you share the PDF (feel free to email my private email) that fails in Solr? Thanks, Grant [...]
Re: StreamingUpdateSolrServer hangs
On Thu, Apr 29, 2010 at 7:51 PM, Yonik Seeley yo...@lucidimagination.com wrote: I'm trying to reproduce now... a single thread adding documents to a multithreaded client, StreamingUpdateSolrServer(addr, 32, 4). I'm currently at the 2.5-hour mark and 100M documents - no issues so far.

I let it go to 500M docs... everything works fine (this is with the current trunk). -Yonik

Apache Lucene Eurocon 2010 18-21 May 2010 | Prague
Re: thresholding results by percentage drop from maxScore in lucene/solr
Mike, why don't you order by the number of found items in your facet? If you get too many facet values, just throw away those with the smallest counts when you don't have enough room for them. I suggest that because you don't know every search case. Sometimes the user does not really know what he is searching for, or how to make his search more specific, and faceting helps him navigate the search result. Just some thoughts. :-) - Mitch
RE: Elevation of of part match
Sorry, since I don't have any experience with the elevation component, I can't help you with this. Even searching the mailing list turns up no useful information...
Custom SolrQueryRequest/SolrQueryResponse
Solr team, Long time, first time -- many thanks for all your work on creating this excellent search appliance. The 40,000ft view of my problem is that I need to execute multiple queries per endpoint invocation, with the results for each query grouped in the response output as if they were individual calls (think "composite" request/response), wrapped by a composite tag, etc. So:

Normal query (single query input):

<results>...</results>

Composite query (multiple query input):

<composite>
  <results>...</results>
  <results>...</results>
  <results>...</results>
</composite>

I've already created a custom Handler and Writer for our "single", non-composite needs, but now I need to modify the behavior so that if multiple search queries are specified (i.e. q=query1;query2;query3, etc.), the service will invoke and return all 3 result sets in a single invocation. Herein lies the problem, from what I can tell: I don't have any control over SolrQueryRequest or SolrQueryResponse. My initial attempts have me subclassing both of these to hold a List of requests and responses, with a cursor that moves to the "current" req/res each time through my handler. All methods are implemented to delegate directly to the req/res that the cursor is pointing to. I would check, via instanceof, whether we are dealing with a normal or composite query in the writer, to dump the results appropriately. To pull this off, it appears I would need to modify SolrDispatchFilter to allow for a configurable factory(?) for my custom SolrQueryRequest and SolrQueryResponse objects. Can this be solved some other way without code modifications? If code modifications are required, do you have any suggestions on how the configuration file entry might look, etc.? I can write the patch but wanted to get your feedback before going any further with this. Thanks, Aaron
Re: Solr date representation
: then on retrieving that document, I get back the value
:
: 1-01-01T00:00:00Z
:
: (ie no preceding zeroes) - which tripped up my date-parsing routines.
: Preceding zeroes seem to be universally dropped - all dates before 1000AD
: seem to have the equivalent problem.
:
: Is this a bug in the code, or a bug in the documentation?

It's a bug in the code, thanks for pointing this out...

https://issues.apache.org/jira/browse/SOLR-1899

-Hoss
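For anyone parsing or producing these dates while waiting on SOLR-1899, here is a small Java sketch (class and helper names are mine, not Solr API) of the zero-padded formatting the canonical ISO-8601 form calls for: with the pattern letter "yyyy", year 1 renders as "0001" rather than "1".

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class SolrDateFormat {
    // Format a Date in Solr's canonical UTC form; "yyyy" pads the year
    // to at least four digits, avoiding the truncation described above.
    static String format(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    // Build a UTC Date at midnight for the given year/month/day (month is 1-based here).
    static Date utcDate(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(year, month - 1, day);
        return cal.getTime();
    }

    public static void main(String[] args) {
        // Without padding this would come back as "1-01-01T00:00:00Z".
        System.out.println(format(utcDate(1, 1, 1)));   // 0001-01-01T00:00:00Z
        System.out.println(format(utcDate(2010, 4, 30))); // 2010-04-30T00:00:00Z
    }
}
```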