Re: Input raw log file
On Wed, Jan 12, 2011 at 11:50 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
> I have installed and tested the sample XML file and tried indexing. Everything went successfully, but when I tried with log files I got an error.

Please provide details of what you are doing, and of the error messages. How exactly are you sending the data files to Solr for indexing? Also, note that you will most likely need to change the default schema.xml.

> I tried reading the schema.xml and didn't get a clear idea. Can you please help?

It is very difficult to try to help you, given the scarce details that you provide. I would again suggest that you look for someone local to help you out. Alternatively, read carefully through the extensive documentation on the Solr Wiki, or get a copy of the Solr book: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

Regards,
Gora
Re: Input raw log file
On Wed, Jan 12, 2011 at 12:10 PM, Dinesh mdineshkuma...@karunya.edu.in wrote:
> If I convert it to CSV or XML, it will be time consuming, because the indexing and getting data out of it should be real time. Is there any way I can do this other than conversion? If not, what are the ways I can convert the logs to CSV and XML? And lastly, which is the doc folder of Solr? [...]

What is real time for you? Conversion should be pretty fast. Also, you could use a FileDataSource, LineEntityProcessor, and a RegexTransformer to pick up data right from the text files. This is why I recommended this link to you originally: http://robotlibrarian.billdueber.com/an-exercise-in-solr-and-dataimporthandler-hathitrust-data/

Regards,
Gora
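For reference, a minimal data-config.xml sketch of the FileDataSource + LineEntityProcessor + RegexTransformer combination mentioned above. The log path, column names, and regex patterns here are hypothetical and would need to be adapted to the actual log format:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- LineEntityProcessor emits each line of the file in the "rawLine" column -->
    <entity name="logLine"
            processor="LineEntityProcessor"
            url="/var/log/myapp/app.log"
            rootEntity="true"
            transformer="RegexTransformer">
      <!-- hypothetical patterns: pull a timestamp, level, and message out of each line -->
      <field column="timestamp" regex="^(\S+\s+\S+)" sourceColName="rawLine"/>
      <field column="level" regex="\b(INFO|WARN|ERROR)\b" sourceColName="rawLine"/>
      <field column="message" regex="\]\s+(.*)$" sourceColName="rawLine"/>
    </entity>
  </document>
</dataConfig>
```

The column names would also need matching fields in schema.xml.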
Re: Grouping - not sure where to handle (solr or outside of)
kmf, after a first read I would say that sounds a bit like http://wiki.apache.org/solr/FieldCollapsing ? But that depends mainly on your current schema; take a look and let us know if it helps :) Regards Stefan

On Tue, Jan 11, 2011 at 8:06 PM, kmf kfole...@gmail.com wrote:
> I currently have a DIH that is working in terms of being able to search/filter on various facets, but I'm struggling to figure out how to take it to the next level of what I'd like ideally. We have a database where the atomic unit is a condition (like an environment description - temp, light, high salt, etc.) and these conditions can be in groups. For example, conditionA may belong to the groups huckleberry, star wars, and some group. When I search/filter on a facet I'm currently able to see the conditions and the information about each condition (like which group(s) it belongs to), but what I want is to return group names and their member conditions, along with each condition's respective info, when I search/filter on a facet.
>
> So instead of seeing:
>
> - conditionA
>   description: some description
>   groups: huckleberry, star wars, some group
>
> what I would like to see is:
>
> - huckleberry
>   conditionA: temp: 78, light: 12hrs, NaCl: 35g/L
>   condition35: control, temp: 65, NaCl: 25g/L
> - star wars
>   conditionA: temp: 78, light: 12hrs, NaCl: 35g/L
>   conditionDE: temp: 78, light: 24hrs, NaCl: 0
>
> Is this doable? My DIH has one entity that is conditions with all of its sub-entities; would I need to change the DIH to achieve what I want? And/or do I need to configure the solrconfig and schema files to be able to do this? I realize that part of the problem is presentation, which is not Solr, but I'm struggling with figuring out how to transpose from condition to group in the index, if that makes sense? Assuming that's what I need to do. Or am I totally wrong in thinking I would handle this in the index?
Thanks, kmf -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-not-sure-where-to-handle-solr-or-outside-of-tp2236108p2236108.html Sent from the Solr - User mailing list archive at Nabble.com.
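One way to sketch the FieldCollapsing idea for this case: index one document per (group, condition) pair with a single-valued `group` field (grouping does not work on a multi-valued field, and a condition can belong to several groups), then group on it. The field names are hypothetical, and the grouping parameters were only available on Solr trunk at the time, not in 1.4.x:

```
http://localhost:8983/solr/select?q=*:*&fq=<your facet filter>&group=true&group.field=group&group.limit=10
```

Each group in the response would then list the member condition documents with their stored temp/light/NaCl fields.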
Re: Input raw log file
I got some idea: creating a DIH and then working with that. Thanks everyone for the help. I hope I'll create a regex DIH; I guess that's right.
other index input in Solr
Good morning Solr users (it is morning here in Germany, hence the greeting). I have a small problem with Solr. I have an index that was created by another program; it is a Lucene index and can be read by Luke without problems. I would now like to search it with Solr. Solr starts without problems, and in the admin interface I can also look at the index, but I cannot run a search against it: I get no answer, only error messages. Unfortunately I don't know what to make of them. It seems as if Solr is complaining about empty fields in the index, but I don't know how to change that.

ERROR: HTTP Status 500 - null

java.lang.NullPointerException
at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761)
at org.apache.solr.request.XMLWriter.writeStr(XMLWriter.java:619)
at org.apache.solr.schema.StrField.write(StrField.java:46)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:307)
at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:636)

(The Tomcat error page repeats this same stack trace under its "type: Status report", "message", and "description" sections.)
Re: spell suggest response
satya, nice to hear that it works :) On your question about similar words: I would say no - suggestions are only generated based on available records, and AFAIK only if the given word/phrase is misspelled. Perhaps MoreLikeThis could help you, but I'm not sure on this - especially because you're talking about single words and not similar documents :/ Stefan

On Wed, Jan 12, 2011 at 6:14 AM, satya swaroop satya.yada...@gmail.com wrote:
> Hi Stefan, yes, it works :). Thanks... But I have a question... Can we get spell suggestions even if the spelled word is correct - I mean words near to it? ex:
> http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10
> In the output no suggestions will be coming, as "java" is a word that is spelt correctly... But can't we get near suggestions such as javax, javac, etc.? Regards, satya
Regex DataImportHandler
Can anyone explain to me how to create a regex DataImportHandler?
DataImportHandler on Websphere 6.1 NullPointer Exception from SRTServletResponse.setContentType
Has anyone had any success using the DataImportHandler on WebSphere 6.1? I am getting the following exception from WebSphere when viewing the DataImport Development Console in the browser. The AJAX call to retrieve the dataconfig.xml fails. The thing is that if you do an import, the import succeeds.

[1/11/11 15:38:10:194 GMT] 0042 SolrDispatchF I org.apache.solr.servlet.SolrDispatchFilter init SolrDispatchFilter.init() done
[1/11/11 15:38:10:381 GMT] 0042 SolrCore I org.apache.solr.core.SolrCore execute [] webapp=/solr path=/select params={command=show-config&qt=/dataimport} status=0 QTime=47
[1/11/11 15:38:10:428 GMT] 0042 SolrDispatchF E org.apache.solr.common.SolrException log java.lang.NullPointerException
at com.ibm.ws.webcontainer.srt.SRTServletResponse.setContentType(SRTServletResponse.java:1017)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:318)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)

This may be a symptom of what's causing problems when my app tries to do a dataimport using SolrJ, so that is why I put the stack trace here. What's happening in my app is that SolrJ sends an HTTP request to the Solr instance to do a dataimport. The dataimport succeeds, but the response comes back as a 404 page not found. This causes SolrJ to throw an exception, and so the rest of my application fails and reports an error. When doing this call there is no stack trace in the logs, just an error saying page not found. The app works fine on JBoss but doesn't work on WebSphere. The version of Solr is 1.4.1. WebSphere is: version 6.1.0.0, Build Number: b0620.14, Build Date: 5/16/06.
Re: Regex DataImportHandler
On Wed, Jan 12, 2011 at 3:07 PM, Dinesh mdineshkuma...@karunya.edu.in wrote:
> Can anyone explain to me how to create a regex DataImportHandler? [...]

Dear Dinesh, no offence, but please do some basic legwork on your own first, and then ask more specific questions. Did you read the HathiTrust blog post that I have now referenced twice, and try out ideas from it? Alternatively, as also asked before, please post a short excerpt from your log files, indicating the parts of the data that you want to extract. Maybe someone can help you then. Regards, Gora
Re: spell suggest response
Hi Stefan, I need the words from the index records themselves. If "java" is given, then the relevant or similar or near words in the index should be shown, even if the given keyword is spelled correctly. Can that be done? ex:
http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10
In the output no suggestions will be coming, as "java" is a word that is spelt correctly... But can't we get near suggestions such as javax, javac, etc. (the terms in the index)? I read about the Suggester in the Solr wiki at http://wiki.apache.org/solr/Suggester and tried to implement it, but got the error *error loading class org.apache.solr.spelling.suggest.suggester*. Regards, satya
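For anyone hitting the same class-loading error: a minimal Suggester setup, adapted from the wiki example, looks roughly like the snippet below. Two caveats: the class name is case-sensitive (the lowercase "suggester" in the error above will not load), and the Suggester classes only exist on trunk/3.x builds, not in Solr 1.4.x, which would also produce a class-not-found error. The field name here is hypothetical:

```xml
<!-- solrconfig.xml: Suggester plugged in as a spellcheck component -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <!-- note the capital S in Suggester -->
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">text</str>
  </lst>
</searchComponent>
```

A request handler referencing the "suggest" component is also needed; the wiki page shows the full setup.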
Re: Regex DataImportHandler
Yes, I did. I'm trying it; I asked in case there is a better solution.
Re: What can cause segment corruption?
Corruption should only happen if 1) we have a bug in Lucene (we work hard to fix such bugs, though; LUCENE-2593, fixed in 2.9.4, is a recent case) or 2) there are hardware problems on the machine. Mike

On Tue, Jan 11, 2011 at 10:02 AM, Stéphane Delprat stephane.delp...@blogspirit.com wrote: Thanks for your answer. It's not a disk space problem here:

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda4 280G 22G 244G 9% /

We will try to install Solr on a different server (we just need a little time for that). Stéphane

On 11/01/2011 15:42, Jason Rutherglen wrote: Stéphane, I've only seen production index corruption when the process ran out of disk space during a merge, or when there was an underlying hardware-related issue.

On Tue, Jan 11, 2011 at 5:06 AM, Stéphane Delprat stephane.delp...@blogspirit.com wrote: Hi, I'm using Solr 1.4.1 (Lucene 2.9.3) and some segments get corrupted:

4 of 11: name=_p40 docCount=470035
  compound=false
  hasProx=true
  numFiles=9
  size (MB)=1,946.747
  diagnostics = {optimize=true, mergeFactor=6, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
  has deletions [delFileName=_p40_bj.del]
  test: open reader.OK [9299 deleted docs]
  test: fields..OK [51 fields]
  test: field norms.OK [51 fields]
  test: terms, freq, prox...ERROR [term source:margolisphil docFreq=1 != num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term source:margolisphil docFreq=1 != num docs seen 0 + num docs deleted 0
  at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
  at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
  at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
  test: stored fields...OK [15454281 total field count; avg 33.543 fields per doc]
  test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
  at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
  at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

What might cause this corruption? I detailed my configuration here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3c4d2ae506.7070...@blogspirit.com%3e Thanks,
Not storing, but highlighting from document sentences
Hello, I'm indexing some content (articles) whose text I cannot store in its original form for copyright reasons. So I can index the content, but cannot store it. However, I need snippets and search term highlighting. Any way to accomplish this elegantly? Or even not so elegantly? Here is one idea:

* Create 2 indices: a main index for indexing (but not storing) the original content, and a secondary index for storing individual sentences from the original articles.
* That is, before indexing an article, split it into sentences. Then index the article in the main index, and index+store each sentence in the secondary index. So for each doc in the main index there will be multiple docs in the secondary index with individual sentences. Each sentence doc includes an ID of the parent document.
* Then run queries against the main index, and pull individual sentences from the secondary index for snippet+highlight purposes.

The problem I see with this approach (and there may be other ones that I am not seeing yet) is with queries like foo AND bar. In this case foo may be a match from sentence #1, and bar may be a match from sentence #7. Or maybe foo is a match in sentence #1, and bar is a match in multiple sentences: #7 and #10 and #23. Regardless, when a query is run against the main index, you don't know where the match was, so you don't know which sentences to go get from the secondary index. Does anyone have any suggestions for how to handle this?

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
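One building block of this scheme - splitting an article into sentences before indexing each one into the secondary index - can be sketched with the JDK's BreakIterator. This is a minimal illustration only (the Solr field/parent-ID wiring is omitted, and a production pipeline might want a smarter splitter):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {

    /** Split article text into trimmed sentences, one per secondary-index doc. */
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<String>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.length() > 0) {
                sentences.add(sentence);
            }
            start = end;
        }
        return sentences;
    }

    public static void main(String[] args) {
        // each sentence would be indexed along with its parent article's ID
        for (String s : split("Foo appears here. Bar appears elsewhere.")) {
            System.out.println(s);
        }
    }
}
```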
DataImportHandler on Websphere - http response 404
Has anyone had any success using the DataImportHandler on WebSphere 6.1? Below are the logs for a call to reload-config. I have turned on debug and stepped through the code: the DataImportHandler correctly reloads the config, and the response gets written out to the HTTP response without any errors being thrown from the Solr code. However, in WebSphere the response is returned as a 404 page not found, so this is happening somewhere in the WebSphere code. There are no errors reported in any of the WebSphere logs. This all works fine on JBoss but doesn't work on WebSphere. The version of Solr is 1.4.1. WebSphere is: version 6.1.0.0, Build Number: b0620.14, Build Date: 5/16/06. This is the log file snippet.

12-Jan-2011 10:54:15,320 - - DEBUG header:70 - GET /solr/dataimport?optimize=true&clean=false&commit=true&command=reload-config&qt=%2Fdataimport&omitHeader=true&wt=javabin&version=1 HTTP/1.1[\r][\n]
12-Jan-2011 10:54:15,352 - - DEBUG header:70 - User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0[\r][\n]
12-Jan-2011 10:54:15,367 - - DEBUG header:70 - Host: 10.101.41.1:10012[\r][\n]
12-Jan-2011 10:54:15,398 - - DEBUG header:70 - [\r][\n]
12-Jan-2011 10:55:11,403 - - DEBUG header:70 - HTTP/1.1 404 Not Found[\r][\n]
12-Jan-2011 10:55:11,418 - - DEBUG header:70 - HTTP/1.1 404 Not Found[\r][\n]
12-Jan-2011 10:55:11,434 - - DEBUG header:70 - Last-Modified: Wed, 12 Jan 2011 10:54:15 GMT[\r][\n]
12-Jan-2011 10:55:11,465 - - DEBUG header:70 - ETag: 12d79dc9632[\r][\n]
12-Jan-2011 10:55:11,481 - - DEBUG header:70 - Cache-Control: no-cache, no-store[\r][\n]
12-Jan-2011 10:55:11,497 - - DEBUG header:70 - Pragma: no-cache[\r][\n]
12-Jan-2011 10:55:11,528 - - DEBUG header:70 - Expires: Sat, 01 Jan 2000 01:00:00 GMT[\r][\n]
12-Jan-2011 10:55:11,543 - - DEBUG header:70 - Content-Type: text/html;charset=ISO-8859-1[\r][\n]
12-Jan-2011 10:55:11,559 - - DEBUG header:70 - $WSEP: [\r][\n]
12-Jan-2011 10:55:11,575 - - DEBUG header:70 - Content-Language: en-US[\r][\n]
12-Jan-2011 10:55:11,606 - - DEBUG header:70 - Content-Length: 51[\r][\n]
12-Jan-2011 10:55:11,622 - - DEBUG header:70 - Connection: Close[\r][\n]
12-Jan-2011 10:55:11,637 - - DEBUG header:70 - Date: Wed, 12 Jan 2011 10:55:10 GMT[\r][\n]
12-Jan-2011 10:55:11,653 - - DEBUG header:70 - Server: WebSphere Application Server/6.1[\r][\n]
12-Jan-2011 10:55:11,684 - - DEBUG header:70 - [\r][\n]
12-Jan-2011 10:55:11,700 - - DEBUG content:70 - Error 404: SRVE0190E: File not found: /dataimport[\r][\n]
12-Jan-2011 10:55:11,715 - - ERROR SolrSearchEngine:422 - Failed to perform reload-config
org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at com.norkom.search.business.engine.solr.SolrSearchEngine.executeDataImportCommand(SolrSearchEngine.java:415)
at com.norkom.search.business.engine.solr.SolrSearchEngine.reloadDataImportConfig(SolrSearchEngine.java:374)
at com.norkom.search.business.engine.solr.SolrSearchEngine.buildIndex(SolrSearchEngine.java:314)
at com.norkom.search.business.jobs.FtsBuildIndexJob.executeJob(FtsBuildIndexJob.java:62)
at com.norkom.base.business.jobs.ThreadedJob.run(ThreadedJob.java:52)
at java.lang.Thread.run(Thread.java:797)
Caused by: org.apache.solr.common.SolrException: Not Found Not Found
Re: Input raw log file
Dinesh, it will stay 'real time' even if you convert it. Converting should be done in the millisecond range, if it is measurable at all (e.g. if you apply streaming). Beware: to use the real-time features you'll need the latest trunk of Solr, IMHO. I've done similar log-feeding stuff here (with code!): http://karussell.wordpress.com/2010/10/27/feeding-solr-with-its-own-logs/ (not with a realtime Solr!) You'll have to adapt the parser/matcher to fit your needs. Regards, Peter.

> if i convert it to CSV or XML then it will be time consuming cause the indexing and getting data out of it should be real time.. is there any way i can do other than this.. if not what are the ways i can convert them to CSV and XML.. and lastly which is the doc folder of solr

--
http://jetwick.com open twitter search
Re: Not storing, but highlighting from document sentences
Otis, just interested in .. storing the full text is not allowed, but splitting up in separate sentences is okay? while you think about using the sentences only as secondary/additional source, maybe it would help to search in the sentences itself, or would that give misleading results in your case? Stefan On Wed, Jan 12, 2011 at 12:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, I'm indexing some content (articles) whose text I cannot store in its original form for copyright reason. So I can index the content, but cannot store it. However, I need snippets and search term highlighting. Any way to accomplish this elegantly? Or even not so elegantly? Here is one idea: * Create 2 indices: main index for indexing (but not storing) the original content, the secondary index for storing individual sentences from the original article. * That is, before indexing an article, split it into sentences. Then index the article in the main index, and index+store each sentence in the secondary index. So for each doc in the main index there will be multiple docs in the secondary index with individual sentences. Each sentence doc includes an ID of the parent document. * Then run queries against the main index, and pull individual sentences from the secondary index for snippet+highlight purposes. The problem I see with this approach (and there may be other ones that I am not seeing yet) is with queries like foo AND bar. In this case foo may be a match from sentence #1, and bar may be a match from sentence #7. Or maybe foo is a match in sentence #1, and bar is a match in multiple sentences: #7 and #10 and #23. Regardless, when a query is run against the main index, you don't know where the match was, so you don't know which sentences to go get from the secondary index. Does anyone have any suggestions for how to handle this? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Re: issue with the spatial search with solr
Hi Dennis, thanks a lot for pointing out the problem. It works.

On Tue, Jan 11, 2011 at 11:50 PM, Dennis Gearon gear...@sbcglobal.net wrote:
> You didn't happen to notice that you have one field named RestaurantLocation and another named RestaurantName, did you? You must be submitting 'RestaurantName', and it's being applied to a geo field. Dennis Gearon
>
> Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
>
> - Original Message From: ur lops urlop...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, January 11, 2011 11:13:36 PM Subject: issue with the spatial search with solr
>
> Hi, I took the latest build from Hudson and installed it on my computer. I have made the following changes in my schema.xml:
>
> <fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/>
> <dynamicField name="*_latLon" type="tdouble" indexed="true" stored="false"/>
> <field name="restaurantLocation" type="latLon" indexed="true" stored="true"/>
>
> When I run the query I get this:
>
> HTTP ERROR 500 Problem accessing /solr/select.
Reason: The field restaurantName does not support spatial filtering org.apache.solr.common.SolrException: The field restaurantName does not support spatial filtering at org.apache.solr.search.SpatialFilterQParser.parse(SpatialFilterQParser.java:86) at org.apache.solr.search.QParser.getQuery(QParser.java:143) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:112) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1296) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

This is my Solr query:

select?wt=json&indent=true&fl=name,store&q=*:*&fq={!geofilt%20sfield=restaurantName}&pt=45.15,-93.85&d=5

Any help will be highly appreciated. Thanks
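Following Dennis's diagnosis, the fix is to point `sfield` at the geo field (`restaurantLocation`) instead of the text field (`restaurantName`); the corrected query would look something like:

```
select?wt=json&indent=true&fl=name,store&q=*:*&fq={!geofilt%20sfield=restaurantLocation}&pt=45.15,-93.85&d=5
```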
Re: Not storing, but highlighting from document sentences
Hi Stefan, yes - splitting into separate sentences (and storing them) is OK, because with a bunch of sentences you can't really reconstruct the original article unless you know which order to put them in. Searching against the sentences alone won't work for queries like foo AND bar, because such a query should match the original article even if foo and bar are in different sentences. Otis

- Original Message From: Stefan Matheis matheis.ste...@googlemail.com To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 7:02:46 AM Subject: Re: Not storing, but highlighting from document sentences

Otis, just interested in .. storing the full text is not allowed, but splitting up in separate sentences is okay? while you think about using the sentences only as secondary/additional source, maybe it would help to search in the sentences itself, or would that give misleading results in your case? Stefan

On Wed, Jan 12, 2011 at 12:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, I'm indexing some content (articles) whose text I cannot store in its original form for copyright reason. So I can index the content, but cannot store it. However, I need snippets and search term highlighting. Any way to accomplish this elegantly? Or even not so elegantly? Here is one idea: * Create 2 indices: main index for indexing (but not storing) the original content, the secondary index for storing individual sentences from the original article. * That is, before indexing an article, split it into sentences. Then index the article in the main index, and index+store each sentence in the secondary index. So for each doc in the main index there will be multiple docs in the secondary index with individual sentences. Each sentence doc includes an ID of the parent document. * Then run queries against the main index, and pull individual sentences from the secondary index for snippet+highlight purposes.
The problem I see with this approach (and there may be other ones that I am not seeing yet) is with queries like foo AND bar. In this case foo may be a match from sentence #1, and bar may be a match from sentence #7. Or maybe foo is a match in sentence #1, and bar is a match in multiple sentences: #7 and #10 and #23. Regardless, when a query is run against the main index, you don't know where the match was, so you don't know which sentences to go get from the secondary index. Does anyone have any suggestions for how to handle this? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Re: solr wildcard queries and analyzers
Have you made any progress? Since the AnalyzingQueryParser doesn't inherit from QParserPlugin solr doesn't want to use it but I guess we could implement a similar parser that does inherit from QParserPlugin? Switching parser seems to be what is needed? Has really no one solved this before? - Kári - Original Message - From: Matti Oinas matti.oi...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 11 January, 2011 12:47:52 PM Subject: Re: solr wildcard queries and analyzers This might be the solution. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html 2011/1/11 Matti Oinas matti.oi...@gmail.com: Sorry, the message was not meant to be sent here. We are struggling with the same problem here. 2011/1/11 Matti Oinas matti.oi...@gmail.com: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers On wildcard and fuzzy searches, no text analysis is performed on the search word. 2011/1/11 Kári Hreinsson k...@gagnavarslan.is: Hi, I am having a problem with the fact that no text analysis are performed on wildcard queries. I have the following field type (a bit simplified): fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.ASCIIFoldingFilterFactory / /analyzer /fieldType My problem has to do with Icelandic characters, when I index a document with a text field including the word sjálfsögðu it gets indexed as sjalfsogdu (because of the ASCIIFoldingFilterFactory which replaces the Icelandic characters with their English equivalents). Then, when I search (without a wildcard) for sjálfsögðu or sjalfsogdu I get that document as a result. This is convenient since it enables people to search without using accented characters and yet get the results they want (e.g. if they are working on computers with English keyboards). 
However, this all falls apart with wildcard searches: the search string isn't passed through the filters, so even if I search for sjálf* I don't get any results, because the index doesn't contain the original words (I do get results if I search for sjalf*). I know people have had a similar problem with the case sensitivity of wildcard queries, and most often the solution seems to be to lowercase the string before passing it on to Solr, which is not exactly optimal (yet simple in that case). The Icelandic characters complicate things a bit, and applying the same solution (doing the lowercasing and character mapping) in my application seems like unnecessary duplication of code already part of Solr, not to mention a complication of my application and a possible maintenance burden down the road. Is there any way around this? How are people solving this? Is there a way to apply the filters to wildcard queries? I guess removing the ASCIIFoldingFilterFactory is the simplest solution, but the normalization the filter does is often very useful. I hope I'm not overlooking some obvious explanation. :/ Thanks in advance, Kári Hreinsson
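A common workaround is to apply the same folding to wildcard terms on the client side before sending the query. Below is a rough, self-contained sketch (not Solr's own code) of what LowerCaseFilterFactory plus ASCIIFoldingFilterFactory do to a term; note that a letter like ð has no Unicode decomposition and needs an explicit mapping:

```java
import java.text.Normalizer;

public class WildcardFold {
    // Approximate LowerCaseFilter + ASCIIFoldingFilter for a wildcard term:
    // lowercase, decompose accented characters, strip the combining marks.
    // ð does not decompose, so it is mapped explicitly here (other letters
    // such as þ would need mappings of their own).
    static String fold(String term) {
        String norm = Normalizer.normalize(term.toLowerCase(), Normalizer.Form.NFD);
        return norm.replaceAll("\\p{M}", "").replace('ð', 'd');
    }

    public static void main(String[] args) {
        System.out.println(fold("sjálf*"));      // sjalf*
        System.out.println(fold("sjálfsögðu"));  // sjalfsogdu
    }
}
```

This keeps fold-on-index and fold-on-query consistent without dropping the ASCIIFoldingFilterFactory from the schema.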
Re: FunctionQuery plugin properties
Never mind, I found it. You can add XML children to your plugin declaration in solrconfig.xml and then retrieve them by converting the NamedList argument your plugin receives at initialization to SolrParams. On Tue, Jan 11, 2011 at 10:28 AM, dante stroe dante.st...@gmail.com wrote: Hi, Is there any way one can define properties for a function plugin extending the ValueSourceParser inside solrconfig.xml (as one can do with the defaults attribute for a query parser plugin inside the request handler)? Thanks, Dante
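For illustration, such a declaration in solrconfig.xml might look like the following (the parser name, class, and property are made-up examples):

```xml
<valueSourceParser name="myfunc" class="com.example.MyValueSourceParser">
  <str name="someProp">someValue</str>
</valueSourceParser>
```

In the plugin's init(NamedList args) the value can then be read via SolrParams.toSolrParams(args).get("someProp").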
Can't find source or jar for Solr class JaspellTernarySearchTrie
Hi, I'm trying to find the source code for class: JaspellTernarySearchTrie. It's supposed to be used for spelling suggestions. It's referenced in the javadoc: http://lucene.apache.org/solr/api/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.html I realize this is a dumb question, but i've been looking through the downloads for several hours. I can't actually find the package org/apache/solr/spelling/suggest/ that it's supposed to be under. So if you would be so kind... What jar is it compiled into? Where is the source in the downloaded source tree? thanks.
RE: Not storing, but highlighting from document sentences
Hi Otis, I think you can get what you want by doing the first stage retrieval, and then in the second stage, add required constraint(s) to the query for the matching docid(s), and change the AND operators in the original query to OR. Coordination will cause the best snippet(s) to rise to the top, no? Hmm, you'll want to run the second stage once for each hit from the first stage, though, unless you can afford to collect *all* hits and pull out each first stage's hit from the intermixed second stage results... Steve -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, January 12, 2011 7:29 AM To: solr-user@lucene.apache.org Subject: Re: Not storing, but highlighting from document sentences Hi Stefan, Yes, splitting in separate sentences (and storing them) is OK because with a bunch of sentences you can't really reconstruct the original article unless you know which order to put them in. Searching against the sentence won't work for queries like foo AND bar because this should match original articles even if foo and bar are in different sentences. Otis - Original Message From: Stefan Matheis matheis.ste...@googlemail.com To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 7:02:46 AM Subject: Re: Not storing, but highlighting from document sentences Otis, just interested in .. storing the full text is not allowed, but splitting up in separate sentences is okay? while you think about using the sentences only as secondary/additional source, maybe it would help to search in the sentences itself, or would that give misleading results in your case? Stefan On Wed, Jan 12, 2011 at 12:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, I'm indexing some content (articles) whose text I cannot store in its original form for copyright reason. So I can index the content, but cannot store it. However, I need snippets and search term highlighting. Any way to accomplish this elegantly? Or even not so elegantly? 
Here is one idea: * Create 2 indices: main index for indexing (but not storing) the original content, the secondary index for storing individual sentences from the original article. * That is, before indexing an article, split it into sentences. Then index the article in the main index, and index+store each sentence in the secondary index. So for each doc in the main index there will be multiple docs in the secondary index with individual sentences. Each sentence doc includes an ID of the parent document. * Then run queries against the main index, and pull individual sentences from the secondary index for snippet+highlight purposes. The problem I see with this approach (and there may be other ones that I am not seeing yet) is with queries like foo AND bar. In this case foo may be a match from sentence #1, and bar may be a match from sentence #7. Or maybe foo is a match in sentence #1, and bar is a match in multiple sentences: #7 and #10 and #23. Regardless, when a query is run against the main index, you don't know where the match was, so you don't know which sentences to go get from the secondary index. Does anyone have any suggestions for how to handle this? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
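The split-into-sentences step from the idea above can be sketched with the JDK's BreakIterator (a hedged, self-contained illustration, not a recommendation of any particular sentence splitter):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {
    // Split an article into sentences; each sentence would become its own
    // document in the secondary index, carrying the parent article's ID.
    static List<String> sentences(String article) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(article);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = article.substring(start, end).trim();
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        // Each sentence would be indexed+stored with the parent article's ID.
        System.out.println(sentences("Foo appears here. Bar appears elsewhere."));
    }
}
```

Each returned sentence would then be indexed and stored in the secondary index together with the ID of its parent article.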
Re: schema.xml in other than conf folder
Hi, These two links helped me to solve the problem. https://issues.apache.org/jira/browse/SOLR-1154 http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node Thanks, SRD -- View this message in context: http://lucene.472066.n3.nabble.com/schema-xml-in-other-than-conf-folder-tp2206587p2241266.html Sent from the Solr - User mailing list archive at Nabble.com.
Term frequency across multiple documents
I'm attempting to calculate term frequency across multiple documents in Solr. I've been able to use TermVectorComponent to get this data on a per-document basis but have been unable to find a way to do it for multiple documents -- that is, get a list of terms appearing in the documents and how many times each one appears. I'd also like to be able to filter the list of terms to be able to see how many times a specific term appears, though this is less important. Is there a way to do this in Solr? Aaron
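Until something built-in fits, one workable client-side approach is to fetch the per-document tf data (e.g. what TermVectorComponent returns for each document) and sum it yourself. A hedged sketch of the merging step:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermFreqMerge {
    // Sum per-document term frequencies (one term -> tf map per document)
    // into a single term -> total-frequency map across all the documents.
    static Map<String, Integer> merge(List<Map<String, Integer>> perDoc) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> doc : perDoc) {
            doc.forEach((term, tf) -> totals.merge(term, tf, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(
                Map.of("solr", 2, "index", 1),
                Map.of("solr", 3))));
    }
}
```

Filtering down to one specific term is then just a map lookup on the result.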
Re: Solr trunk for production
Otis Gospodnetic wrote: Are people using Solr trunk in serious production environments? I suspect the answer is yes, just want to see if there are any gotchas/warnings. Yes, since it seemed the best way to get edismax with this patch[1]; and to get the more update-friendly MergePolicy[2]. Main gotcha I noticed so far is trying to figure out appropriate times to sync with trunk's newer patches; and whether or not we need to rebuild our kinda big (> 1TB) indexes when we do. [1] the patch I needed: https://issues.apache.org/jira/browse/SOLR-2058 [2] nicer MergePolicy https://issues.apache.org/jira/browse/LUCENE-2602
Re: Resolve a DataImportHandler datasource based on previous entity
Hi Gora, Unfortunately reorganizing the data is not an option for me. Multiple databases exist and a third party is taking care of populating them. Once a database reaches a certain size, a switch occurs and a new database is created with the same table structure. Gora Mohanty-3 wrote: I meant a script that runs the query that defines the datasources for all fields, writes a Solr DIH configuration file, and then initiates a dataimport. Ok, so the query would select only the articles for which the data is sitting in a specific datasource. Then, only that one datasource would be indexed. For each additional datasource, would the script initiate another full-import with the clean attribute set to false? I tried to make some changes to the DIH that comes with Solr 1.4.1. The getResolvedEntityAttribute(dataSource) method seems to do the trick. Here is the modified code. It feels awkward, but it seems to work. org.apache.solr.handler.dataimport.ContextImpl:

public DataSource getDataSource() {
    if (ds != null) return ds;
    if (entity == null) return null;
    String dataSourceResolved = this.getResolvedEntityAttribute(dataSource);
    if (entity.dataSrc == null) {
        entity.dataSrc = dataImporter.getDataSourceInstance(entity, dataSourceResolved, this);
        entity.dataSource = dataSourceResolved;
    } else if (!dataSourceResolved.equals(entity.dataSource)) {
        entity.dataSrc.close();
        entity.dataSrc = dataImporter.getDataSourceInstance(entity, dataSourceResolved, this);
        entity.dataSource = dataSourceResolved;
    }
    if (entity.dataSrc != null && docBuilder != null && docBuilder.verboseDebug
            && Context.FULL_DUMP.equals(currentProcess())) {
        // debug is not yet implemented properly for deltas
        entity.dataSrc = docBuilder.writer.getDebugLogger().wrapDs(entity.dataSrc);
    }
    return entity.dataSrc;
}

I hope I am not breaking any other functionality... Would it be possible to add something like this to a future release?
Regards, Alex -- View this message in context: http://lucene.472066.n3.nabble.com/Resolve-a-DataImportHandler-datasource-based-on-previous-entity-tp2235573p2241653.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr trunk for production
What's the syntax for spatial for that version of Solr? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Ron Mayer r...@0ape.com To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 7:18:10 AM Subject: Re: Solr trunk for production Otis Gospodnetic wrote: Are people using Solr trunk in serious production environments? I suspect the answer is yes, just want to see if there are any gotchas/warnings. Yes, since it seemed the best way to get edismax with this patch[1]; and to get the more update-friendly MergePolicy[2]. Main gotcha I noticed so far is trying to figure out appropriate times to sync with trunk's newer patches; and whether or not we need to rebuild our kinda big (> 1TB) indexes when we do. [1] the patch I needed: https://issues.apache.org/jira/browse/SOLR-2058 [2] nicer MergePolicy https://issues.apache.org/jira/browse/LUCENE-2602
Re: segment gets corrupted (after background merge ?)
I got another corruption. It sure looks like it's the same type of error (on a different field). It's also not linked to a merge, since the segment size did not change.

*** good segment:
1 of 9: name=_ncc docCount=1841685 compound=false hasProx=true numFiles=9 size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_22s.del]
test: open reader.OK [275881 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs pairs; 204561440 tokens]
test: stored fields...OK [45511958 total field count; avg 29.066 fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc]

a few hours later:

*** broken segment:
1 of 17: name=_ncc docCount=1841685 compound=false hasProx=true numFiles=9 size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_24f.del]
test: open reader.OK [278167 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs seen 0 + num docs deleted 0
at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [45429565 total field count; avg 29.056 fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

I'll activate infoStream for next time. Thanks, On 12/01/2011 00:49, Michael McCandless wrote: When you hit corruption is it always this same problem?: java.lang.RuntimeException: term source:margolisphil docFreq=1 != num docs seen 0 + num docs deleted 0 Can you run with Lucene's IndexWriter infoStream turned on, and catch the output leading to the corruption? If something is somehow messing up the bits in the deletes file that could cause this. Mike On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat stephane.delp...@blogspirit.com wrote: Hi, We are using : Solr Specification Version: 1.4.1 Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42 Lucene Specification Version: 2.9.3 Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55 # java -version java version 1.6.0_20 Java(TM) SE Runtime Environment (build 1.6.0_20-b02) Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) We want to index 4M docs in one core (and when it works fine we will add other cores with 2M on the same server) (1 doc ~= 1kB) We use SOLR replication every 5 minutes to update the slave server (queries are executed on the slave only) Documents are changing very quickly, during a normal day we will have approx: * 200 000 updated docs * 1000 new docs * 200 deleted docs I attached the last good checkIndex: solr20110107.txt And the corrupted one: solr20110110.txt This is not the first time a segment gets corrupted on this server, that's why I ran frequent checkIndex. (but as you can see the first segment is 1,800,000 docs and it works fine!) I can't find any SEVERE/FATAL messages or exceptions in the Solr logs.
I also attached my schema.xml and solrconfig.xml Is there something wrong with what we are doing ? Do you need other info ? Thanks,
Re: Tuning StatsComponent
i try this: http://host:port/solr/select?q=YOUR_QUERY&stats=on&stats.field=amount&f.amount.stats.facet=currency&rows=0 and this: http://host:port/solr/select?q=amount_us:*+OR+amount_eur:*[+OR+amount_...:*]&stats=on&stats.field=amount_usd&stats.field=amount_eur[&stats.field=amount_...]&rows=0 on my index. But however I change my request, every request has a QTime of ~10 seconds... My conclusion: Solr's StatsComponent cannot be fast on 31 million documents =( -- View this message in context: http://lucene.472066.n3.nabble.com/Tuning-StatsComponent-tp2225809p2241793.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: segment gets corrupted (after background merge ?)
Curious... is it always a docFreq=1 != num docs seen 0 + num docs deleted 0? It looks like new deletions were flushed against the segment (del file changed from _ncc_22s.del to _ncc_24f.del). Are you hitting any exceptions during indexing? Mike On Wed, Jan 12, 2011 at 10:33 AM, Stéphane Delprat stephane.delp...@blogspirit.com wrote: I got another corruption. It sure looks like it's the same type of error. (on a different field) It's also not linked to a merge, since the segment size did not change. *** good segment : 1 of 9: name=_ncc docCount=1841685 compound=false hasProx=true numFiles=9 size (MB)=6,683.447 diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.} has deletions [delFileName=_ncc_22s.del] test: open reader.OK [275881 deleted docs] test: fields..OK [51 fields] test: field norms.OK [51 fields] test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs pairs; 204561440 tokens] test: stored fields...OK [45511958 total field count; avg 29.066 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] a few hours later : *** broken segment : 1 of 17: name=_ncc docCount=1841685 compound=false hasProx=true numFiles=9 size (MB)=6,683.447 diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.} has deletions [delFileName=_ncc_24f.del] test: open reader.OK [278167 deleted docs] test: fields..OK [51 fields] test: field norms.OK [51 fields] test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != num docs seen 0 + num docs deleted 0] java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs seen 0 + num docs deleted 0 at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903) test: stored fields...OK [45429565 total field count; avg 29.056 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903) I'll activate infoStream for next time. Thanks, On 12/01/2011 00:49, Michael McCandless wrote: When you hit corruption is it always this same problem?: java.lang.RuntimeException: term source:margolisphil docFreq=1 != num docs seen 0 + num docs deleted 0 Can you run with Lucene's IndexWriter infoStream turned on, and catch the output leading to the corruption? If something is somehow messing up the bits in the deletes file that could cause this.
Mike On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat stephane.delp...@blogspirit.com wrote: Hi, We are using : Solr Specification Version: 1.4.1 Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42 Lucene Specification Version: 2.9.3 Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55 # java -version java version 1.6.0_20 Java(TM) SE Runtime Environment (build 1.6.0_20-b02) Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) We want to index 4M docs in one core (and when it works fine we will add other cores with 2M on the same server) (1 doc ~= 1kB) We use SOLR replication every 5 minutes to update the slave server (queries are executed on the slave only) Documents are changing very quickly, during a normal day we will have approx : * 200 000 updated docs * 1000 new docs * 200 deleted docs I attached the last good checkIndex : solr20110107.txt And the corrupted one : solr20110110.txt This is not the first time a segment gets corrupted on this server, that's why I ran frequent checkIndex. (but as you can see the first segment is 1,800,000 docs and it works fine!) I can't find any SEVERE/FATAL messages or exceptions in the Solr logs. I also attached my schema.xml and solrconfig.xml Is there something wrong with what we are doing? Do you need other info? Thanks,
Re: Tuning StatsComponent
My field type is double. Maybe sint is better? But I need double... =( -- View this message in context: http://lucene.472066.n3.nabble.com/Tuning-StatsComponent-tp2225809p2241903.html Sent from the Solr - User mailing list archive at Nabble.com.
Where does admin UI visually distinguish between master and slave?
Hi all, I'm getting started with a master/slave configuration for two solr instances. To distinguish between 'master' and 'slave', I've set the system properties (e.g. -Dmaster.enabled) and use the same 'solrconfig.xml'. I can see via the system properties admin UI that the jvm (and thus solr) sees correct values, i.e.: enable.master = false enable.slave = true However, the replication admin UI is identical for both 'master' and 'slave'. (i.e. http://localhost:8983/solr/production/admin/replication/index.jsp) I'd like a clearer visual confirmation that the master node is indeed a master and the slave is a slave. Summary question: Does the admin UI distinguish between master and slave? thanks will
Re: Not storing, but highlighting from document sentences
Hi Steve, - Original Message From: Steven A Rowe sar...@syr.edu Subject: RE: Not storing, but highlighting from document sentences I think you can get what you want by doing the first stage retrieval, and then in the second stage, add required constraint(s) to the query for the matching docid(s), and change the AND operators in the original query to OR. Coordination will cause the best snippet(s) to rise to the top, no? Right, right. So if the original query is: foo AND bar, I'd run it against the main index, get top N hits, say N=10. Then I'd create another query: +(foo OR bar) +articleID:(ORed list of top N article IDs from main results) And then I'd use that to get enough sentence docs to have at least 1 of them for each hit from the main index. Hm, I wonder what happens when instead of simple foo AND bar you have a more complex query with more elaborate grouping and such... Hmm, you'll want to run the second stage once for each hit from the first stage, though, unless you can afford to collect *all* hits and pull out each first stage's hit from the intermixed second stage results... Wouldn't the above get me all sentences I need for top N hits from the main result in a single shot, assuming I use high enough rows=NNN to minimize the possibility of not getting even 1 sentence for any one of those top N hits? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ Steve -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, January 12, 2011 7:29 AM To: solr-user@lucene.apache.org Subject: Re: Not storing, but highlighting from document sentences Hi Stefan, Yes, splitting in separate sentences (and storing them) is OK because with a bunch of sentences you can't really reconstruct the original article unless you know which order to put them in. 
Searching against the sentence won't work for queries like foo AND bar because this should match original articles even if foo and bar are in different sentences. Otis - Original Message From: Stefan Matheis matheis.ste...@googlemail.com To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 7:02:46 AM Subject: Re: Not storing, but highlighting from document sentences Otis, just interested in .. storing the full text is not allowed, but splitting up in separate sentences is okay? while you think about using the sentences only as secondary/additional source, maybe it would help to search in the sentences itself, or would that give misleading results in your case? Stefan On Wed, Jan 12, 2011 at 12:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, I'm indexing some content (articles) whose text I cannot store in its original form for copyright reason. So I can index the content, but cannot store it. However, I need snippets and search term highlighting. Any way to accomplish this elegantly? Or even not so elegantly? Here is one idea: * Create 2 indices: main index for indexing (but not storing) the original content, the secondary index for storing individual sentences from the original article. * That is, before indexing an article, split it into sentences. Then index the article in the main index, and index+store each sentence in the secondary index. So for each doc in the main index there will be multiple docs in the secondary index with individual sentences. Each sentence doc includes an ID of the parent document. * Then run queries against the main index, and pull individual sentences from the secondary index for snippet+highlight purposes. The problem I see with this approach (and there may be other ones that I am not seeing yet) is with queries like foo AND bar. In this case foo may be a match from sentence #1, and bar may be a match from sentence #7. 
Or maybe foo is a match in sentence #1, and bar is a match in multiple sentences: #7 and #10 and #23. Regardless, when a query is run against the main index, you don't know where the match was, so you don't know which sentences to go get from the secondary index. Does anyone have any suggestions for how to handle this? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
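The second-stage rewrite discussed in this thread is mostly string assembly. A hedged sketch (articleID is the field name from the thread; the helper itself is made up, and it ignores the harder case of complex nested queries that Otis mentions):

```java
import java.util.List;

public class SecondStageQuery {
    // Turn the terms of a first-stage "foo AND bar" query into the
    // second-stage sentence-index query: all terms optional (OR), with
    // hits restricted to the parent article IDs found in stage one.
    static String rewrite(List<String> terms, List<String> articleIds) {
        return "+(" + String.join(" OR ", terms) + ") "
             + "+articleID:(" + String.join(" OR ", articleIds) + ")";
    }

    public static void main(String[] args) {
        System.out.println(rewrite(List.of("foo", "bar"), List.of("12", "34")));
        // +(foo OR bar) +articleID:(12 OR 34)
    }
}
```

With a large enough rows value on this single query, coordination should surface at least one sentence per top-N article, as discussed above.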
Re: Multiple Solr instances common core possible ?
That's correct. Only 1 instance should be writing. You should be able to point multiple Solr read-only instances to the same physical read-only index. I don't recall trying this recently, though. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Tue, January 11, 2011 12:29:54 PM Subject: Re: Multiple Solr instances common core possible ? NOT sure about any of it, but THINK that READ ONLY, with one solr instance doing writes is possible. I've heard that it's NEVER possible to do multiple Solr Instances writing. Dennis Gearon - Original Message From: Ravi Kiran ravi.bhas...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, January 11, 2011 9:15:06 AM Subject: Multiple Solr instances common core possible ? Hello, Is it possible to deploy multiple solr instances with different context roots pointing to the same solr core ? If I do this will there be any deadlocks or file handle issues ? The reason I need this setup is because I want to expose solr to an third party vendor via a different context root. My solr instance is deployed on Glassfish. Alternately, if there is a configurable way to setup multiple context roots for the same solr instance that will suffice at this point of time. Ravi Kiran
Re: DataImportHandler on Websphere - http response 404
I have found a workaround for this. 1. Change the entry in solrconfig.xml for the DataImportHandler by removing the slash from the name, like this: <requestHandler name="dataimport" ...> 2. When making the request to the SolrJ server, don't use a slash in the qt parameter, i.e. solrParameters.set("qt", "dataimport"); If you use slashes, the url generated by SolrJ will be like '/solr/dataimport...qt=%2Fdataimport'. Removing the slashes will change the url to something like '/solr/select...qt=dataimport' (Solr will use the 'qt' parameter to find the right handler). The resulting url will be something like this: /solr/select?optimize=true&clean=false&commit=true&command=reload-config&qt=dataimport&wt=javabin&version=1 My guess is that '/select' is mapped in the web.xml of Solr to a servlet, whereas '/dataimport' is not, and that Websphere will complain about that, whereas JBoss doesn't care. -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-on-Websphere-http-response-404-tp2240440p2242162.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: default RegexFragmenter
Sebastian, If I remember my regular expressions, the - and / are really just those literal characters. The character class means any of the characters between [ and ]; - and / are just two of those characters, along with newline, space, comma, etc. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Sebastian M mihais...@yahoo.com To: solr-user@lucene.apache.org Sent: Tue, January 11, 2011 11:22:01 AM Subject: default RegexFragmenter Hello, I'm investigating an issue where spellcheck queries are tokenized without being explicitly told to do so, resulting in suggestions such as www.www.product4sale.com.com for queries such as www.product4sale.com. The default RegexFragmenter (name=regex) uses the regular expression: [-\w ,/\n\']{20,200} I understand parts of it, but I'm not sure about the - sign, or the slash midway through it. I would like to tailor this regular expression so that query terms such as www.product4sale.com are not broken on the period marks, but kept as they are. Any suggestions or answers are highly appreciated! Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/default-RegexFragmenter-tp2235106p2235106.html Sent from the Solr - User mailing list archive at Nabble.com.
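To see concretely why terms containing periods get chopped, you can run the default pattern against a sample string (a self-contained illustration with made-up text). Because '.' is not in the character class, no fragment can span a period:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FragRegex {
    // The default RegexFragmenter pattern: runs of 20-200 characters drawn
    // from '-', word characters, space, comma, slash, newline, apostrophe.
    static List<String> fragments(String text) {
        Pattern p = Pattern.compile("[-\\w ,/\\n']{20,200}");
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        // The runs before each '.' are shorter than 20 chars, so only the
        // tail after the last period qualifies as a fragment.
        System.out.println(fragments("visit www.product4sale.com for more details today"));
        // [com for more details today]
    }
}
```

Adding '.' (and other URL characters) to the class, e.g. [-.\w ,/\n']{20,200}, should let fragments keep terms like www.product4sale.com intact.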
Re: Where does admin UI visually distinguish between master and slave?
Hi Will, I don't think we have a clean master or slave label anywhere in the Admin UI. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Will Milspec will.mils...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 11:18:17 AM Subject: Where does admin UI visually distinguish between master and slave? Hi all, I'm getting started with a master/slave configuration for two solr instances. Two distinguish between 'master' and 'slave', I've set he system properties (e.g. -Dmaster.enabled) and using the same 'solrconfig.xml'. I can see via the system properties admin UI that the jvm (and thus solr) sees correct values, i.e.: enable.master = false enable.slave = true However, the replication admin UI is identical for both 'master' and 'slave'. (i.e. http://localhost:8983/solr/production/admin/replication/index.jsp) I'd like a clearer visual confirmation that the master node is indeed a master and the slave is a slave. Summary question: Does the admin UI distinguish betwen master and slave? thanks will
Re: Where does admin UI visually distinguish between master and slave?
Well, slaves do show different things in the replication.jsp page:

Master http://10cc:8080/solr/replication
Poll Interval 00:00:10
Local Index
Index Version: 1294666552434, Generation: 2515
Location: /var/lib/solr/data/index
Size: 4.65 GB
Times Replicated Since Startup: 934

Where master nodes (or slaves where enabled=false) show:

Local Index
Index Version: 1294666552449, Generation: 2530
Location: /var/lib/solr/data/index
Size: 4.65 GB

On Wednesday 12 January 2011 17:24:57 Otis Gospodnetic wrote: Hi Will, I don't think we have a clean master or slave label anywhere in the Admin UI. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Will Milspec will.mils...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 11:18:17 AM Subject: Where does admin UI visually distinguish between master and slave? Hi all, I'm getting started with a master/slave configuration for two solr instances. Two distinguish between 'master' and 'slave', I've set he system properties (e.g. -Dmaster.enabled) and using the same 'solrconfig.xml'. I can see via the system properties admin UI that the jvm (and thus solr) sees correct values, i.e.: enable.master = false enable.slave = true However, the replication admin UI is identical for both 'master' and 'slave'. (i.e. http://localhost:8983/solr/production/admin/replication/index.jsp) I'd like a clearer visual confirmation that the master node is indeed a master and the slave is a slave. Summary question: Does the admin UI distinguish betwen master and slave? thanks will -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: icq or other 'instant gratification' communication forums for Solr
Dennis, Join #solr on Freenode. But it's not necessarily any livelier than this ML. It depends who's actively on. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Tue, January 11, 2011 12:09:22 AM Subject: icq or other 'instant gratification' communication forums for Solr Are there any chatrooms or ICQ rooms to ask questions late at night to people who stay up or are on other side of planet? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Re: spell suggest response
It isn't exactly what you want, but did you try the onlyMorePopular parameter? http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular Regards, Juan Grande On Wed, Jan 12, 2011 at 7:29 AM, satya swaroop satya.yada...@gmail.com wrote: Hi Stefan, I need the words from the index record itself. If java is given, then the relevant or similar or near words in the index should be shown, even if the given keyword is spelled correctly. Is that possible? ex:- http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10 In the output, no suggestions will come back, as java is a word that is spelt correctly... But can't we get near suggestions such as javax, javac, etc. (the terms in the index)? I read about the suggester on the Solr wiki at http://wiki.apache.org/solr/Suggester and tried to implement it, but got errors: *error loading class org.apache.solr.spelling.suggest.suggester* Regards, satya
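The behavior satya is after (near terms from the index even when the query itself is spelled correctly) is closer to prefix lookup than spell correction. A minimal Python sketch of the idea, with a hypothetical term list standing in for the index dictionary (this is an illustration, not Solr's Suggester code):

```python
def suggest(query, index_terms, count=10):
    """Return up to `count` indexed terms that extend the query as a
    prefix -- giving suggestions even when the query is itself a valid term."""
    return sorted(t for t in index_terms if t.startswith(query) and t != query)[:count]

# Hypothetical terms pulled from the index:
terms = ["java", "javac", "javax", "jakarta", "perl"]
print(suggest("java", terms))  # ['javac', 'javax']
```

The Suggester component builds this kind of lookup from an indexed field's terms, so no misspelling is required to get candidates back.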
Re: pruning search result with search score gradient
What's the use-case you're trying to solve? Because if you're still showing results to the user, you're taking information away from them. Where are you expecting to get the list? If you try to return the entire list, you're going to pay the penalty of creating the entire list and transmitting it across the wire, rather than just a page's worth. And if you're paging, the user will do this for you by deciding for herself when she's getting less relevant results. So I don't understand what value to the end user you're trying to provide; perhaps if you elaborate on that I'll have a more useful response. Best Erick On Tue, Jan 11, 2011 at 3:12 AM, Julien Piquot julien.piq...@arisem.com wrote: Hi everyone, I would like to be able to prune my search result by removing the less relevant documents. I'm thinking about using the search score: I take the search scores of the document set (I assume they are sorted in descending order), normalise them (0 would be the lowest value and 1 the greatest value) and then calculate the gradient of the normalised scores. The documents with a gradient below a threshold value would be rejected. If the scores are linearly decreasing, then no document is rejected. However, if there is a brutal score drop, then the documents below the drop are rejected. The threshold value would still have to be tuned, but I believe it would make a much stronger metric than an absolute search score. What do you think about this approach? Do you see any problem with it? Are there any Solr tools that could help me deal with that? Thanks for your answer. Julien
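Julien's normalize-then-gradient idea can be sketched client-side in a few lines. This is a hypothetical illustration of the approach he describes, not Solr code, and the threshold would still need tuning as he notes:

```python
def prune_by_gradient(scores, threshold):
    """Given scores sorted in descending order, keep the ranks up to the
    first point where the normalized score drops by more than `threshold`
    between consecutive documents."""
    if len(scores) < 2:
        return list(range(len(scores)))
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0          # avoid division by zero for flat score lists
    norm = [(s - lo) / span for s in scores]
    kept = [0]
    for i in range(1, len(norm)):
        if norm[i - 1] - norm[i] > threshold:
            break                    # brutal score drop: reject everything below it
        kept.append(i)
    return kept

print(prune_by_gradient([9.0, 8.5, 8.2, 2.0, 1.9], 0.5))  # [0, 1, 2]
```

Here the drop from 8.2 to 2.0 exceeds the threshold after normalization, so the last two documents are pruned.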
Re: Not storing, but highlighting from document sentences
Hi Steven, if I understand correctly, you are suggesting query execution in two phases: first execute the query on the whole-article index core (where whole articles are indexed, but not stored) to get article IDs (for articles which match the original query). Then, for each match in the article core: change the AND operators from the original query to OR, add an articleID condition/filter, and execute that query on the sentence-based index (with the assumption that each sentence-based doc has articleID set). Is this correct, and is this what "you'll want to run the second stage once for each hit from the first stage, though" is referring to? An example for this scenario: for original query q=apples and oranges, execute q=apples and orange with fl=articleId on the article core, and for each articleIdX result execute q=(apples OR orange) AND articleId:articleIdX on the sentence-based core. The same thing (with the same results) should be doable with only a single query in the second phase; for the previous example, that single second-phase query over all articleId1,...,articleIdN would be something like: q=((apples OR orange) AND articleId:articleId1) OR ((apples OR orange) AND articleId:articleId2) OR ... OR ((apples OR orange) AND articleId:articleIdN) But here, in the second case, results are ordered by sentence scoring instead of by article, and the results would need to be re-ordered. Is this what "unless you can afford to collect *all* hits and pull out each first stage's hit from the intermixed second stage results" is referring to? My actual question after this really long intro is: couldn't this be done with a single second-level query approach, but on each topN start/row chunk as the user iterates through first-level results? For example, the user executes query q=apples and oranges and this yields 1000 results, but the first page displays only, for example, 20 results, which means the proposed solution would: 1. phase: execute q=apples and orange with fl=articleId on the article core, but with start=0&rows=20 2. 
phase: q=((apples OR orange) AND articleId:articleId1) OR ((apples OR orange) AND articleId:articleId2) OR ... OR ((apples OR orange) AND articleId:articleId20) 3. Reorder sentence results to match order defined by article matching scores and return to user Only, the results here would need to be collapsed on unique articleID, so only 20 results are provided in result set (because multiple sentence based docs can be returned for a single unique articleID) Would this work? Thanks, Tomislav 2011/1/12 Steven A Rowe sar...@syr.edu: Hi Otis, I think you can get what you want by doing the first stage retrieval, and then in the second stage, add required constraint(s) to the query for the matching docid(s), and change the AND operators in the original query to OR. Coordination will cause the best snippet(s) to rise to the top, no? Hmm, you'll want to run the second stage once for each hit from the first stage, though, unless you can afford to collect *all* hits and pull out each first stage's hit from the intermixed second stage results... Steve -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, January 12, 2011 7:29 AM To: solr-user@lucene.apache.org Subject: Re: Not storing, but highlighting from document sentences Hi Stefan, Yes, splitting into separate sentences (and storing them) is OK because with a bunch of sentences you can't really reconstruct the original article unless you know which order to put them in. Searching against the sentences won't work for queries like foo AND bar because this should match original articles even if foo and bar are in different sentences. Otis - Original Message From: Stefan Matheis matheis.ste...@googlemail.com To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 7:02:46 AM Subject: Re: Not storing, but highlighting from document sentences Otis, just interested in .. storing the full text is not allowed, but splitting up in separate sentences is okay? 
while you think about using the sentences only as a secondary/additional source, maybe it would help to search in the sentences themselves, or would that give misleading results in your case? Stefan On Wed, Jan 12, 2011 at 12:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, I'm indexing some content (articles) whose text I cannot store in its original form for copyright reasons. So I can index the content, but cannot store it. However, I need snippets and search term highlighting. Any way to accomplish this elegantly? Or even not so elegantly? Here is one idea: * Create 2 indices: a main index for indexing (but not storing) the original content, and a secondary index for storing individual sentences from the original article. * That is, before indexing an article, split it into sentences. Then index the article in the main index, and index+store each sentence in the secondary index.
Re: Term frequency across multiple documents
Maybe there is a better solution, but I think that you can solve this problem using facets. You will get the number of documents where each term appears. Also, you can filter a specific set of terms by entering a query like +field:term1 OR +field:term2 OR ..., or using the facet.query parameter. Regards, Juan Grande On Wed, Jan 12, 2011 at 11:08 AM, Aaron Bycoffe abyco...@sunlightfoundation.com wrote: I'm attempting to calculate term frequency across multiple documents in Solr. I've been able to use TermVectorComponent to get this data on a per-document basis but have been unable to find a way to do it for multiple documents -- that is, get a list of terms appearing in the documents and how many times each one appears. I'd also like to be able to filter the list of terms to be able to see how many times a specific term appears, though this is less important. Is there a way to do this in Solr? Aaron
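One caveat to the facet approach: facet counts give document frequency (how many documents contain each term), not total occurrences. If total occurrences are wanted, the per-document TermVectorComponent output can be summed client-side; a small hypothetical sketch of that aggregation:

```python
from collections import Counter

def aggregate_term_counts(doc_term_vectors, only_terms=None):
    """Sum per-document term frequencies (as TermVectorComponent returns
    them per doc) into corpus-wide totals; optionally restrict the result
    to a specific set of terms."""
    totals = Counter()
    for tf in doc_term_vectors.values():
        totals.update(tf)
    if only_terms is not None:
        return Counter({t: n for t, n in totals.items() if t in only_terms})
    return totals

# Hypothetical per-document term vectors:
vectors = {"doc1": {"solr": 2, "index": 1}, "doc2": {"solr": 1, "lucene": 4}}
print(aggregate_term_counts(vectors)["solr"])      # 3
print(aggregate_term_counts(vectors, {"lucene"}))  # Counter({'lucene': 4})
```

Whether facet counts or summed term frequencies are the right number depends on which of the two Aaron actually needs.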
Re: pruning search result with search score gradient
Sometimes I've _considered_ trying to do this (but generally decided it wasn't worth it) when I didn't want those documents below the threshold to show up in the facet values. In my application the facet counts are sometimes very pertinent information, but they are not quite as useful as they could be when they include barely relevant hits. On 1/12/2011 11:42 AM, Erick Erickson wrote: [...]
RE: Not storing, but highlighting from document sentences
I think you can get what you want by doing the first stage retrieval, and then in the second stage, add required constraint(s) to the query for the matching docid(s), and change the AND operators in the original query to OR. Coordination will cause the best snippet(s) to rise to the top, no? Right, right. So if the original query is: foo AND bar, I'd run it against the main index, get top N hits, say N=10. Then I'd create another query: +(foo OR bar) +articleID:(ORed list of top N article IDs from main results) And then I'd use that to get enough sentence docs to have at least 1 of them for each hit from the main index. Hm, I wonder what happens when instead of simple foo AND bar you have a more complex query with more elaborate grouping and such... :) I was hoping that you could limit the query language to exclude grouping... If not, you could walk the boolean query, trim all clauses that are PROHIBITED, then flatten all of the remaining terms to a single OR'd query? Hmm, you'll want to run the second stage once for each hit from the first stage, though, unless you can afford to collect *all* hits and pull out each first stage's hit from the intermixed second stage results... Wouldn't the above get me all sentences I need for top N hits from the main result in a single shot, assuming I use high enough rows=NNN to minimize the possibility of not getting even 1 sentence for any one of those top N hits? Yes, but the problem is that the worst case is that you have to retrieve *all* second-stage hits to get at least one for each of the first-stage hits. So if you're okay with NNN = numDocs, then no problem. Steve
Re: DIH - Closing ResultSet in JdbcDataSource
I have found where a root entity has completed processing and added the logic to clear the entity's cache at that point (I didn't change any of the logic for clearing all entity caches once the import has completed). I have also created an enhancement request at https://issues.apache.org/jira/browse/SOLR-2313. On Tue, Jan 11, 2011 at 2:54 PM, Shane Perry thry...@gmail.com wrote: By placing some strategic debug messages, I have found that the JDBC connections are not being closed until all entity elements have been processed (in the entire config file). A simplified example would be:

<dataConfig>
  <dataSource name="ds1" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/db1" user="..." password="..."/>
  <dataSource name="ds2" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/db2" user="..." password="..."/>
  <document>
    <entity name="entity1" datasource="ds1" ...>
      ... field list ...
      <entity name="entity1a" datasource="ds1" ...>
        ... field list ...
      </entity>
    </entity>
    <entity name="entity2" datasource="ds2" ...>
      ... field list ...
      <entity name="entity2a" datasource="ds2" ...>
        ... field list ...
      </entity>
    </entity>
  </document>
</dataConfig>

The behavior is: JDBC connection opened for entity1 and entity1a - applicable queries run and ResultSet objects processed. All open ResultSet and Statement objects closed for entity1 and entity1a. JDBC connection opened for entity2 and entity2a - applicable queries run and ResultSet objects processed. All open ResultSet and Statement objects closed for entity2 and entity2a. All JDBC connections (none are closed at this point) are closed. In my instance, I have some 95 unique entity elements (19 parents with 5 children each), resulting in 95 open JDBC connections. If I understand the process correctly, it should be safe to close the JDBC connection for a root entity (an immediate child of document) and all descendant entity elements once the parent has been successfully completed. 
I have been digging around the code, but due to my unfamiliarity with the code, I'm not sure where this would occur. Is this a valid solution? It's looking like I should probably open a defect and I'm willing to do so along with submitting a patch, but need a little more direction on where the fix would best reside. Thanks, Shane On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry thry...@gmail.com wrote: Gora, Thanks for the response. After taking another look, you are correct about the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0). I didn't recognize the case difference in the two function calls, so missed it. I'll keep looking into the original issue and reply if I find a cause/solution. Shane On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote: On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote: Hi, I am in the process of migrating our system from Postgres 8.4 to Solr 1.4.1. Our system is fairly complex and as a result, I have had to define 19 base entities in the data-config.xml definition file. Each of these entities executes 5 queries. When doing a full-import, as each entity completes, the server hosting Postgres shows 5 idle in transaction for the entity. In digging through the code, I found that the JdbcDataSource wraps the ResultSet object in a custom ResultSetIterator object, leaving the ResultSet open. Walking through the code I can't find a close() call anywhere on the ResultSet. I believe this results in the idle in transaction processes. [...] Have not examined the idle in transaction issue that you mention, but the ResultSet object in a ResultSetIterator is closed in the private hasnext() method, when there are no more results, or if there is an exception. hasnext() is called by the public hasNext() method that should be used in iterating over the results, so I see no issue there. Regards, Gora P.S. This is from Solr 1.4.0 code, but I would not think that this part of the code would have changed.
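Shane's proposed fix amounts to scoping each root entity's connection to its subtree. A rough Python analogue of that lifecycle (using sqlite3 in place of JDBC; names are illustrative, not DIH code):

```python
import sqlite3

def import_root_entities(entities):
    """Open one connection per root entity, process the entity and its
    children, and close the connection as soon as that subtree is done --
    rather than holding every connection open until the whole import ends."""
    total_rows = 0
    for entity in entities:
        conn = sqlite3.connect(":memory:")  # stands in for a per-entity JDBC connection
        try:
            conn.execute("CREATE TABLE t (v INTEGER)")
            conn.executemany("INSERT INTO t VALUES (?)",
                             [(i,) for i in range(entity["rows"])])
            total_rows += sum(1 for _ in conn.execute("SELECT v FROM t"))
        finally:
            conn.close()  # per-root-entity close: idle connections don't pile up
    return total_rows

print(import_root_entities([{"rows": 3}, {"rows": 2}]))  # 5
```

With 19 root entities this keeps at most one subtree's connections open at a time, instead of 95 connections held until the end of the import.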
RE: Not storing, but highlighting from document sentences
Hi Tomislav, if I understand correctly, you are suggesting query execution in two phases: first execute the query on the whole-article index core (where whole articles are indexed, but not stored) to get article IDs (for articles which match the original query). Then, for each match in the article core: change the AND operators from the original query to OR, add an articleID condition/filter, and execute that query on the sentence-based index (with the assumption that each sentence-based doc has articleID set). Yes. Is this correct, and is this what "you'll want to run the second stage once for each hit from the first stage, though" is referring to? An example for this scenario: for original query q=apples and oranges, execute q=apples and orange with fl=articleId on the article core, and for each articleIdX result execute q=(apples OR orange) AND articleId:articleIdX on the sentence-based core. The same thing (with the same results) should be doable with only a single query in the second phase; for the previous example, that single second-phase query over all articleId1,...,articleIdN would be something like: q=((apples OR orange) AND articleId:articleId1) OR ((apples OR orange) AND articleId:articleId2) OR ... OR ((apples OR orange) AND articleId:articleIdN) But here, in the second case, results are ordered by sentence scoring instead of by article, and the results would need to be re-ordered. Is this what "unless you can afford to collect *all* hits and pull out each first stage's hit from the intermixed second stage results" is referring to? Yes. My actual question after this really long intro is: couldn't this be done with a single second-level query approach, but on each topN start/row chunk as the user iterates through first-level results? For example, the user executes query q=apples and oranges and this yields 1000 results, but the first page displays only, for example, 20 results, which means the proposed solution would: 1. phase: execute q=apples and orange with fl=articleId on the article core, but with start=0&rows=20 2. 
phase: q=((apples OR orange) AND articleId:articleId1) OR ((apples OR orange) AND articleId:articleId2) OR ... OR ((apples OR orange) AND articleId:articleId20) 3. Reorder sentence results to match order defined by article matching scores and return to user Only, the results here would need to be collapsed on unique articleID, so only 20 results are provided in result set (because multiple sentence based docs can be returned for a single unique articleID) Would this work? I think so, but I don't have any experience using collapsing, so I can't say for sure. BTW, Otis' rearrangement of your phase #2 would also work, and would be theoretically faster to evaluate: q=+(apples orange) +articleId:(articleId1 ... articleId20) Steve
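Steve's single-shot rearrangement is easy to build programmatically. A hypothetical helper (field name and id format as used in the thread):

```python
def second_phase_query(terms, article_ids):
    """Build the flattened second-phase query Steve suggests:
    +(term1 term2 ...) +articleId:(id1 id2 ...)"""
    return "+({}) +articleId:({})".format(" ".join(terms), " ".join(article_ids))

print(second_phase_query(["apples", "orange"], ["articleId1", "articleId2"]))
# +(apples orange) +articleId:(articleId1 articleId2)
```

The single required articleId clause replaces the twenty repeated ((... ) AND articleId:...) disjuncts, which is why it should evaluate faster.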
Re: Where does admin UI visually distinguish between master and slave?
Hi all, Thanks for the feedback. I've checked the code with a few different inputs and believe I have found a bug. Could someone comment as to whether I'm missing something? I will go ahead and file it if someone can attest that it looks like a bug. Bug Summary: == - Admin UI replication/index.jsp checks for master or slave with the following code: if ("true".equals(detailsMap.get("isSlave"))) - if slave, replication/index.jsp displays the Master, Poll Intervals, etc. sections (everything up to Cores) - if false, replication/index.jsp does not display the Master and Poll Intervals sections - This slave check/UI difference works correctly if the solrconfig.xml has a slave but not a master section, or vice versa Expected results: == The same UI difference would occur in the following scenarios: a) solrconfig.xml has both master and slave entries b) use java properties (-Dsolr.enable.master -Dsolr.enable.slave) to set master or slave at runtime *OR* c) use solrcore.properties to set master and slave at runtime Actual results: == If solrconfig.xml has both master and slave entries, replication/index.jsp shows both the master and slave sections regardless of the system properties. On Wed, Jan 12, 2011 at 10:35 AM, Markus Jelsma markus.jel...@openindex.io wrote: [...]
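The check the bug report quotes can be captured as a tiny classifier over the ReplicationHandler details map. A sketch of what index.jsp effectively does according to the code Will cites (not actual Solr code; the "isMaster" key below is illustrative):

```python
def node_role(details_map):
    """Mirror the index.jsp logic from the bug report: a node is treated
    as a slave iff the details map reports isSlave == "true"; every other
    node is rendered with the master view."""
    return "slave" if details_map.get("isSlave") == "true" else "master"

print(node_role({"isSlave": "true"}))   # slave
print(node_role({"isMaster": "true"}))  # master ("isMaster" key illustrative)
print(node_role({}))                    # master
```

Since only the isSlave flag is consulted, any configuration that makes the details map report isSlave inaccurately (e.g. both master and slave sections present in solrconfig.xml) would produce the wrong page, matching Will's observation.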
Re: Resolve a DataImportHandler datasource based on previous entity
On Wed, Jan 12, 2011 at 8:49 PM, alexei achugu...@gmail.com wrote: [...] Unfortunately reorganizing the data is not an option for me. Multiple databases exist and a third party is taking care of populating them. Once a database reaches a certain size, a switch occurs and a new database is created with the same table structure. OK, I understand. Gora Mohanty-3 wrote: I meant a script that runs the query that defines the datasources for all fields, writes a Solr DIH configuration file, and then initiates a dataimport. Ok, so the query would select only the articles for which the data is sitting in a specific datasource. Then, only that one datasource would be indexed. For each additional datasource, would the script initiate another full-import with the clean attribute set to false? I do not think that I am completely understanding your use case. Would it be possible for you to describe it in detail? Here is my current view of it: * From some SELECT statement, it is possible for you to tell which datasource each field should come from in the next import. * If so, before the start of a data import, a script can run that same SELECT statement, and figure out what belongs where. * In that case, the script can do the following: - Write a DIH configuration file from its knowledge of where the fields in the next import are coming from. - Do a reload-config to get the new DIH configuration. - Initiate a data import * It is not clear to me how a delta import, and similar things, fit into this scenario. I.e., are you also going to be dealing with updates of documents that already exist in the Solr index? However, we can cross that bridge when we come to it. I tried to make some changes to the DIH that comes with Solr 1.4.1. The getResolvedEntityAttribute("dataSource") method seems to do the trick. Here is the modified code. It feels awkward but it seems to work. [...] I hope I am not breaking any other functionality... 
Would it be possible to add something like this to a future release? I am sorry; as things stand, while I do want to find the time to become a contributor to the Solr code, it is beyond my current understanding of it to be able to comment on the above. I think that you have the right idea, but am unable to say for sure. Maybe someone more well-versed in Solr can chip in. I would definitely recommend that you open a JIRA ticket, and attach this patch. That way, at least it remains on record. Please include a description of your use case in the ticket. Regards, Gora
Specifying returned fields
Hello, I know you can explicitly specify list of fields returned via fl=field1,field2,field3 Is there a way to specify return all fields but field1 and field2? Thanks, Dmitriy
Re: Specifying returned fields
On Thu, Jan 13, 2011 at 1:11 AM, Dmitriy Shvadskiy dshvads...@gmail.com wrote: Hello, I know you can explicitly specify list of fields returned via fl=field1,field2,field3 Is there a way to specify return all fields but field1 and field2? Not that I know of, but below is an earlier discussion thread on this subject. Please take a look at the links referenced there. IMHO, this would be a desirable feature. http://osdir.com/ml/solr-user.lucene.apache.org/2010-12/msg00171.html Regards, Gora
Re: Specifying returned fields
Thanks Gora The workaround of loading fields via LukeRequestHandler and building fl from it will work for what we need. However it takes 15 seconds per core and we have 15 cores. The query I'm running is /admin/luke?show=schema Is there a way to limit query to return just fields? Thanks, Dmitriy -- View this message in context: http://lucene.472066.n3.nabble.com/Specifying-returned-fields-tp2243423p2243923.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Specifying returned fields
On Jan 12, 2011, at 12:53 , Dmitriy Shvadskiy wrote: Thanks Gora The workaround of loading fields via LukeRequestHandler and building fl from it will work for what we need. However it takes 15 seconds per core and we have 15 cores. The query I'm running is /admin/luke?show=schema Is there a way to limit query to return just fields? Yes, add numTerms=0 and it'll speed up the luke request handler dramatically. Erik
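Putting the two answers together: fetch the field list once (e.g. from /admin/luke?show=schema&numTerms=0), then build fl by exclusion. A hypothetical client-side helper (field names below are invented for illustration):

```python
def build_fl(all_fields, excluded):
    """Build an fl parameter value containing every known field except
    the excluded ones, preserving the schema's field order."""
    skip = set(excluded)
    return ",".join(f for f in all_fields if f not in skip)

print(build_fl(["id", "title", "body", "big_blob"], ["body", "big_blob"]))  # id,title
```

Since the schema rarely changes, the field list can be cached per core, so the Luke request need only be paid once rather than on every query.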
verifying that an index contains ONLY utf-8
We've created an index from a number of different documents that are supplied by third parties. We want the index to only contain UTF-8 encoded characters. I have a couple questions about this: 1) Is there any way to be sure during indexing (by setting something in the solr configuration?) that the documents that we index will always be stored in utf-8? Can solr convert documents that need converting on the fly, or can solr reject documents containing illegal characters? 2) Is there a way to scan the existing index to find any string containing non-utf8 characters? Or is there another way that I can discover if any crept into my index?
StopFilterFactory and qf containing some fields that use it and some that do not
I'm running into a problem with StopFilterFactory in conjunction with (e)dismax queries that have a mix of fields, only some of which use StopFilterFactory. It seems that if even one field in the qf parameter does not use StopFilterFactory, then stop words are not removed when searching any fields. Here's an example of what I mean: - I have 2 fields indexed: Title is textStemmed, which includes StopFilterFactory (see below). Contributor is textSimple, which does not include StopFilterFactory (see below). - "The" is a stop word in stopwords.txt - q=life&defType=edismax&qf=Title ... returns 277,635 results - q=the life&defType=edismax&qf=Title ... returns 277,635 results - q=life&defType=edismax&qf=Title Contributor ... returns 277,635 results - q=the life&defType=edismax&qf=Title Contributor ... returns 0 results It seems as if the stop words are not being stripped from the query because qf contains a field that doesn't use StopFilterFactory. I did some testing combining stemmed fields with non-stemmed fields in qf, and stemming gets applied regardless. But stop words do not. Does anyone have ideas on what is going on? Is this a feature or possibly a bug? Any known workarounds? Any advice is appreciated. 
James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
Re: StopFilterFactory and qf containing some fields that use it and some that do not
I haven't used edismax but I can imagine it's a feature. This is because inconsistent use of stopwords in the analyzers of the fields specified in qf can yield really unexpected results because of the mm parameter. In dismax, if one analyzer removes stopwords and the other doesn't, the mm parameter goes crazy. I'm running into a problem with StopFilterFactory in conjunction with (e)dismax queries that have a mix of fields, only some of which use StopFilterFactory. [...]
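The interaction described above can be illustrated with a toy sketch (this is not Solr code, just an illustration of the clause-building behavior): with a mixed qf, a query term that survives *any* field's analyzer keeps its disjunction clause, and mm then requires a match for it.

```python
# Toy illustration (not Solr code) of why mixed stopword handling breaks mm.
# With qf=Title Contributor and q="the life", edismax builds one clause per
# query term; a term that survives in ANY field's analyzer keeps its clause.
STOPWORDS = {"the"}  # Title strips these; Contributor does not

def clauses(query_terms, fields_with_stopwords, fields_without):
    """One disjunction clause per term that survives at least one analyzer."""
    out = []
    for term in query_terms:
        per_field = [f"{f}:{term}" for f in fields_without]  # always kept
        if term not in STOPWORDS:
            per_field += [f"{f}:{term}" for f in fields_with_stopwords]
        if per_field:
            out.append(per_field)
    return out

# qf=Title only: "the" vanishes entirely, so mm is satisfied by "life" alone
print(clauses(["the", "life"], ["Title"], []))
# -> [['Title:life']]

# qf=Title Contributor: "the" survives as Contributor:the, and mm now
# requires a match for it -> documents whose Contributor lacks "the" fail
print(clauses(["the", "life"], ["Title"], ["Contributor"]))
# -> [['Contributor:the'], ['Contributor:life', 'Title:life']]
```

The second result mirrors the parsed query `+((DisjunctionMaxQuery((Contributor:the)) DisjunctionMaxQuery((Contributor:life | Title:life)))~2)` reported later in this thread.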
Re: verifying that an index contains ONLY utf-8
This is supposed to be dealt with outside the index. All input must be UTF-8 encoded. Failing to do so will give unexpected results. We've created an index from a number of different documents that are supplied by third parties. We want the index to only contain UTF-8 encoded characters. I have a couple questions about this: 1) Is there any way to be sure during indexing (by setting something in the solr configuration?) that the documents that we index will always be stored in utf-8? Can solr convert documents that need converting on the fly, or can solr reject documents containing illegal characters? 2) Is there a way to scan the existing index to find any string containing non-utf8 characters? Or is there another way that I can discover if any crept into my index?
Re: StopFilterFactory and qf containing some fields that use it and some that do not
Have used edismax and stopword filters as well, but usually use the fq parameter, e.g. fq=title:"the life", and never had any issues. Can you turn on debugQuery and check what's the query formed for all the combinations you mentioned? Regards, Jayendra On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James james.d...@ingrambook.com wrote: I'm running into a problem with StopFilterFactory in conjunction with (e)dismax queries that have a mix of fields, only some of which use StopFilterFactory. [...]
Re: StopFilterFactory and qf containing some fields that use it and some that do not
Have used edismax and stopword filters as well. But usually use the fq parameter, e.g. fq=title:"the life", and never had any issues. That is because filter queries are not relevant for the mm parameter, which is only applied to the main query. Can you turn on debugQuery and check what's the query formed for all the combinations you mentioned? Regards, Jayendra On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James james.d...@ingrambook.com wrote: I'm running into a problem with StopFilterFactory in conjunction with (e)dismax queries that have a mix of fields, only some of which use StopFilterFactory. [...]
PHP app not communicating with Solr
Web page returns the following message: Fatal error: Uncaught exception 'Exception' with message '0 Status: Communication Error' This happens in a dev environment, everything on one machine: Windows 7, WAMP, CakePHP, Tomcat, Solr, and SolrPHPClient. Error message also references line 334 of the Service.php file, which is part of the SolrPHPClient. Everything works perfectly on a different machine so this problem is probably related to configuration. On the problem machine, I can reach solr at http://localhost:8080/solr/admin and it looks correct (AFAIK). I am documenting the setup procedures this time around but don't know what's different between the two machines. Google search on the error message shows the message is not uncommon so the answer might be helpful to others as well. Thanks, Eric
Re: PHP app not communicating with Solr
On 12.01.2011, at 23:50, Eric wrote: Web page returns the following message: Fatal error: Uncaught exception 'Exception' with message '0 Status: Communication Error' [...] I ran into this issue compiling PHP with --curl-wrappers. regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: solr wildcard queries and analyzers
Had the same issues with international characters and wildcard searches. One workaround we implemented was to index the field both with and without the ASCIIFoldingFilterFactory: you would have an original field and one with the English equivalents to be used during searching. Wildcard searches with either the English-equivalent or the international terms would then match one of those. Also, lowercase the search terms if you are using LowerCaseFilterFactory during indexing. Regards, Jayendra On Wed, Jan 12, 2011 at 7:46 AM, Kári Hreinsson k...@gagnavarslan.is wrote: Have you made any progress? Since the AnalyzingQueryParser doesn't inherit from QParserPlugin, solr doesn't want to use it, but I guess we could implement a similar parser that does inherit from QParserPlugin? Switching parser seems to be what is needed? Has really no one solved this before? - Kári - Original Message - From: Matti Oinas matti.oi...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 11 January, 2011 12:47:52 PM Subject: Re: solr wildcard queries and analyzers This might be the solution. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html 2011/1/11 Matti Oinas matti.oi...@gmail.com: Sorry, the message was not meant to be sent here. We are struggling with the same problem here. 2011/1/11 Matti Oinas matti.oi...@gmail.com: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers On wildcard and fuzzy searches, no text analysis is performed on the search word. 2011/1/11 Kári Hreinsson k...@gagnavarslan.is: Hi, I am having a problem with the fact that no text analysis is performed on wildcard queries.
I have the following field type (a bit simplified):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

My problem has to do with Icelandic characters: when I index a document with a text field including the word sjálfsögðu, it gets indexed as sjalfsogdu (because of the ASCIIFoldingFilterFactory, which replaces the Icelandic characters with their English equivalents). Then, when I search (without a wildcard) for sjálfsögðu or sjalfsogdu, I get that document as a result. This is convenient since it enables people to search without using accented characters and yet get the results they want (e.g. if they are working on computers with English keyboards). However, this all falls apart when using wildcard searches: then the search string isn't passed through the filters, and even if I search for sjálf* I don't get any results because the index doesn't contain the original words (I do get results if I search for sjalf*). I know people have been having a similar problem with the case sensitivity of wildcard queries, and most often the solution seems to be to lowercase the string before passing it on to solr, which is not exactly an optimal solution (yet a simple one in that case). The Icelandic characters complicate things a bit, and applying the same solution (doing the lowercasing and character mapping) in my application seems like unnecessary duplication of code already part of solr, not to mention complication of my application and possible maintenance down the road. Is there any way around this? How are people solving this? Is there a way to apply the filters to wildcard queries? I guess removing the ASCIIFoldingFilterFactory is the simplest solution, but this normalization (of the text done by the filter) is often very useful.
I hope I'm not overlooking some obvious explanation. :/ Thanks in advance, Kári Hreinsson
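The client-side normalization Jayendra suggests (lowercase plus accent folding before sending the wildcard term) can be sketched like this. Note this is an approximation of what ASCIIFoldingFilterFactory does, not an exact reimplementation: NFKD decomposition handles most accented Latin letters, but some Icelandic letters (ð, þ, æ) have no decomposition and need manual mappings.

```python
# Sketch of the client-side workaround: normalize a user's wildcard term the
# same way the index analyzer does (lowercase + ASCII folding) before sending
# it to Solr. Approximates ASCIIFoldingFilterFactory, it is not identical.
import unicodedata

def fold_for_wildcard(term: str) -> str:
    # manual mappings for Icelandic letters NFKD cannot decompose
    manual = str.maketrans({"ð": "d", "Ð": "D", "þ": "th", "Þ": "TH",
                            "æ": "ae", "Æ": "AE"})
    term = term.translate(manual)
    # NFKD splits accented letters into base letter + combining mark;
    # the ASCII encode/ignore step then drops the combining marks
    decomposed = unicodedata.normalize("NFKD", term)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.lower()

print(fold_for_wildcard("sjálf*"))      # sjalf*
print(fold_for_wildcard("sjálfsögðu"))  # sjalfsogdu
```

This still duplicates analyzer logic in the application, as Kári notes, but it is small enough to keep in sync with the schema by hand.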
Re: Can't find source or jar for Solr class JaspellTernarySearchTrie
Check out and build the code from https://svn.apache.org/repos/asf/lucene/dev/trunk/ Class: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.java Regards, Jayendra On Wed, Jan 12, 2011 at 8:46 AM, Larry White ljw1...@gmail.com wrote: Hi, I'm trying to find the source code for class JaspellTernarySearchTrie. It's supposed to be used for spelling suggestions. It's referenced in the javadoc: http://lucene.apache.org/solr/api/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.html I realize this is a dumb question, but I've been looking through the downloads for several hours. I can't actually find the package org/apache/solr/spelling/suggest/ that it's supposed to be under. So if you would be so kind... What jar is it compiled into? Where is the source in the downloaded source tree? thanks.
RE: StopFilterFactory and qf containing some fields that use it and some that do not
Here is what debug says each of these queries parse to:
1. q=life&defType=edismax&qf=Title ... returns 277,635 results
2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
3. q=life&defType=edismax&qf=Title Contributor ... returns 277,635 results
4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results

1. +DisjunctionMaxQuery((Title:life))
2. +((DisjunctionMaxQuery((Title:life)))~1)
3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
4. +((DisjunctionMaxQuery((Contributor:the)) DisjunctionMaxQuery((Contributor:life | Title:life)))~2)

I see what's going on here. Because "the" is a stop word for Title, it gets removed from the first part of the expression. This means that Contributor is required to contain "the". dismax does the same thing too. I guess I should have run debug before asking the mailing list! It looks like the only workarounds I have are to either filter out the stopwords in the client when this happens, or enable stop words for all the fields that are used in qf with stopword-enabled fields. Unless... someone has a better idea?? James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Wednesday, January 12, 2011 4:44 PM To: solr-user@lucene.apache.org Cc: Jayendra Patil Subject: Re: StopFilterFactory and qf containing some fields that use it and some that do not Have used edismax and stopword filters as well. But usually use the fq parameter, e.g. fq=title:"the life", and never had any issues. That is because filter queries are not relevant for the mm parameter, which is only applied to the main query. [...]
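The first workaround James mentions (filtering out the stopwords in the client before the query reaches edismax) could look like the following sketch. The stopword set here is a stand-in that would have to mirror stopwords.txt, and the endpoint/parameter names are the standard Solr ones.

```python
# Hypothetical sketch of the client-side workaround: drop stopwords from the
# user's query before building the Solr request, so a mixed-analyzer qf
# never sees them. Assumes the client keeps a copy of stopwords.txt.
from urllib.parse import urlencode

STOPWORDS = {"the", "a", "an", "of"}  # stand-in for stopwords.txt

def build_query(user_query: str, qf: str = "Title Contributor") -> str:
    terms = [t for t in user_query.split() if t.lower() not in STOPWORDS]
    params = {"q": " ".join(terms) or user_query,  # never send an empty q
              "defType": "edismax",
              "qf": qf}
    return "select?" + urlencode(params)

print(build_query("the life"))
# select?q=life&defType=edismax&qf=Title+Contributor
```

The `or user_query` fallback keeps a query that consists only of stopwords (e.g. the band name "The The") from turning into an empty request.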
Re: verifying that an index contains ONLY utf-8
Converting on the fly is not supported by Solr, but should be relatively easy in Java. Scanning is also relatively simple (accept only the valid byte ranges). Detection too: http://www.mozilla.org/projects/intl/chardet.html We've created an index from a number of different documents that are supplied by third parties. We want the index to only contain UTF-8 encoded characters. [...] -- http://jetwick.com open twitter search
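The "scan and reject" idea above is straightforward outside of Solr. A minimal sketch: validate each document's raw bytes as strict UTF-8 before posting it, since Solr itself won't convert or reject on your behalf.

```python
# Minimal sketch of the "scan and reject" approach: validate raw bytes as
# UTF-8 before indexing. Strict decoding fails on any invalid byte sequence.
def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8("résumé".encode("utf-8")))   # True
print(is_valid_utf8("résumé".encode("latin-1"))) # False: lone 0xE9 byte
```

For guessing the *actual* encoding of a rejected document (rather than just rejecting it), a charset-detection library like the Mozilla chardet linked above would be the next step.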
Exciting Solr Use Cases
Hi all! Would you mind writing about your Solr project if it has an uncommon approach or is somehow exciting? I would like to extend my list for a new blog post. Examples I have in mind at the moment are: loggly (real time + big index), solandra (nice Solr + Cassandra combination), HathiTrust (extreme index size), ... Kind Regards, Peter.
Re: PHP app not communicating with Solr
I was unable to get it to compile. From the author, I got one reply about the benefits of the compiled version; after submitting my errors to him, I have not yet received a reply. ##Weird thing 'on the way to the forum' today.## I remember reading an article a couple of days ago which said the compiled version is 10-15% faster than the 'pure PHP' Solr library out there (and it has a lot more capability, that's for sure!). Turns out, this slower pure PHP version uses file_get_contents() (FGC) to do the actual query of the Solr instance. http://stackoverflow.com/questions/23/file-get-contents-vs-curl-what-has-better-performance The article above shows that FGC is on average 22% slower than using cURL in basic usage, so modifying the 'pure PHP' library to use cURL would make up for all of the speed advantage that the compiled SolrPHP has. Dennis Gearon - Original Message From: Lukas Kahwe Smith m...@pooteeweet.org To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 2:52:46 PM Subject: Re: PHP app not communicating with Solr [...] I ran into this issue compiling PHP with --curl-wrappers. regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: StopFilterFactory and qf containing some fields that use it and some that do not
Here's another thread on the subject: http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html And slightly off topic: you might also want to look at using common grams; they are really useful for phrase queries that contain stopwords. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory Here is what debug says each of these queries parse to: [...] It looks like the only workarounds I have are to either filter out the stopwords in the client when this happens, or enable stop words for all the fields that are used in qf with stopword-enabled fields. Unless... someone has a better idea?? [...]
Re: PHP app not communicating with Solr
Resolved! In a rare flash of clarity, I removed the @ preceding the file_get_contents call. Doing so made it apparent that my app was passing an incorrect Solr service port number to the SolrPHPClient code. Correcting the port number fixed the issue. The lesson is... suppressed errors are hard to find. --- On Wed, 1/12/11, Dennis Gearon gear...@sbcglobal.net wrote: From: Dennis Gearon gear...@sbcglobal.net Subject: Re: PHP app not communicating with Solr To: solr-user@lucene.apache.org Date: Wednesday, January 12, 2011, 3:37 PM [...]
Solr 4.0 = Spatial Search - How to
Ok, this could be very easy to do, but I was not able to do it. Need to enable location search, i.e. if someone searches for location 'New York', show results for New York and results within 50 miles of New York. We do have latitude/longitude stored in the database for each record, but not sure how to index these values to enable spatial search. Any help would be much appreciated. thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2245592.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.0 = Spatial Search - How to
I believe this is what you are looking for. I renamed the field called store to coords in the schema.xml file. The tricky part is building out the query. I am using SolrNet to do this, though, and have not yet cracked the problem. http://localhost:8983/solr/select?q=*:*+AND+eventdate:[2006-01-21T00:00:000Z+TO+2007-01-21T00:00:000Z]&fq={!bbox}&sfield=coords&pt=32.15,-93.85&d=500 Adam On Wed, Jan 12, 2011 at 8:01 PM, caman aboxfortheotherst...@gmail.com wrote: Ok, this could be very easy to do but was not able to do this. Need to enable location search i.e. if someone searches for location 'New York' show results for New York and results within 50 miles of New York. [...]
Re: Solr 4.0 = Spatial Search - How to
Adam, thanks. Yes, that helps, but how does the coords field get populated? All I have is:

<field name="lat" type="tdouble" indexed="true" stored="true" />
<field name="lng" type="tdouble" indexed="true" stored="true" />
<field name="coord" type="location" indexed="true" stored="true" />

Fields 'lat' and 'lng' get populated by the DataImportHandler, but coord I am not sure about? Thanks
Anyone seen measurable performance improvement using Apache Portable Runtime (APR) with Solr and Tomcat
Hi all, Has anyone used Apache Portable Runtime (APR) in conjunction with Solr and Tomcat? Has anyone seen (or better, measured) performance improvements when using APR? APR is a library that implements some functionality in native C (see http://apr.apache.org/ and http://en.wikipedia.org/wiki/Apache_Portable_Runtime). From the Wikipedia entry: quote The range of platform-independent functionality provided by APR includes: * Memory allocation and memory pool functionality * Atomic operations * Dynamic library handling * File I/O * Command argument parsing * Locking * Hash tables and arrays * Mmap functionality * Network sockets and protocols * Thread, process and mutex functionality * Shared memory functionality * Time routines * User and group ID services /endquote I could imagine benefits in file I/O as well as network I/O. But that's pure conjecture. Comments? thanks in advance
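For reference, enabling APR in Tomcat is a container-level change rather than a Solr one: install the Tomcat Native (tcnative) library for your platform, then register the APR lifecycle listener in server.xml. A sketch of the relevant server.xml fragments follows (port and timeout values are examples; exact library installation varies by platform):

```xml
<!-- server.xml: loads APR when the tcnative library is found on java.library.path -->
<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />

<!-- Optionally pin the connector to the APR/native HTTP implementation explicitly -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11AprProtocol"
           connectionTimeout="20000" />
```

With only the Listener present, Tomcat auto-selects the APR connector when the native library loads successfully; the startup log notes whether APR was found.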
Re: Solr 4.0 = Spatial Search - How to
Actually, by looking at the results from the geofilt filter, it would appear that it's not giving me the results I'm looking for. Or maybe it is... I need to convert my results to KML to see if it is actually performing a proper radius query. http://localhost:8983/solr/select?q=*:*&fq={!geofilt pt=39.0914154052734,-84.517822265625 sfield=coords d=5000} http://localhost:8983/solr/select?q=*:*+AND+eventdate:[2006-01-21T00:00:00Z+TO+2007-01-21T00:00:00Z]&fq={!geofilt pt=32.15,-93.85 sfield=coords d=5000} Please let me know what you find. Adam On Wed, Jan 12, 2011 at 8:24 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: I believe this is what you are looking for. I renamed the field called store to coords in the schema.xml file. The tricky part is building out the query. I am using SolrNet to do this though and have not yet cracked the problem. http://localhost:8983/solr/select?q=*:*+AND+eventdate:[2006-01-21T00:00:00Z+TO+2007-01-21T00:00:00Z]&fq={!bbox}&sfield=coords&pt=32.15,-93.85&d=500 Adam On Wed, Jan 12, 2011 at 8:01 PM, caman aboxfortheotherst...@gmail.com wrote: Ok, this could be very easy to do but I was not able to do it. I need to enable location search, i.e. if someone searches for location 'New York', show results for New York and results within 50 miles of New York. We do have latitude/longitude stored in the database for each record but am not sure how to index these values to enable spatial search. Any help would be much appreciated. thanks
Re: Solr 4.0 = Spatial Search - How to
In my case, I am getting data from a database and am able to concatenate the lat/long as a coordinate pair to store in my coords field. To test this, I randomized the lat/long values and generated about 6000 documents. Adam On Wed, Jan 12, 2011 at 8:29 PM, caman aboxfortheotherst...@gmail.com wrote: Adam, thanks. Yes, that helps, but how does the coords field get populated? All I have is:

<field name="lat" type="tdouble" indexed="true" stored="true" />
<field name="lng" type="tdouble" indexed="true" stored="true" />
<field name="coord" type="location" indexed="true" stored="true" />

Fields 'lat' and 'lng' get populated by the DataImportHandler, but coord I am not sure about? Thanks
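For the DataImportHandler case caman describes, one way to populate the coords field without concatenating in application code is DIH's TemplateTransformer, which can combine the lat and lng columns into the single "lat,lon" string a location-type field expects. A sketch of the data-config entity, assuming column names lat and lng in the source table (entity name, query, and columns here are illustrative, not from the thread):

```xml
<entity name="record" transformer="TemplateTransformer"
        query="SELECT id, lat, lng FROM records">
  <field column="lat" name="lat" />
  <field column="lng" name="lng" />
  <!-- location-type fields take one "lat,lon" string per document -->
  <field column="coord" template="${record.lat},${record.lng}" />
</entity>
```

After a full import, the coord field then carries values like "39.09,-84.51" and can be used as the sfield in bbox/geofilt queries.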
Multi-word exact keyword case-insensitive search suggestions
Hi all, I've been stuck on exact keyword matching for several days. Hope you guys can help me. Here is the scenario: 1. It needs to match a multi-word keyword, case-insensitively 2. Partial-word or single-word matching against this field is not allowed I want to know the field type definition for this field and a sample Solr query. I need to combine this search with my full-text search, which uses a dismax query. Thanks -- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
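One common way to get multi-word, case-insensitive, whole-value matching is a TextField whose analyzer uses KeywordTokenizerFactory, so the entire field value is kept as a single token, combined with a lower-case filter. A sketch for schema.xml (the type and field names text_exact_ci and keyword_exact are invented for illustration):

```xml
<fieldType name="text_exact_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- KeywordTokenizer emits the whole value as one token,
         so partial- or single-word matches cannot occur -->
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<field name="keyword_exact" type="text_exact_ci" indexed="true" stored="true" />
```

To combine this with a dismax full-text query, one option is to keep q for the dismax search and apply the exact match as a filter, e.g. fq=keyword_exact:"New York Times".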
Re: Exciting Solr Use Cases
When I have it running with a permission system (through both API and front end), I will share it with everyone. It's beginning to happen. The search is fairly primitive for now, but we hope to learn or hire the skills to better match it to the business model as we grow/get funding. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Peter Karich peat...@yahoo.de To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 3:37:12 PM Subject: Exciting Solr Use Cases Hi all! Would you mind writing about your Solr project if it has an uncommon approach or if it is somehow exciting? I would like to extend my list for a new blog post. Examples I have in mind at the moment are: loggly (real time + big index), solandra (nice Solr + Cassandra combination), HathiTrust (extreme index size), ... Kind Regards, Peter.
Re: spell suggest response
Hi Juan, yeah.. I tried onlyMorePopular and got some results, but they are not similar or near words to the word I gave in the query. Here is the output: http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.collate=true&spellcheck.onlyMorePopular=true&spellcheck.count=20 the output I get is:

<arr name="suggestion">
  <str>data</str>
  <str>have</str>
  <str>can</str>
  <str>any</str>
  <str>all</str>
  <str>has</str>
  <str>each</str>
  <str>part</str>
  <str>make</str>
  <str>than</str>
  <str>also</str>
</arr>

but these words are not similar to the given word 'java'; the near words would be javac, javax, data, java.io... etc. The stated words are present in the index. Regards, satya
Question on deleting all rows for an index
We are just starting with Solr and have a multi-core implementation, and need to delete all the rows in the index to clean things up. When running an update via a URL, we are using something like the following, which works fine: http://localhost:8983/solr/template/update/csv?commit=true&escape=\&stream.file=/opt/TEMPLATE_DATA.csv I am not clear on how to delete all the rows in this index. The documentation gives this example: <delete><query>timestamp:[* TO NOW-12HOUR]</query></delete> I'm not clear on the context of this command - is this through the Solr admin, or can you run this via the RESTful call? Trying to add this to a RESTful call does not work, as in this attempt: http://localhost:8983/solr/template/<delete><query>timestamp:[* TO NOW-12HOUR]</query></delete> Any thoughts appreciated. Bob
Re: Question on deleting all rows for an index
On Thu, Jan 13, 2011 at 6:08 AM, Wilson, Robert rwil...@constantcontact.com wrote: We are just starting with Solr and have a multi-core implementation, and need to delete all the rows in the index to clean things up. When running an update via a URL, we are using something like the following, which works fine: http://localhost:8983/solr/template/update/csv?commit=true&escape=\&stream.file=/opt/TEMPLATE_DATA.csv I am not clear on how to delete all the rows in this index. The documentation gives this example: <delete><query>timestamp:[* TO NOW-12HOUR]</query></delete> [...] Not sure where you got that from. The proper delete query to delete *all* records would be: <delete><query>*:*</query></delete> Please note that you have to follow the delete with a commit. You can use curl to call Solr for both the delete and the commit. Please see http://wiki.apache.org/solr/UpdateXmlMessages for details. Regards, Gora
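As Gora notes, curl works for both steps. A sketch against the template core from the original post (host, port, and core name are taken from the thread; adjust to your setup):

```shell
# Delete every document in the template core's index
curl 'http://localhost:8983/solr/template/update' \
     -H 'Content-Type: text/xml' \
     --data-binary '<delete><query>*:*</query></delete>'

# Commit, so the deletion becomes visible to searchers
curl 'http://localhost:8983/solr/template/update' \
     -H 'Content-Type: text/xml' \
     --data-binary '<commit/>'
```

Without the second request, queries keep returning the old documents until some other commit happens.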
Re: Question on deleting all rows for an index
Hi Robert, You can find an example of something similar to this in the examples that are part of the Solr distribution. The tutorial (http://lucene.apache.org/solr/tutorial.html) describes how to post data to the Solr server via post.jar: user:~/solr/example/exampledocs$ java -jar post.jar solr.xml monitor.xml If you take a look at the solr.xml file, you will see:

<add>
  <doc>
    <field name="id">SOLR1000</field>
    <field name="name">Solr, the Enterprise Search Server</field>
  </doc>
</add>

I think you can post your delete query to the server in the same way. Hope this helps. -Daniel We are just starting with Solr and have a multi-core implementation, and need to delete all the rows in the index to clean things up. When running an update via a URL, we are using something like the following, which works fine: http://localhost:8983/solr/template/update/csv?commit=true&escape=\&stream.file=/opt/TEMPLATE_DATA.csv I am not clear on how to delete all the rows in this index. The documentation gives this example: <delete><query>timestamp:[* TO NOW-12HOUR]</query></delete> I'm not clear on the context of this command - is this through the Solr admin, or can you run this via the RESTful call? Trying to add this to a RESTful call does not work, as in this attempt: http://localhost:8983/solr/template/<delete><query>timestamp:[* TO NOW-12HOUR]</query></delete> Any thoughts appreciated. Bob
basic document crud in an index
OK, getting ready to be more interactive with my index (she likes me). These are pretty much boolean-answered questions to help my understanding. I think having these in the mailing list records might help others too. A/ Is there a query that updates all the fields automatically on a record that has a unique id? B/ Does it leave the old document and the new document in the index? C/ Will a query immediately following see both documents? D/ Merging does not get rid of any old documents if there are any, but optimize does? E/ Is optimize invoked on the whole index, not individual segments? Thanks for a great product, y'all. I have a 64K-document index, small by many standards. But I did a search on it for a test, and started at row 16,000 of the results (broad results), and it was almost not noticeably slower than starting at 0. And it's on the lowest-cost Amazon server that will run it. Of course, no one but me is hitting that box yet :-) Dennis Gearon
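On A and B: with a uniqueKey defined, re-adding a document with the same key replaces the whole document; Solr (1.4/3.x) has no in-place per-field update, so every field must be re-sent. The old version is only marked deleted and is physically removed when its segment is merged away or the index is optimized. A sketch of such a replace, assuming id is the uniqueKey (the field values are illustrative):

```xml
<add>
  <doc>
    <field name="id">SOLR1000</field>
    <!-- all fields must be re-sent; the previous document with this id
         is deleted and replaced wholesale -->
    <field name="name">Solr, the Enterprise Search Server</field>
  </doc>
</add>
```

On C: searchers never see both versions; after the next commit only the new document is visible, and before it only the old one.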
Re: solr wildcard queries and analyzers
I'm a little busy right now, but I'm going to try to find a suitable parser, and if none is found then I think the only solution is to write a new one. 2011/1/13 Jayendra Patil jayendra.patil@gmail.com: Had the same issues with international characters and wildcard searches. One workaround we implemented was to index the field with and without the ASCIIFoldingFilterFactory. You would have an original field and one with the English equivalent to be used during searching. Wildcard searches with either the English-equivalent or the international terms would then match one of the two. Also, lower-case the search terms if you are using a lowercase filter during indexing. Regards, Jayendra On Wed, Jan 12, 2011 at 7:46 AM, Kári Hreinsson k...@gagnavarslan.is wrote: Have you made any progress? Since the AnalyzingQueryParser doesn't inherit from QParserPlugin, Solr doesn't want to use it, but I guess we could implement a similar parser that does inherit from QParserPlugin? Switching parsers seems to be what is needed? Has really no one solved this before? - Kári - Original Message - From: Matti Oinas matti.oi...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 11 January, 2011 12:47:52 PM Subject: Re: solr wildcard queries and analyzers This might be the solution. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html 2011/1/11 Matti Oinas matti.oi...@gmail.com: Sorry, the message was not meant to be sent here. We are struggling with the same problem here. 2011/1/11 Matti Oinas matti.oi...@gmail.com: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers On wildcard and fuzzy searches, no text analysis is performed on the search word. 2011/1/11 Kári Hreinsson k...@gagnavarslan.is: Hi, I am having a problem with the fact that no text analysis is performed on wildcard queries.
I have the following field type (a bit simplified):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
  </analyzer>
</fieldType>

My problem has to do with Icelandic characters. When I index a document with a text field including the word sjálfsögðu, it gets indexed as sjalfsogdu (because of the ASCIIFoldingFilterFactory, which replaces the Icelandic characters with their English equivalents). Then, when I search (without a wildcard) for sjálfsögðu or sjalfsogdu, I get that document as a result. This is convenient since it enables people to search without using accented characters and yet get the results they want (e.g. if they are working on computers with English keyboards). However, this all falls apart when using wildcard searches: the search string isn't passed through the filters, and even if I search for sjálf* I don't get any results, because the index doesn't contain the original words (I do get results if I search for sjalf*). I know people have been having a similar problem with the case sensitivity of wildcard queries, and most often the solution seems to be to lowercase the string before passing it on to Solr, which is not exactly an optimal solution (yet a simple one in that case). The Icelandic characters complicate things a bit, and applying the same solution (doing the lowercasing and character mapping) in my application seems like unnecessary duplication of code already part of Solr, not to mention complication of my application and possible maintenance down the road. Is there any way around this? How are people solving this? Is there a way to apply the filters to wildcard queries? I guess removing the ASCIIFoldingFilterFactory is the simplest solution, but this normalization (of the text done by the filter) is often very useful.
I hope I'm not overlooking some obvious explanation. :/ Thanks in advance, Kári Hreinsson
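If the folding does end up being done client-side, as the thread suggests, the duplication is small. A sketch of an approximation of LowerCaseFilter plus ASCIIFoldingFilter for wildcard terms, here in Python (the function name and the hand-mapped characters are choices made for this example, not anything from Solr):

```python
import unicodedata

def fold_for_wildcard(term: str) -> str:
    """Approximate Solr's LowerCaseFilter + ASCIIFoldingFilter on the client,
    so wildcard terms match the folded tokens stored in the index."""
    lowered = term.lower()
    # NFKD decomposition splits base letters from combining accent marks;
    # dropping the combining marks approximates ASCII folding (á -> a, ö -> o).
    decomposed = unicodedata.normalize("NFKD", lowered)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Letters like 'ð' have no decomposition, so map the common Icelandic
    # ones by hand, mirroring what ASCIIFoldingFilter does for them.
    return folded.translate(str.maketrans({"ð": "d", "þ": "th", "æ": "ae"}))

print(fold_for_wildcard("Sjálfsögðu"))  # sjalfsogdu
```

A query like sjálf* would then be rewritten to sjalf* before being sent to Solr, matching the tokens the index actually contains.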
Re: Question on deleting all rows for an index
Use this type of URL to delete all data, followed by a commit: http://localhost:8983/solr/update/?stream.body=<delete><query>*:*</query></delete>&commit=true - Grijesh