CachedSqlEntityProcessor And Delta Imports
Hi, I am on the 1.4 nightly build from September (still need to upgrade). I am using CachedSqlEntityProcessor for my main queries but was hoping to use it for my delta-imports as well. Is this possible? I have a main entity called 'Article' and then:
<entity name="body" pk="CmsArticleBodyPageId" query="SELECT Body, CmsArticleId FROM CmsArticleBodyPage" processor="CachedSqlEntityProcessor" cacheKey="CmsArticleId" cacheLookup="article.CmsArticleId" deltaQuery="SELECT Body, CmsArticleId FROM CmsArticleBodyPage WHERE where convert(nvarchar(50), UpdatedDate, 127) > convert(nvarchar(50), replace('${dataimporter.last_index_time}', '\', ''), 127)" parentDeltaQuery="select * from SolrSearch where convert(nvarchar(50), CmsArticleId) = convert(nvarchar(50), '${body.CmsArticleId}')">
<field column="Body" name="Body"/>
</entity>
If you can use the CachedSqlEntityProcessor for delta queries, how would one change this? Thanks in advance, Kirsty -- View this message in context: http://old.nabble.com/CachedSqlEntityProcessor-And-Delta-Imports-tp27717661p27717661.html Sent from the Solr - User mailing list archive at Nabble.com.
Spellcheck in multilanguage
Hi, I need spell-check suggestions for user queries in Italian. How can I get this in my Solr?
RE: Solr Cell RTF Woes
Are you running on a Linux/Unix box that has no X? Did you try the headless options? http://java.sun.com/developer/technicalArticles/J2SE/Desktop/headless/ Tika's RTF parser uses Swing and AWT to analyze the RTF; these in turn will attempt to use graphics libraries unless you run headless.
-----Original Message-----
From: Bill Engle [mailto:billengle...@gmail.com] Sent: 25 February 2010 19:09 To: solr-user@lucene.apache.org Subject: Solr Cell RTF Woes
Any RTF file I tried to index in Solr 1.4 throws these errors out. I have no issues with doc, pdf. Any thoughts? Thanks.
Apache Tomcat/6.0.18 - Error report
HTTP Status 500 - Could not initialize class java.awt.EventQueue
java.lang.NoClassDefFoundError: Could not initialize class java.awt.EventQueue
at javax.swing.SwingUtilities.isEventDispatchThread(SwingUtilities.java:1333)
at javax.swing.text.StyleContext.reclaim(StyleContext.java:437)
at javax.swing.text.StyleContext.addAttribute(StyleContext.java:294)
at javax.swing.text.StyleContext$NamedStyle.addAttribute(StyleContext.java:1488)
at javax.swing.text.StyleContext$NamedStyle.setName(StyleContext.java:1298)
at javax.swing.text.StyleContext$NamedStyle.<init>(StyleContext.java:1245)
at javax.swing.text.StyleContext.addStyle(StyleContext.java:90)
at javax.swing.text.StyleContext.<init>(StyleContext.java:70)
at javax.swing.text.DefaultStyledDocument.<init>(DefaultStyledDocument.java:95)
at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:42)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:637)
type: Status report
message: Could not initialize class java.awt.EventQueue
Content Extraction
Hey all, hope someone can advise. I followed the example in the wiki on how to extract an HTML page, i.e.
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true' -F myfi...@tutorial.html
It displayed an HTML page, but with a 404, and did not index the document. Any suggestions on how I can fix this? Thanks if you can advise. Lee
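The request in this message carries four separate query parameters, and it is easy to lose track of where one ends and the next begins. As a minimal sketch (Python is used only for illustration; the helper name `build_extract_url` is hypothetical, and the values `attr_` and `attr_content` are assumed to be the ones from the wiki example), the URL can be assembled and printed before it is handed to curl:

```python
from urllib.parse import urlencode

# Endpoint taken from the curl command in the message.
BASE_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id: str) -> str:
    """Join the extract parameters with '&' (hypothetical helper)."""
    params = {
        "literal.id": doc_id,            # id assigned to the extracted document
        "uprefix": "attr_",              # assumed prefix for unknown fields
        "fmap.content": "attr_content",  # assumed target field for the body text
        "commit": "true",
    }
    return BASE_URL + "?" + urlencode(params)


url = build_extract_url("doc1")
print(url)
```

A 404 usually points at the path rather than at the parameters, so printing the final URL and opening it against the running instance is a cheap first check.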
Re: Index size
Hi, Each document can be up to 10K. Most of it comes from a single field which is both indexed and stored. The data is uncompressed because compression would eat up too much CPU considering the volume we have. We have around 30 fields in all. We also need to compute some facets, collapse the documents forming the result set, and be able to sort them on any field. Thx
On 2010-02-25, at 5:50 PM, Otis Gospodnetic wrote:
It depends on many factors: how big those docs are (compare a tweet to a news article to a book chapter), whether you store the data or just index it, whether you compress it, how and how much you analyze the data, etc. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/
----- Original Message ----
From: Jean-Sebastien Vachon js.vac...@videotron.ca To: solr-user@lucene.apache.org Sent: Wed, February 24, 2010 8:57:21 AM Subject: Index size
Hi All, I'm currently looking at integrating Solr and I'd like to have some hints on the size of the index (number of documents) I could possibly host on a server running a Double-Quad server (16 cores) with 48 GB of RAM running Linux. Basically, I need to determine how many of these servers would be required to host about half a billion documents. Should I set up multiple Solr instances (in virtual machines or not) or should I run a single instance (with multicores or not) using all available memory as the cache? I also made some tests with sharding on this same server and I could not see any improvement (at least not with 4.5 million documents). Should all the shards be hosted on different servers? I shall try with more documents in the following days. Thx
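A back-of-the-envelope calculation can frame the numbers quoted in this thread. The sketch below only computes an upper bound on the raw stored data and the documents per shard for a few hypothetical shard counts (8, 16 and 32 are illustrative, not recommendations). It assumes "10K" means 10 KiB and "48 GB" means 48 GiB, and it says nothing about the real index size, which depends on the factors Otis lists.

```python
# Figures quoted in the thread; the unit interpretations are assumptions.
TOTAL_DOCS = 500_000_000              # "about half a billion documents"
MAX_DOC_BYTES = 10 * 1024             # "up to 10K" per document, read as 10 KiB
RAM_PER_SERVER_BYTES = 48 * 1024**3   # 48 GB per server, read as 48 GiB


def raw_upper_bound_bytes(docs: int, max_doc_bytes: int) -> int:
    """Worst case: every document is at its maximum size."""
    return docs * max_doc_bytes


def docs_per_shard(docs: int, shards: int) -> int:
    """Documents each shard holds if they are spread evenly (rounded up)."""
    return -(-docs // shards)


raw = raw_upper_bound_bytes(TOTAL_DOCS, MAX_DOC_BYTES)
print(f"raw upper bound: {raw / 1024**4:.2f} TiB")
print(f"raw data / RAM of one server: {raw / RAM_PER_SERVER_BYTES:.0f}x")

# Hypothetical shard counts, for illustration only.
for shards in (8, 16, 32):
    print(shards, "shards ->", docs_per_shard(TOTAL_DOCS, shards), "docs each")
```

Under these assumptions the stored data alone is far larger than the memory of a single server, which is one reason a 4.5 million document test on one box may not say much about the full data set.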
Auto suggestion
Hi, auto-suggestions are not found for newly indexed data. How can I configure that? Can anyone help me? Thanks in advance
Highest frequency
Hello all (sorry if my English is bad, I'm French)! I have a Solr index with ads which contain a title and a description. For example: ad 1: title = test / description = [empty]; ad 2: title = test on test / description = this is a test. Now, if I execute the request "test" in solr/admin, ad 1 is the first result, whereas ad 2 is more relevant because the word "test" occurs more often. So, is it possible to tell Solr to sort the results by word frequency? Thanks for your help!
Re: new/first searcher
There's no problem with having the same warming in both cases. firstSearcher queries are used to warm the index when you start the Solr instance. newSearcher queries warm the index after a commit is executed, for example. During firstSearcher warming there is no previously opened IndexSearcher. During newSearcher warming there is one, and it is the one that serves search requests while the new one is being warmed.
solrquestion6 wrote:
Hi, Is it the wrong approach to have the same warmup queries in both new and first searcher? The wiki shows a sorting query for the newSearcher and the same sorting query plus facet/filter queries for the firstSearcher.
Re: Highest frequency
As far as I know it's not supported by default. I think you should implement your own custom Lucene Similarity class and plug it into Solr via schema.xml.
pcmanprogrammeur wrote:
Hello all (sorry if my English is bad, I'm French)! I have a Solr index with ads which contain a title and a description. For example: ad 1: title = test / description = [empty]; ad 2: title = test on test / description = this is a test. Now, if I execute the request "test" in solr/admin, ad 1 is the first result, whereas ad 2 is more relevant because the word "test" occurs more often. So, is it possible to tell Solr to sort the results by word frequency? Thanks for your help!
Re: Content Extraction
You really have to provide more details of a) what you did and b) what the results were. Have you looked at your index with the admin page and/or Luke? Have you tried querying in the admin page? Have you examined the logs to see what they report? Best, Erick
On Fri, Feb 26, 2010 at 7:54 AM, Lee Smith l...@weblee.co.uk wrote:
Hey all, hope someone can advise. I followed the example in the wiki on how to extract an HTML page, i.e.
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true' -F myfi...@tutorial.html
It displayed an HTML page, but with a 404, and did not index the document. Any suggestions on how I can fix this? Thanks if you can advise. Lee
Re: Auto suggestion
Have you reopened the index after you added the data? Erick
On Fri, Feb 26, 2010 at 9:23 AM, Suram reactive...@yahoo.com wrote:
Hi, auto-suggestions are not found for newly indexed data. How can I configure that? Can anyone help me? Thanks in advance
SEVERE: java.lang.NumberFormatException: For input string: 104708
Hello, Solr is raising the following exception when processing queries that sort on an integer attribute. The same queries and sorts have been running fine in production for almost a year now. If I run the query without the sort on the integer attribute, the query runs fine. If I run a query that would return 0 results, but still has a sort parameter, the exception is raised. The stack trace is the same no matter what the query. I need help troubleshooting this issue. Any clues or suggested approaches would be helpful. Thank you in advance! The stack trace is as follows:
SEVERE: java.lang.NumberFormatException: For input string: 104708
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:456)
at java.lang.Integer.parseInt(Integer.java:497)
at org.apache.lucene.search.FieldCacheImpl$3.parseInt(FieldCacheImpl.java:148)
at org.apache.lucene.search.FieldCacheImpl$7.createValue(FieldCacheImpl.java:262)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:245)
at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:239)
at org.apache.lucene.search.FieldSortedHitQueue.comparatorInt(FieldSortedHitQueue.java:291)
at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:188)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Our Solr info is: Solr Specification Version: 1.3.0 Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47 Lucene Specification Version: 2.4-dev Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16
replication. when the slave goes down...
Hi, I have 2 Solr machines: 1 master and 1 slave replicating the index from the master. The machine on which the slave is running went down while the replication was running. I suppose the index must be corrupted. Can I safely remove the index on the slave and restart the slave, so that it starts the replication over from scratch? Thank you
Re: Solr Cell RTF Woes
Thanks. Headless put me in the right direction. I am running on a headless Mac OS X 10.6 Server. I added the line below to my {CATALINA_HOME}/bin/setenv.sh file and now I am indexing RTF.
export JAVA_OPTS="-d64 -server -Xmx1024m -XX:MaxPermSize=512m -Djava.awt.headless=true -Dsun.lang.ClassLoader.allowArraySyntax=true"
Thanks again! -Bill
On Fri, Feb 26, 2010 at 7:50 AM, david.dankwe...@ubs.com wrote:
Are you running on a Linux/Unix box that has no X? Did you try the headless options? http://java.sun.com/developer/technicalArticles/J2SE/Desktop/headless/ Tika's RTF parser uses Swing and AWT to analyze the RTF; these in turn will attempt to use graphics libraries unless you run headless.
Re: SEVERE: java.lang.NumberFormatException: For input string: 104708
One of your field values isn't a valid integer, it's 104708. You're probably using the straight integer type in 1.3, which is meant for back compat with existing Lucene indexes and currently doesn't do validation on its input. For Solr 1.4, int is a new field type (the example schema maps it to TrieIntField) that does do validation at index time, and is just as efficient for sorting. -Yonik http://www.lucidimagination.com
On Fri, Feb 26, 2010 at 9:59 AM, Pablo Mercado pa...@sbnation.com wrote:
Hello, Solr is raising the following exception when processing queries that sort on an integer attribute. The same queries and sorts have been running fine in production for almost a year now. If I run the query without the sort on the integer attribute, the query runs fine. If I run a query that would return 0 results, but still has a sort parameter, the exception is raised. The stack trace is the same no matter what the query. I need help troubleshooting this issue. Any clues or suggested approaches would be helpful. Thank you in advance!
Re: Highest frequency
The underlying Lucene automatically takes this into account: it uses the term frequency in relation to the length of the field rather than just a term count. So in your example doc 1 has a complete field match on title, so it scores higher. Also, depending upon how you set things up, you may not be searching on description. Unless you specify otherwise, searches only go against the default field (see your schema for the default field). Which brings up the question of whether you really want to override this behavior. Do you really want a document with 10,000 tokens in it that mentions test five times to score higher than a document with 3 tokens that mentions test three times? This page may help you resolve this kind of question: http://lucene.apache.org/java/2_4_0/scoring.html HTH, Erick
On Fri, Feb 26, 2010 at 9:30 AM, pcmanprogrammeur pcmanprogramm...@neuf.fr wrote:
Hello all (sorry if my English is bad, I'm French)! I have a Solr index with ads which contain a title and a description. For example: ad 1: title = test / description = [empty]; ad 2: title = test on test / description = this is a test. Now, if I execute the request "test" in solr/admin, ad 1 is the first result, whereas ad 2 is more relevant because the word "test" occurs more often. So, is it possible to tell Solr to sort the results by word frequency? Thanks for your help!
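The effect Erick describes can be checked with a little arithmetic. The sketch below is not Lucene code; it only reproduces the two factors that differ between the two ads under Lucene's default similarity, namely tf = sqrt(frequency) and lengthNorm = 1 / sqrt(number of terms in the field). It assumes a whitespace tokenizer, that "on" is not removed as a stop word, and it ignores idf, the query norm (both identical for the two documents) and the precision lost when Lucene stores norms.

```python
import math


def tf(freq: int) -> float:
    """Term-frequency factor of the default similarity."""
    return math.sqrt(freq)


def length_norm(num_terms: int) -> float:
    """Shorter fields get a larger norm."""
    return 1.0 / math.sqrt(num_terms)


def field_weight(term: str, field_text: str) -> float:
    """tf * lengthNorm for one term in one field (illustrative helper)."""
    tokens = field_text.lower().split()
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    return tf(freq) * length_norm(len(tokens))


ad1 = field_weight("test", "test")          # 1 occurrence in a 1-term title
ad2 = field_weight("test", "test on test")  # 2 occurrences in a 3-term title

print(f"ad 1: {ad1:.3f}")
print(f"ad 2: {ad2:.3f}")
```

With these assumptions ad 1 comes out at 1.0 and ad 2 at roughly 0.816 (sqrt(2) / sqrt(3)), so the shorter title wins even though it contains the word only once.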
Re: Content Extraction
Hi Erick, I did a post with more details yesterday with no response. I have a screenshot of what it does: http://screencast.com/t/MGRiZTU5M After running it I did a query, which returned 0 results, and checked how many docs are indexed, which is also 0. Hope you can shed some more light for me. Lee
On 26 Feb 2010, at 14:57, Erick Erickson wrote:
You really have to provide more details of a) what you did and b) what the results were. Have you looked at your index with the admin page and/or Luke? Have you tried querying in the admin page? Have you examined the logs to see what they report? Best, Erick
On Fri, Feb 26, 2010 at 7:54 AM, Lee Smith l...@weblee.co.uk wrote:
Hey all, hope someone can advise. I followed the example in the wiki on how to extract an HTML page, i.e.
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true' -F myfi...@tutorial.html
It displayed an HTML page, but with a 404, and did not index the document. Any suggestions on how I can fix this? Thanks if you can advise. Lee
Re: SEVERE: java.lang.NumberFormatException: For input string: 104708
Thank you for taking the time to look at my issue and respond. Do you have any suggestions for purging the document with this field from the index? Would that even help? I do not know which document has the corrupt value, and searching for the document with something like pk_i:104708 does return a document with that value. (pk_i is the integer field that we try to sort on and that, presumably, has a non-integer value stored for some document.)
On Fri, Feb 26, 2010 at 10:26, Yonik Seeley yo...@lucidimagination.com wrote:
One of your field values isn't a valid integer, it's 104708. You're probably using the straight integer type in 1.3, which is meant for back compat with existing Lucene indexes and currently doesn't do validation on its input. For Solr 1.4, int is a new field type (the example schema maps it to TrieIntField) that does do validation at index time, and is just as efficient for sorting. -Yonik http://www.lucidimagination.com
Re: Solr Cell and Deduplication - Get ID of doc
Any thoughts on this? I would like to get the id back in the request after indexing. My initial thought was to do a search after indexing to get the doc id based on the attr_stream_name, but now that I reread my message I see I mentioned that the attr_stream_name (file_name) may be different, so that is unreliable. My only option is to somehow return the id in the XML response. Any guidance is greatly appreciated.

-Bill

On Wed, Feb 24, 2010 at 12:06 PM, Bill Engle billengle...@gmail.com wrote:

Hi - New Solr user here. I am using Solr Cell to index files (PDF, doc, docx, txt, htm, etc.) and there is a good chance that a new file will have duplicate content but not necessarily the same file name. To avoid this I am using the deduplication feature of Solr.

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">attr_content</str>
    <str name="signatureClass">org.apache.solr.update.processor.</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

How do I get the id value after Solr has processed the document? Is there some way to modify the curl response so that the id is returned? I need this id because I would like to rename the file to the id value. I could probably do a Solr search after the fact to get the id field based on the attr_stream_name, but I would like to do only one request.

curl 'http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true' -F myfi...@myfile.pdf

Thanks,
Bill
Re: SEVERE: java.lang.NumberFormatException: For input string: 104708
You have to find the document with the bad value somehow. In the past I have used Luke to help with this. Then you need to delete the document. Finally, you have to get the deleted document out of the index through a merge (else the bad term will still be loaded by the FieldCache); the easiest way to do this is an optimize.

-- 
- Mark
http://www.lucidimagination.com

On 02/26/2010 10:49 AM, Pablo Mercado wrote:

Thank you for taking the time to look at my issue and respond. Do you have any suggestions for purging the document with this field from the index? Would that even help? I do not know which document has the corrupt value, and searching for the document with something like pk_i:104708 does return a document with that value. (pk_i is the integer field that we try to sort on and that, presumably, has a non-integer value stored for some document.)

On Fri, Feb 26, 2010 at 10:26, Yonik Seeley yo...@lucidimagination.com wrote:

One of your field values isn't a valid integer, it's 104708

You're probably using the straight integer type in 1.3, which is meant for back compat with existing Lucene indexes and currently doesn't do validation on its input. For Solr 1.4, int is a new field type (example schema maps it to TrieIntField) that does do validation at index time, and is just as efficient for sorting.

-Yonik
http://www.lucidimagination.com

On Fri, Feb 26, 2010 at 9:59 AM, Pablo Mercado pa...@sbnation.com wrote:

Hello,

Solr is raising the following exception when processing queries that sort on an integer attribute. The same queries and sorts have been running fine in production for almost a year now. If I run the query without the sort on the integer attribute, the query runs fine. If I run a query that would return 0 results but still has a sort parameter, the exception is raised. The stack trace is the same no matter what the query.

I need help troubleshooting this issue. Any clues or suggested approaches would be helpful. Thank you in advance!
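The delete-then-optimize sequence Mark describes can be sketched as the XML update messages one would post to Solr's update handler. This is only a sketch under assumptions: the unique key value passed in is a made-up placeholder, the helper names are hypothetical, and nothing here is actually sent to a server.

```python
import xml.etree.ElementTree as ET


def delete_by_id_message(doc_id):
    """Build a <delete> message addressing one document by its unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def commit_message():
    """Build the <commit/> message that makes the delete visible."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


def optimize_message():
    """Build the <optimize/> message that forces a merge, so the deleted
    document's terms are no longer loaded by the FieldCache."""
    return ET.tostring(ET.Element("optimize"), encoding="unicode")


def purge_messages(doc_id):
    # Order matters: delete first, commit, then optimize.
    return [delete_by_id_message(doc_id), commit_message(), optimize_message()]


if __name__ == "__main__":
    # "doc-123" is a placeholder id, not a value from the thread.
    for message in purge_messages("doc-123"):
        print(message)
```

Each string would be posted to the update handler in turn; finding the right id in the first place still needs Luke or a query.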
Re: Changing term frequency according to value of one of the fields
Extend the Similarity class, compile it against the jars in lib, put it in a path Solr can find, and set your schema to use it:

http://wiki.apache.org/solr/SolrPlugins#Similarity

On 02/25/2010 10:09 PM, Pooja Verlani wrote:

Hi,

I want to modify the Similarity class for my app as follows. Right now tf is Math.sqrt(termFrequency). I would like to change it to Math.sqrt(termFrequency/solrDoc.getFieldValue("count")), where count is one of the fields in the particular Solr document. Is it possible to do so? Can I import the SolrDocument class and use the particular solrDoc when calculating tf in the Similarity class? Please suggest.

regards,
Pooja
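The arithmetic Pooja asks for can be written out as a small function. This is a plain Python sketch of the formula only, not the Lucene Similarity API; how a per-document count would reach the scoring code is left open here, and the fallback to the default sqrt(tf) for a non-positive count is an assumption of this sketch.

```python
import math


def scaled_tf(term_frequency, count):
    """Return sqrt(term_frequency / count).

    Falls back to the unscaled sqrt(term_frequency) when count is not
    positive, to avoid dividing by zero (an assumption of this sketch).
    """
    if count <= 0:
        return math.sqrt(term_frequency)
    return math.sqrt(term_frequency / count)


if __name__ == "__main__":
    print(scaled_tf(8, 2))
    print(scaled_tf(9, 0))
```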
Solrsharp
Hi,

I don't know if this list covers this kind of help, but I'm using Solrsharp with C# to operate Solr. Please advise if this is off-topic.

I'm having a little trouble making a search that excludes terms using the query parameters. Does anyone use Solrsharp around here? Have you managed to exclude terms in searches?

Br
Frederico
Re: SEVERE: java.lang.NumberFormatException: For input string: 104708
On Fri, Feb 26, 2010 at 10:59 AM, Mark Miller markrmil...@gmail.com wrote:

You have to find the document with the bad value somehow. In the past I have used Luke to help with this. Then you need to delete the document.

You can also find the document with a raw term query.

q={!raw f=myfield}104708

-Yonik
http://www.lucidimagination.com
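Because the raw query contains braces, a space, and an equals sign, it needs URL encoding before it goes into a request. A minimal sketch, assuming a select handler at a made-up base URL and using pk_i (the field named earlier in the thread) in place of myfield:

```python
from urllib.parse import urlencode


def raw_term_query_url(base_url, field, term):
    """Return a select URL whose q parameter is a {!raw} term query on one field."""
    params = {"q": "{!raw f=%s}%s" % (field, term)}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # The host, port and path are placeholders for illustration only.
    print(raw_term_query_url("http://localhost:8983/solr/select", "pk_i", "104708"))
```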
Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley
Yes, it will be recorded and available to view after the presentation.

-Jay

On Thu, Feb 25, 2010 at 2:19 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote:

Yonik, can you please advise whether this event will be recorded and available for later download? (It starts 5am our time ;-) )

Regards
Bern

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, 25 February 2010 10:23 AM
To: solr-user@lucene.apache.org
Subject: Free Webinar: Mastering Solr 1.4 with Yonik Seeley

I'd like to invite you to join me for an in-depth review of Solr's powerful, versatile new features and functions. The free webinar, sponsored by my company, Lucid Imagination, covers an intensive how-to for the features you need to make the most of Solr for your search application:

* Faceting deep dive, from document fields to performance management
* Best practices for sharding, index partitioning and scaling
* How to construct efficient Range Queries and function queries
* Sneak preview: Solr 1.5 roadmap

Join us for a free webinar
Thursday, March 4, 2010
10:00 AM PST / 1:00 PM EST / 18:00 GMT

Follow this link to sign up
http://www.eventsvc.com/lucidimagination/030410?trk=WR-MAR2010-AP

Thanks,
-Yonik
http://www.lucidimagination.com
Re: Solr 1.4 distributed search configuration
Now I got it, I just forgot to put qt=search in the query.

By the way, in Solr 1.3 I used shards.txt under the conf directory and distributed=true in the query for distributed search. That way, in my Java application, I could hard-code the Solr query with distributed=true and control the use of distributed search by defining shards.txt or not. In Solr 1.4, it is more difficult to use distributed search dynamically. Is there a way to make DS work by changing only the configuration, without changing the query?

Thanks,

From: Mark Miller markrmil...@gmail.com
To: solr-user@lucene.apache.org
Date: 25/02/2010 04:13 PM
Subject: Re: Solr 1.4 distributed search configuration

Can you elaborate on "doesn't work" when you put it in the /search handler? You get an error in the logs? Nothing happens?

On 02/25/2010 03:47 PM, Jeffrey Zhao wrote:

Hi Mark,

Thanks for your reply. I did make a new handler as follows, but it does not work. Is anything wrong with my configuration?

Thanks,

<requestHandler name="search" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="shards">202.161.196.189:8080/solr,localhost:8080/solr</str>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>spellcheck</str>
    <str>debug</str>
  </arr>
</requestHandler>

From: Mark Miller markrmil...@gmail.com
To: solr-user@lucene.apache.org
Date: 25/02/2010 03:41 PM
Subject: Re: Solr 1.4 distributed search configuration

On 02/25/2010 03:32 PM, Jeffrey Zhao wrote:

How do I define a new search handler with a shards parameter? I defined it the following way but it doesn't work. If I put the shards parameter in the default handler, it seems I get an infinite loop.

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>

<requestHandler name="search" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="shards">202.161.196.189:8080/solr,localhost:8080/solr</str>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>spellcheck</str>
    <str>debug</str>
  </arr>
</requestHandler>

Thanks,

Not seeing this on the wiki (it should be there), but you can't put the shards param on the default search handler without causing an infinite loop - you have to make a new request handler and put it on that.

-- 
- Mark
http://www.lucidimagination.com
RE: Extended stats via JMX
-Original Message-
From: Matthew Runo [mailto:mr...@zappos.com]
Sent: Thursday, February 25, 2010 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Extended stats via JMX

https://issues.apache.org/jira/browse/SOLR-1750 might help you, since I don't think that all of stats.jsp is exposed via MBeans. I could be wrong about that though. (Apologies, our Solr servers are firewalled and I can't connect via JMX at the moment.)

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

Hi, Matthew - Looks like Shalin confirmed that those values can in fact be found inside the RequestHandler's MBean (thanks, Shalin). Thanks for getting me going in the right direction. I appreciate it.

Thanks
-dant
Re: Solr 1.4 distributed search configuration
You can set a default shards parameter on the request handler doing distributed search. You can set up two different request handlers, one with a shards default and one without.

On Thu, Feb 25, 2010 at 1:35 PM, Jeffrey Zhao jeffrey.z...@metalogic-inc.com wrote:

Now I got it, I just forgot to put qt=search in the query.

By the way, in Solr 1.3 I used shards.txt under the conf directory and distributed=true in the query for distributed search. That way, in my Java application, I could hard-code the Solr query with distributed=true and control the use of distributed search by defining shards.txt or not. In Solr 1.4, it is more difficult to use distributed search dynamically. Is there a way to make DS work by changing only the configuration, without changing the query?

Thanks,

From: Mark Miller markrmil...@gmail.com
To: solr-user@lucene.apache.org
Date: 25/02/2010 04:13 PM
Subject: Re: Solr 1.4 distributed search configuration

Can you elaborate on "doesn't work" when you put it in the /search handler? You get an error in the logs? Nothing happens?

On 02/25/2010 03:47 PM, Jeffrey Zhao wrote:

Hi Mark,

Thanks for your reply. I did make a new handler as follows, but it does not work. Is anything wrong with my configuration?

Thanks,

<requestHandler name="search" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="shards">202.161.196.189:8080/solr,localhost:8080/solr</str>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>spellcheck</str>
    <str>debug</str>
  </arr>
</requestHandler>

From: Mark Miller markrmil...@gmail.com
To: solr-user@lucene.apache.org
Date: 25/02/2010 03:41 PM
Subject: Re: Solr 1.4 distributed search configuration

On 02/25/2010 03:32 PM, Jeffrey Zhao wrote:

How do I define a new search handler with a shards parameter? I defined it the following way but it doesn't work. If I put the shards parameter in the default handler, it seems I get an infinite loop.
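With two request handlers configured as suggested, the client can pick one per request through the qt parameter. The sketch below assumes the handler name search from the thread; the base URL and the distributed flag (standing in for whatever configuration switch the application reads) are hypothetical.

```python
from urllib.parse import urlencode


def select_url(base_url, query, distributed):
    """Build a query URL, routing to the shards-enabled handler only when asked."""
    params = {"q": query}
    if distributed:
        # "search" is the handler that carries the shards default in the thread.
        params["qt"] = "search"
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8080/solr/select"  # placeholder URL
    print(select_url(base, "solr", distributed=True))
    print(select_url(base, "solr", distributed=False))
```

The query text itself stays the same in both cases; only the handler choice changes, which keeps the shard list in solrconfig.xml rather than in application code.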
Question on Facets and Multiple values (confusion from the Wiki)
Certainly lots of matches on Solr and facets.

Contrived example:

* Solr 1.4, etc.
* Yellow pages, business listings.
* Business listings have a zip code that I will use in faceted search.
* Companies with multiple stores/outlets/offices still only have one record, but all applicable zip codes are listed. (Yes, there are other ways to solve this, I know.)
* I want each listing to show in all of its zip codes when zip-code facets are presented.
* I declare one or more fields in schema.xml per the Wiki, etc. So 3 fields, etc.

The behavior I will actually get will be:

(a) As described, a business in 3 zip codes will show up under all 3 facets.
(b) Nope, only the first zip code will work correctly.
(c) Yes and no. Only the first zip code will show up in the facets, but I could certainly still search on the other codes and find that listing. If other businesses are in the other facets, a click on those zip codes will return my original business, but if it's the ONLY business in the zip code, the facet will not get displayed.

Normally I'd say (a). Playing with facets and reading online, this should be possible, though it may take 2 or 3 versions of the field.

But why then would I even ask about (b) or (c)? Well, there's stuff in the Wiki that makes me hesitate.

First, look at this page:
http://wiki.apache.org/solr/SolrFacetingOverview
First section, 3rd bullet point, where it says:

For faceting: Primary author only, using a solr.StringField:
* Schildt, Herbert

Obviously if I were sorting on this field the first author would matter a lot. And it's a bit ambiguous which copy of the field I'm using, etc.

Other things that cause me to hesitate:
http://wiki.apache.org/solr/SchemaXml#Common_field_options
The multiValued=true|false option.

And this:
http://wiki.apache.org/solr/FieldOptionsByUseCase
multiValued is left blank in many cases, and not filled in for facets.

-- 
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
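Outcome (a) can be modeled in a few lines: field-value faceting counts a document once under each distinct value it holds in the faceted field. The following is a plain Python model of that counting, not Solr code, and the listings and zip codes are invented for illustration.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count, for each value of `field`, how many documents carry it."""
    counts = Counter()
    for doc in docs:
        # set() so a document counts once per distinct value.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    return counts


if __name__ == "__main__":
    listings = [
        {"name": "Acme Hardware", "zip": ["95014", "94086", "94087"]},
        {"name": "Bolt Bakery", "zip": ["95014"]},
    ]
    for zip_code, count in sorted(facet_counts(listings, "zip").items()):
        print(zip_code, count)
```

In this model the three-outlet business appears under all three zip codes, including the two where it is the only listing.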
replication issue
Hi,

I am still having issues with replication and wonder if things are working properly. I have 1 master and 1 slave. On the slave, I deleted the data/index directory and the data/replication.properties file and restarted Solr. While the slave is pulling data from the master, I can see that the size of the data directory is growing:

r...@slr8:/raid/data# du -sh
3.7M .
r...@slr8:/raid/data# du -sh
4.7M .

I can also see that the data/replication.properties file got created, along with a directory data/index.20100226063400. Soon after, index.20100226063400 disappears and the size of data/index is back to 12K:

r...@slr8:/raid/data/index# du -sh
12K .

When I look at the number of documents via the admin interface, I still see 0 documents, so I feel something is wrong.

One more thing: I have a symlink /solr/data ---> /raid/data

Thank you for your help!

matt
ConcurrentModificationException
Hi guys,

SOLR 1.4 (final) and 1.5 nightly work fine on a Windows box, but on our Centos 5 box, we're getting a ConcurrentModificationException when starting Tomcat 6. Any tips on how to solve this and/or troubleshoot? Made sure there are no duplicate libs in Tomcat and solr/lib, and tried to cut down contrib stuff to see if it helped, but no luck.

Thanks,
Dan.

= = = Log Below: = = =

INFO | jvm 1 | 2010/02/24 21:27:04 | SEVERE: java.util.ConcurrentModificationException
INFO | jvm 1 | 2010/02/24 21:27:04 | at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
INFO | jvm 1 | 2010/02/24 21:27:04 | at java.util.AbstractList$Itr.next(AbstractList.java:343)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:507)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.solr.core.SolrCore.init(SolrCore.java:606)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.solr.core.CoreContainer.load(CoreContainer.java:285)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.StandardService.start(StandardService.java:516)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
INFO | jvm 1 | 2010/02/24 21:27:04 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1 | 2010/02/24 21:27:04 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO | jvm 1 | 2010/02/24 21:27:04 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO | jvm 1 | 2010/02/24 21:27:04 | at java.lang.reflect.Method.invoke(Method.java:597)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
INFO | jvm 1 | 2010/02/24 21:27:04 | at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
INFO | jvm 1 | 2010/02/24 21:27:04 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1 | 2010/02/24 21:27:04 | at
Re: ConcurrentModificationException
Could you open a JIRA issue for this?

After a quick look, it could be that firstSearcher / newSearcher events being executed concurrently change the list. Could you try commenting out the firstSearcher/newSearcher events in solrconfig.xml and see if that fixes it? It could be that a lazy-loaded component is triggered by a firstSearcher/newSearcher event, which happens in other threads, causing another bean to be added.

-Yonik
http://www.lucidimagination.com

On Fri, Feb 26, 2010 at 2:28 PM, Dan Hertz (Insight 49, LLC) insigh...@gmail.com wrote:

Hi guys,

SOLR 1.4 (final) and 1.5 nightly work fine on a Windows box, but on our Centos 5 box, we're getting a ConcurrentModificationException when starting Tomcat 6. Any tips on how to solve this and/or troubleshoot? Made sure there are no duplicate libs in Tomcat and solr/lib, and tried to cut down contrib stuff to see if it helped, but no luck.

Thanks,
Dan.
Re: ConcurrentModificationException
Yep, definitely a bug. It looks like resourceLoader.newInstance() is fundamentally not thread safe.

-Yonik
http://www.lucidimagination.com

On Fri, Feb 26, 2010 at 2:48 PM, Yonik Seeley yo...@lucidimagination.com wrote:

Could you open a JIRA issue for this?

After a quick look, it could be that firstSearcher / newSearcher events being executed concurrently change the list. Could you try commenting out the firstSearcher/newSearcher events in solrconfig.xml and see if that fixes it? It could be that a lazy-loaded component is triggered by a firstSearcher/newSearcher event, which happens in other threads, causing another bean to be added.

-Yonik
http://www.lucidimagination.com

On Fri, Feb 26, 2010 at 2:28 PM, Dan Hertz (Insight 49, LLC) insigh...@gmail.com wrote:

Hi guys,

SOLR 1.4 (final) and 1.5 nightly work fine on a Windows box, but on our Centos 5 box, we're getting a ConcurrentModificationException when starting Tomcat 6. Any tips on how to solve this and/or troubleshoot? Made sure there are no duplicate libs in Tomcat and solr/lib, and tried to cut down contrib stuff to see if it helped, but no luck.

Thanks,
Dan.
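The failure mode discussed here, a list that gains entries while another code path is iterating over it, is commonly avoided by guarding the list with a lock and iterating over a snapshot. The sketch below is a Python model of that general pattern with a hypothetical ListenerRegistry class; it is not the Solr code and not necessarily how the bug was fixed.

```python
import threading


class ListenerRegistry:
    """Holds callbacks that may be added while others are being informed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._listeners = []

    def add(self, listener):
        # All mutation happens under the lock.
        with self._lock:
            self._listeners.append(listener)

    def inform_all(self):
        # Copy under the lock, then release it before calling out, so a
        # listener that registers another listener neither deadlocks nor
        # disturbs the iteration in progress.
        with self._lock:
            snapshot = list(self._listeners)
        for listener in snapshot:
            listener(self)
        return len(snapshot)


if __name__ == "__main__":
    registry = ListenerRegistry()
    # This listener registers a new no-op listener each time it is informed.
    registry.add(lambda reg: reg.add(lambda _reg: None))
    print(registry.inform_all())
    print(registry.inform_all())
```

Listeners added during a pass are only seen by the next pass, which is the trade-off of the snapshot approach.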
Re: replication issue
On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi, I am still having issues with replication and wonder if things are working properly. I have 1 master and 1 slave. On the slave, I deleted the data/index directory and the data/replication.properties file and restarted Solr. While the slave is pulling data from the master, I can see the size of the data directory growing: r...@slr8:/raid/data# du -sh 3.7M . r...@slr8:/raid/data# du -sh 4.7M . I can also see that the data/replication.properties file got created, along with a directory data/index.20100226063400. Soon after, index.20100226063400 disappears and the size of data/index is back to 12K: r...@slr8:/raid/data/index# du -sh 12K . When I look at the number of documents via the admin interface, I still see 0 documents, so I feel something is wrong. One more thing: I have a symlink for /solr/data --- /raid/data

The ReplicationHandler moves files out of the temp directory into the index directory. Java's File#renameTo can fail if the source and target directories are on different partitions/disks. Is that the case here? I believe SOLR-1736 fixes this issue in trunk, but that was implemented after the 1.4 release. -- Regards, Shalin Shekhar Mangar.
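The rename problem Shalin describes is not specific to Java: a plain rename generally cannot cross filesystem boundaries. The following is a minimal sketch in Python, not Solr's code and not the SOLR-1736 patch, of the usual workaround (try the rename, fall back to copy and delete). The function names are made up for illustration.

```python
import errno
import os
import shutil


def same_device(path_a, path_b):
    """Return True if both existing paths live on the same filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_file(src, dst):
    """Rename src to dst, copying instead if they are on different filesystems."""
    try:
        os.rename(src, dst)  # cheap, but only works within one filesystem
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # some other failure: do not hide it
        shutil.copy2(src, dst)  # cross-device: copy the bytes...
        os.remove(src)          # ...then drop the original
```

Calling same_device() on the index directory and on whatever temp directory replication uses (the thread does not establish where that is) would be one way to answer the "different partitions?" question.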
Re: HTTP ERROR: 404 missing core name in path after integrating nutch
Just wanted to give an update on my efforts. I installed the Feb. 26 update this morning and was able to access /solr/admin. I copied over the Nutch schema.xml, restarted Solr, and was still able to access /solr/admin. I then edited solrconfig.xml to add the Nutch requestHandler snippet from Lucid Imagination, restarted Solr, and got the "404 missing core name in path" error. What in the requestHandler snippet (see below) could be causing this error?

From http://bit.ly/1mOb:

<requestHandler name="/nutch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf"> content^0.5 anchor^1.0 title^1.2 </str>
    <str name="pf"> content^0.5 anchor^1.5 title^1.2 site^1.5 </str>
    <str name="fl"> url </str>
    <str name="mm"> 2&lt;-1 5&lt;-2 6&lt;90% </str>
    <int name="ps">100</int>
    <bool hl="true"/>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">title url content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

Have a great weekend.
Re: replication issue
Shalin, thank you so much for your answer. That is the case here. How can I find out which temp directory Solr replication is using? Is there a way to set the source (the temp directory used by Solr?) and the target directory via the Solr config file so that they live on the same partition? Thank you, matt

--- On Fri, 2/26/10, Shalin Shekhar Mangar shalinman...@gmail.com wrote: From: Shalin Shekhar Mangar shalinman...@gmail.com Subject: Re: replication issue To: solr-user@lucene.apache.org Date: Friday, February 26, 2010, 2:06 PM

On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi, I am still having issues with replication and wonder if things are working properly. I have 1 master and 1 slave. On the slave, I deleted the data/index directory and the data/replication.properties file and restarted Solr. While the slave is pulling data from the master, I can see the size of the data directory growing: r...@slr8:/raid/data# du -sh 3.7M . r...@slr8:/raid/data# du -sh 4.7M . I can also see that the data/replication.properties file got created, along with a directory data/index.20100226063400. Soon after, index.20100226063400 disappears and the size of data/index is back to 12K: r...@slr8:/raid/data/index# du -sh 12K . When I look at the number of documents via the admin interface, I still see 0 documents, so I feel something is wrong. One more thing: I have a symlink for /solr/data --- /raid/data

The ReplicationHandler moves files out of the temp directory into the index directory. Java's File#renameTo can fail if the source and target directories are on different partitions/disks. Is that the case here? I believe SOLR-1736 fixes this issue in trunk, but that was implemented after the 1.4 release. -- Regards, Shalin Shekhar Mangar.
Re: SEVERE: java.lang.NumberFormatException: For input string: 104708
A big thanks to Yonik and Mark. Using the raw term query I was able to find the range(!) of documents that had bad integer field values. Deleting those documents, committing and optimizing cleared up the issue. Still not sure how the bad values were inserted in the first place, but that is another task. Thanks again for being so helpful. On Fri, Feb 26, 2010 at 11:29, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Feb 26, 2010 at 10:59 AM, Mark Miller markrmil...@gmail.com wrote: You have to find the document with the bad value somehow. In the past I have used Luke to help with this. Then you need to delete the document. You can also find the document with a raw term query. q={!raw f=myfield}104708 -Yonik http://www.lucidimagination.com
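For anyone following the same route, here is a small sketch in Python of how Yonik's raw term query could be turned into a request URL. The host, port and /solr/select path are assumptions; myfield and 104708 come from the thread. The local-params prefix contains braces, "!" and "=", so it has to be URL-encoded before it goes on the wire.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed endpoint


def raw_term_query_url(field, value, rows=10):
    """Build a select URL that matches the exact indexed term in `field`."""
    # {!raw f=...} bypasses query-time analysis, so the literal term is matched
    params = {"q": "{!raw f=%s}%s" % (field, value), "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)


print(raw_term_query_url("myfield", "104708"))
```

The ids of the documents returned can then be used for the delete, commit and optimize steps described above.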
Re: ConcurrentModificationException
On 2010-02-26 12:55 PM, Yonik Seeley wrote: Yep, definitely a bug. It looks like resourceLoader.newInstance() is fundamentally not thread safe. -Yonik On Fri, Feb 26, 2010 at 2:48 PM, Yonik Seeley yo...@lucidimagination.com wrote: Could you open a JIRA issue for this? Yonik, Do you still need me to open a JIRA issue, or has one been opened? (I'm having trouble connecting to issues.apache.org) Thanks, Dan
Re: Question on Facets and Multiple values (confusion from the Wiki)
Hi Mark,

If (a) is the wanted behaviour, i.e. having a business show up in the facets for all of its ZIPs, you should define a multi-valued ZIP field. Since a ZIP is a number, I don't see any reason for any analysis on it; a String or a lightly normalized field type would do the job for both search and facets.

What I think confuses you is the author example in SolrFacetingOverview, which chooses to use only the main (first) author for faceting. This is a business decision for that application and has nothing to do with faceting as such. The default would be to include all ZIPs. The example on that page should probably clarify this behaviour.

When it comes to the table in FieldOptionsByUseCase, I agree that for faceting it makes sense to recommend multiValued if you have multi-valued content, but it is not required for faceting. I think this table was made to explain which params you MUST set to enable certain functionality on a field. I would set true[6] for multiValued and add a footnote that it must be used for multi-value faceting.

-- Jan Høydahl - search architect Cominvent AS - www.cominvent.com

On 26. feb. 2010, at 19.12, Mark Bennett wrote:

Certainly lots of matches on Solr and facets. Contrived example:
* Solr 1.4, etc.
* Yellow pages, business listings.
* Business listings have a zip code that I will use in faceted search.
* Companies with multiple stores/outlets/offices still only have one record, but all applicable zip codes are listed. (Yes, there are other ways to solve this, I know.)
* I want each listing to show in all of its zip codes when zip-code facets are presented.
* I declare one or more fields in schema.xml per the Wiki, etc. So 3 fields, etc.

The behavior I will actually get will be:
(a) As described, a business in 3 zip codes will show up under all 3 facets.
(b) Nope, only the first zip code will work correctly.
(c) Yes and no. Only the first zip code will show up in the facets, but I could certainly still search on the other codes and find that listing. If other businesses are in the other facets, a click on those zip codes will return my original business, but if it's the ONLY business in the zip code, the facet will not get displayed.

Normally I'd say (a). Playing with facets and reading online, this should be possible, though it may take 2 or 3 versions of the field. But why then would I even ask about (b) or (c)? Well, there's stuff in the Wiki that makes me hesitate.

First, look at this page: http://wiki.apache.org/solr/SolrFacetingOverview First section, 3rd bullet point, where it says: "For faceting: Primary author only, using a solr.StringField: * Schildt, Herbert". Obviously if I were sorting on this field the first author would matter a lot. And it's a bit ambiguous which copy of the field I'm using, etc.

Other things that cause me to hesitate: http://wiki.apache.org/solr/SchemaXml#Common_field_options (the multiValued=true|false option), and this: http://wiki.apache.org/solr/FieldOptionsByUseCase where multiValued is left blank in many cases, and not filled in for facets.

-- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
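To make outcome (a) concrete, here is a toy model in Python of what faceting on a multi-valued field amounts to. This is not how Solr computes facets internally; the listings and the field name "zip" are invented for illustration. Each document counts once under every distinct value it carries.

```python
from collections import Counter

# Hypothetical listings; "zip" stands in for a multiValued string field.
listings = [
    {"name": "Acme Hardware", "zip": ["94085", "94086", "95014"]},
    {"name": "Bay Books", "zip": ["94086"]},
]


def facet_counts(docs, field):
    """Count each distinct value of `field` once per document that has it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))
    return counts


def filter_by(docs, field, value):
    """Return the docs that a click on the facet `value` would select."""
    return [d for d in docs if value in d.get(field, [])]


print(facet_counts(listings, "zip"))
print([d["name"] for d in filter_by(listings, "zip", "95014")])
```

In this model the three-ZIP business appears under all three facet values, including the one where it is the only listing.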
Re: If you could have one feature in Solr...
Realtime search, hands down.
Re: If you could have one feature in Solr...
+1 I have several projects backburnered in the hope realtime search will come to solr soon... [m] On Feb 26, 2010, at 8:37 PM, Don Werve d...@madwombat.com wrote: Realtime search, hands down.
RE: If you could have one feature in Solr...
The indexer looking for an xml:lang attribute on text fields and using the value to pick the tokeniser, dictionaries, etc. automatically (and knowing to look for them in the standard places). cheers stuart
Re: Solr Cell and Deduplication - Get ID of doc
You could create your own unique ID and pass it in with the literal.field=value feature. http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters

On Fri, Feb 26, 2010 at 7:56 AM, Bill Engle billengle...@gmail.com wrote: Any thoughts on this? I would like to get the id back in the request after indexing. My initial thought was to do a search to get the docid based on the attr_stream_name after indexing, but now that I reread my message I see I mentioned that the attr_stream_name (file_name) may be different, so that is unreliable. My only option is to somehow return the id in the XML response. Any guidance is greatly appreciated. -Bill

On Wed, Feb 24, 2010 at 12:06 PM, Bill Engle billengle...@gmail.com wrote: Hi - New Solr user here. I am using Solr Cell to index files (PDF, doc, docx, txt, htm, etc.) and there is a good chance that a new file will have duplicate content but not necessarily the same file name. To avoid this I am using the deduplication feature of Solr.

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">attr_content</str>
    <str name="signatureClass">org.apache.solr.update.processor.</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

How do I get the id value post Solr processing? Is there some way to modify the curl response so that the id is returned? I need this id because I would like to rename the file to the id value. I could probably do a Solr search after the fact to get the id field based on the attr_stream_name, but I would like to do only one request.

curl 'http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true' -F myfi...@myfile.pdf

Thanks, Bill

-- Lance Norskog goks...@gmail.com
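One way to read Lance's suggestion: compute the id on the client before posting, so it is already known and never has to come back in the response. The sketch below, in Python, is an assumption-laden illustration. It hashes the raw file bytes with MD5, which is not the same thing as Solr's signature over the extracted attr_content, and whether a client-supplied id survives a dedupe chain whose signatureField is also id is not confirmed anywhere in this thread. The endpoint and the other parameters are taken from Bill's curl command.

```python
import hashlib
from urllib.parse import urlencode

EXTRACT_URL = "http://localhost:8080/solr/update/extract"  # from Bill's curl


def content_id(data):
    """Derive a stable id from the file bytes (MD5 hex digest)."""
    return hashlib.md5(data).hexdigest()


def extract_url(doc_id):
    """Build the extract URL, passing the id via the literal.id parameter."""
    params = [
        ("literal.id", doc_id),
        ("uprefix", "attr_"),
        ("fmap.content", "attr_content"),
        ("commit", "true"),
    ]
    return EXTRACT_URL + "?" + urlencode(params)


doc_id = content_id(b"hello")
print(doc_id)
print(extract_url(doc_id))
```

The same id can then be used to rename the file locally, without a second request to Solr.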
Re: If you could have one feature in Solr...
To have a coffee waiting for me every morning when I wake up. Marriage material indeed.
solr for reporting purposes
We are trying to use Solr as something of a reporting system too (along with search). Since it provides such amazing control over queries, and basically over the data the user wants, users might as well be able to dump that data into an Excel file if needed. Our data isn't that large: close to 25K docs with 15-20 fields in each doc, and most of these reports will cover roughly 500 - 4000 records. I am thinking about setting up a simple servlet that submits the user's query to Solr over HTTP, grabs all the result data, and dumps it into an Excel file. I was hoping to get some idea of whether this is going to have any performance impact on Solr search, especially since it's all on the same server and some users will be running reports while others are searching. Right now search is working GREAT, it's blazing fast. I don't want to lose this, but at the same time reporting is an important requirement as well. I would also appreciate any hints towards creative ways of doing it, something like fetching 500 or so records in a single request and then using a timer task to repeat the process. Thanks for your help.
Re: solr for reporting purposes
I just want to clarify, in case it's not obvious, that the reason I am concerned about the performance of Solr is that for reporting requests I will probably have to request all result rows at the same time, instead of 10 or 20.

adeelmahmood wrote: We are trying to use Solr as something of a reporting system too (along with search). Since it provides such amazing control over queries, and basically over the data the user wants, users might as well be able to dump that data into an Excel file if needed. Our data isn't that large: close to 25K docs with 15-20 fields in each doc, and most of these reports will cover roughly 500 - 4000 records. I am thinking about setting up a simple servlet that submits the user's query to Solr over HTTP, grabs all the result data, and dumps it into an Excel file. I was hoping to get some idea of whether this is going to have any performance impact on Solr search, especially since it's all on the same server and some users will be running reports while others are searching. Right now search is working GREAT, it's blazing fast. I don't want to lose this, but at the same time reporting is an important requirement as well. I would also appreciate any hints towards creative ways of doing it, something like fetching 500 or so records in a single request and then using a timer task to repeat the process. Thanks for your help.
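The "500 records per request, then repeat" idea maps onto Solr's start and rows parameters. Here is a minimal sketch in Python of generating the page requests for one report. The endpoint is an assumption and the function name is made up. It only builds URLs, leaving the fetching and the Excel export to the servlet.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed endpoint


def page_urls(query, total, page_size=500):
    """Yield one select URL per page of `page_size` rows, covering `total` hits."""
    for start in range(0, total, page_size):
        rows = min(page_size, total - start)  # last page may be shorter
        params = {"q": query, "start": start, "rows": rows}
        yield SOLR_SELECT + "?" + urlencode(params)


for url in page_urls("*:*", 1200):
    print(url)
```

The total would normally come from numFound in the first response. Fetching pages one after another keeps each individual request small, though whether that matters at 500 - 4000 rows is something only a measurement on the actual server can tell.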
indexing using inbuilt lucene in Solr
Hi, I'm new to Solr. I have a database which consists of latitude, longitude and relevant news. This file has been imported using dataimport, and I think it has been indexed successfully by Solr. Now I have to move ahead and run a few queries like the ones below.

# hsin (great circle): http://localhost:8983/solr/select/?q=name:Minneapolis AND _val_:recip(hsin(0.78, -1.6, lat_rad, lon_rad, 3963.205), 1, 1, 0)^100

# dist (Euclidean, Manhattan, p-norm): http://localhost:8983/solr/select/?q=name:Minneapolis AND _val_:recip(dist(2, lat, lon, 44.794, -93.2696), 1, 1, 0)^100

How do I get started with this? Should Lucene be used explicitly to index the file again? Kindly help me get started. Thanks in advance.
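As background for the first query, here is a small sketch in Python of what the two functions compute, assuming the argument order shown in the post (two lat/lon pairs in radians, then the sphere radius; 3963.205 is roughly the Earth's radius in miles). This is the standard haversine formula written out by hand, not Solr's implementation. recip(x, m, a, b) is a / (m*x + b), so recip(d, 1, 1, 0) is simply 1/d and closer documents get a larger boost.

```python
import math


def hsin(lat1, lon1, lat2, lon2, radius):
    """Great-circle (haversine) distance between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def recip(x, m, a, b):
    """Reciprocal function a / (m * x + b); undefined when m * x + b is 0."""
    return a / (m * x + b)


# Distance in miles from (0.78, -1.6) to a hypothetical document location
d = hsin(0.78, -1.6, 0.70, -1.5, 3963.205)
print(d, recip(d, 1, 1, 0))
```

Note that with b = 0 a document at exactly the query point gives a distance of 0, where the reciprocal is undefined in this sketch.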