Re: White space in facet values
You should try fq=Product:"Electric Guitar" (as a phrase). How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar? -- http://jetwick.com open twitter search
Different Results..
Hi All, I am getting different results when I use certain special characters, for example: 1) with the request http://localhost:8080/solr/select?q=erlang!ericson the result obtained is result name=response numFound=1934 start=0 2) with the request http://localhost:8080/solr/select?q=erlang/ericson the result is result name=response numFound=1 start=0 My question is: does Solr treat the two queries differently, and how does it handle !, /, and the other escape characters? Regards, satya
solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan
Re: Different Results..
We need more information about the analyzers and tokenizers of the default search field. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/12/22 satya swaroop satya.yada...@gmail.com Hi All, I am getting different results when I use certain special characters, for example: 1) with the request http://localhost:8080/solr/select?q=erlang!ericson the result obtained is result name=response numFound=1934 start=0 2) with the request http://localhost:8080/solr/select?q=erlang/ericson the result is result name=response numFound=1 start=0 My question is: does Solr treat the two queries differently, and how does it handle !, /, and the other escape characters? Regards, satya
Re: White space in facet values
Try copying the values (with copyField) to a string field. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/12/22 Peter Karich peat...@yahoo.de You should try fq=Product:"Electric Guitar" How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar? -- http://jetwick.com open twitter search
Item precedence search problem
Hi all, I am using Solr in my web application for search purposes. However, I am having a problem with the default behaviour of Solr search. From my understanding, if I query for a keyword, let's say Laptop, preference is given to result rows having more occurrences of the search keyword Laptop in the field name. This, however, produces undesirable scenarios, for example: 1. I index an item A with name value Sony Laptop. 2. I index another item B with name value: Laptop bags for laptops. 3. I search for the keyword Laptop. According to the default behaviour, precedence would be given to item B since the keyword appears more times in the name field for that item. Also, we do not have anything in the category field with which we can categorize. Can anyone suggest a better approach to sort potential search results? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Item-precedence-search-problem-tp2130419p2130419.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
Have you investigated 'field collapsing'? I believe it at least covers the 'DISTINCT' part. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: dan sutton danbsut...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Wed, December 22, 2010 1:29:23 AM Subject: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan
Whole RAM used by Solr during optimize
Hello. I have a RAM problem during optimize. When I start a delta or full import, Solr uses only the RAM I allocate to it, e.g. java -Xmx2g -jar start.jar. While Solr is fetching the rows from the database, the RAM usage is fine. But when Solr begins to optimize, it wants all of the available RAM. Why is that? The used RAM jumps sky-high and only 40 MB of the 8 GB is free! How can I limit this? -- View this message in context: http://lucene.472066.n3.nabble.com/hole-RAM-using-by-solr-during-Optimize-tp2130482p2130482.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Item precedence search problem
On Wed, Dec 22, 2010 at 3:09 PM, Hasnain hasn...@hotmail.com wrote: [...] From my understanding, if i query for a keyword, let's say Laptop, preference is given to result rows having more occurrences of the search keyword Laptop in the field name. This, however, is producing undesirable scenarios, for example: 1. I index an item A with name value Sony Laptop. 2. I index another item B with name value: Laptop bags for laptops. 3. I search for the keyword Laptop According to the default behaviour, precedence would be given to item B since the keyword appears more times in the name field for that item. Your question is not clear. How would you like the precedence to work? If you want to ignore term frequency you can override the default similarity class with a custom class, pointing the configuration at the new similarity class at the bottom of schema.xml. Also we do not have anything in the category field with which we can categorize. [...] Sorry, what category field are you talking about? Is this something specific to your schema? Regards, Gora
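To make the similarity override concrete, here is a sketch of the schema.xml wiring, assuming Solr 1.4; the class name com.example.NoTfSimilarity is hypothetical and stands for a class extending Lucene's DefaultSimilarity whose tf() returns 1.0f for any non-zero frequency, so repeated keywords no longer boost a document:

```
<!-- at the bottom of schema.xml, inside <schema>; the class name is hypothetical -->
<similarity class="com.example.NoTfSimilarity"/>
```

The class itself must be compiled and placed on Solr's classpath (e.g. in the core's lib directory) before the core is reloaded.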
Re: Whole RAM used by Solr during optimize
Maybe I set my cache in solrconfig.xml too high? Would that explain why I now see the cache so high on the server? -- View this message in context: http://lucene.472066.n3.nabble.com/hole-RAM-using-by-solr-during-Optimize-tp2130482p2130490.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Item precedence search problem
Hi, First of all, thanks for replying. Secondly, maybe I wasn't clear enough in my original post regarding what was required and what has been implemented. In my schema, I have another field by the name of Category and, for example's sake, let's assume that my application supports only two categories: Computers and Accessories. Now, what I require is a mechanism to assign correct categories to the items during indexing so that this field can be used to better filter the search results. Continuing from the example in my original post, item A would belong to the Computers category and item B would belong to the Accessories category. So then, searching for Laptop would only look for items in the Computers category and return item A only. I would like to point out here that setting the category field manually is not an option since the data might be in the vicinity of thousands of records. I am not asking for an in-depth algorithm; just a high-level design would be sufficient to set me in the right direction. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Item-precedence-search-problem-tp2130419p2130593.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
facet=true&facet.field=field // SELECT count(distinct(field)) fq=field:[* TO *] // WHERE length(field) > 0 q=other_criteriaA&fq=other_criteriaB // AND other_criteria Advantage: you can look into several fields at a time by adding another facet.field. Disadvantage: you get the counts split by the values of that field; fix this via field collapsing / result grouping http://wiki.apache.org/solr/FieldCollapsing or use deduplication: http://wiki.apache.org/solr/Deduplication Regards, Peter. Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan -- http://jetwick.com open twitter search
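If it helps to see the pieces assembled into one request, here is a sketch; the host, core path, and criteria values are placeholders, and facet.limit/rows are additions beyond the parameters discussed in the reply:

```python
from urllib.parse import urlencode

# Placeholder values: field name and criteria are illustrative, not from the thread.
params = [
    ("q", "other_criteriaA"),    # main query            -- AND other_criteria
    ("fq", "other_criteriaB"),   # extra filter          -- AND other_criteria
    ("fq", "field:[* TO *]"),    # field has a value     -- WHERE length(field) > 0
    ("facet", "true"),
    ("facet.field", "field"),    # one count per value   -- count(distinct(field))
    ("facet.limit", "-1"),       # return all distinct values, not just the top N
    ("rows", "0"),               # only counts are wanted, no documents
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With rows=0 the response carries only the facet block, so the distinct values can be counted without transferring any documents.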
Glob in fl parameter
Hi, Is there any support for glob in the 'fl' param. This would be very useful in case of retrieving dynamic fields. I have read the wiki for FieldAliasesAndGlobsInParams. Is there any related patch? Thanks for any pointers, Samarth
Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
Hello Erick, Could you kindly give a hand on my problem. Any ideas, hints, suggestions are highly appreciated. Many thanks 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2. Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index. I set the read, write right to every and each folder, from opt, dev...to the last one, index (just for sure ;) ) - lockType: - single/ simple: Cannot create directory: /solr/data/index at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397) - native: Cannot create directory: /solr/data/index at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock - the Geronimo log: === 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:07,941 INFO [DirectoryMonitor] Hot deployer notified that an artifact was removed: default/solr2/1293005281314/war 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:14,139 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0' 2010-12-22 15:13:18,795 WARN [TomcatModuleBuilder] Web application . does not contain a WEB-INF/geronimo-web.xml deployment plan. This may or may not be a problem, depending on whether you have things like resource references that need to be resolved. You can also give the deployer a separate deployment plan file on the command line. 
2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr 2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Solr home set to '/opt/dev/config/solr/' 2010-12-22 15:13:19,051 INFO [SolrDispatchFilter] SolrDispatchFilter.init() 2010-12-22 15:13:19,462 INFO [IndexSchema] default search field is text 2010-12-22 15:13:19,463 INFO [IndexSchema] query parser default operator is OR 2010-12-22 15:13:19,464 INFO [IndexSchema] unique key field: id 2010-12-22 15:13:19,490 INFO [JmxMonitoredMap] JMX monitoring is enabled. Adding Solr mbeans to JMX Server: com.sun.jmx.mbeanserver.jmxmbeanser...@144752d 2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[]} 2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=solr rocks,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]} 2010-12-22 15:13:19,533 WARN [SolrCore] Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start SOLR. Check solr/home property java.lang.RuntimeException: java.io.IOException: Cannot create directory: /solr/data/index at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397) at org.apache.solr.core.SolrCore.init(SolrCore.java:545) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) ... 
2010-12-22 15:13:19,601 INFO [SolrDispatchFilter] SolrDispatchFilter.init() done 2010-12-22 15:13:19,601 INFO [SolrServlet] SolrServlet.init() 2010-12-22 15:13:19,602 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr 2010-12-22 15:13:19,602 INFO [SolrServlet] SolrServlet.init() done 2010-12-22 15:13:19,606 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr 2010-12-22 15:13:19,606 INFO [SolrUpdateServlet] SolrUpdateServlet.init() done 2010-12-22 15:13:19,721 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0' === With regards, Bac Hoang
Crontab for delta-import
I want to run delta-import from crontab but don't know how. I used a php file in crontab before, like: command: php /home/user/public_html/auto.php I tried: command: /home/user/public_html/solr/apache-solr-1.4.1/example/example-DIH/solr/db/dataimport?command=delta-import It didn't work. The URL I run in the browser is: http://181.163.64.228:8983/solr/db/dataimport?command=delta-import Thanks Richard
Re: Crontab for delta-import
Hi, you can use wget if available on your server, e.g. command wget --quiet 'http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' Cheers, Stefan Am 22.12.2010 12:31, schrieb Ruixiang Zhang: I want to run delta-import in Crontab but don't know how. I used php file in Crontab before, like: command: php /home/user/public_html/auto.php I tried: command: /home/user/public_html/solr/apache-solr-1.4.1/example/example-DIH/solr/db/dataimport?command=delta-import It didn't work. The url I run in browser is: http://181.163.64.228:8983/solr/db/dataimport?command=delta-import Thanks Richard -- *** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
Re: Crontab for delta-import
Thanks for your quick reply. I couldn't find the wget on my server. Do you know where it should be located or how I can check if I have it on my server? If not, can I install one? Thanks On Wed, Dec 22, 2010 at 3:38 AM, Stefan Moises moi...@shoptimax.de wrote: Hi, you can use wget if available on your server, e.g. command wget --quiet ' http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' Cheers, Stefan Am 22.12.2010 12:31, schrieb Ruixiang Zhang: I want to run delta-import in Crontab but don't know how. I used php file in Crontab before, like: command: php /home/user/public_html/auto.php I tried: command: /home/user/public_html/solr/apache-solr-1.4.1/example/example-DIH/solr/db/dataimport?command=delta-import It didn't work. The url I run in browser is: http://181.163.64.228:8983/solr/db/dataimport?command=delta-import Thanks Richard -- *** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
Re: Crontab for delta-import
Just call wget http://www.somedomain.com on the console to see if it is available... Depends on your distro where it is installed and how to install it... I have mine in /usr/bin/wget Alternatively, use lynx or curl as command, e.g. curl --silent 'http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' Cheers, Stefan Am 22.12.2010 12:46, schrieb Ruixiang Zhang: Thanks for your quick reply. I couldn't find the wget on my server. Do you know where it should be located or how I can check if I have it on my server? If not, can I install one? Thanks On Wed, Dec 22, 2010 at 3:38 AM, Stefan Moises moi...@shoptimax.de mailto:moi...@shoptimax.de wrote: Hi, you can use wget if available on your server, e.g. command wget --quiet 'http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' Cheers, Stefan Am 22.12.2010 12:31, schrieb Ruixiang Zhang: I want to run delta-import in Crontab but don't know how. I used php file in Crontab before, like: command: php /home/user/public_html/auto.php I tried: command: /home/user/public_html/solr/apache-solr-1.4.1/example/example-DIH/solr/db/dataimport?command=delta-import It didn't work. The url I run in browser is: http://181.163.64.228:8983/solr/db/dataimport?command=delta-import Thanks Richard -- *** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de mailto:moi...@shoptimax.de http://www.shoptimax.de *** -- *** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
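Put together as a crontab entry (a sketch; the schedule and the curl path are assumptions), the line invokes the URL over HTTP rather than treating it as a filesystem path:

```
# m h dom mon dow command -- run the delta-import every 15 minutes
*/15 * * * * /usr/bin/curl --silent 'http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' > /dev/null 2>&1
```

Redirecting output to /dev/null keeps cron from mailing the DIH response on every run.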
Re: Query performance issue while using EdgeNGram
1) Thanks for this update. I have to use 'WhitespaceTokenizer'. 2) I have to suggest the whole query itself (say, name or title). 3) Could you please let me know if there is a way to find the evicted docs? 4) Yes, we are seeing improvement in the response time if we optimize. But still, for some queries QTime is more than 8 secs. It is a 'Blocker' for us. Could you please suggest anything to reduce the QTime to under 1 sec? -- View this message in context: http://lucene.472066.n3.nabble.com/Query-performance-issue-while-using-EdgeNGram-tp2097056p2130751.html Sent from the Solr - User mailing list archive at Nabble.com.
ZkSolrResourceLoader does not support getConfigDir
Hi, I've got a small problem with DIH on SolrCloud. I have specified my dataSource settings in a separate file, data-config.xml, in the conf folder (the same folder where schema.xml and solrconfig.xml are placed). When I try importing my data from a DB table for indexing I receive the following problem: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:97) at org.apache.solr.handler.dataimport.DataImportHandler.getSolrWriter(DataImportHandler.java:282) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:198) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1329) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:343) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Regards, Joanna
Re: solrj http client 4
Tried to check out lucene/solr and set up projects and classpath in Eclipse - there seems to be a circular dependency between modules - this is not possible/allowed in a Maven-built project and would require refactoring. Regards, Stevo. On Wed, Dec 8, 2010 at 1:42 PM, Stevo Slavić ssla...@gmail.com wrote: OK, thanks. Can't promise anything, but would love to contribute. First impression of the source code: ant is used as the build tool; wish it was maven. If it was maven then https://issues.apache.org/jira/browse/SOLR-1218 would be trivial or wouldn't exist in the first place. Regards, Stevo. On Wed, Dec 8, 2010 at 10:25 AM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: SOLR-2020 addresses upgrading to HttpComponents (from HttpClient). I have had no time to work more on it yet, though. I also don't have that much experience with the new version, so any help is much appreciated. Cheers, Chantal On Tue, 2010-12-07 at 18:35 +0100, Yonik Seeley wrote: On Tue, Dec 7, 2010 at 12:32 PM, Stevo Slavić ssla...@gmail.com wrote: Hello solr users and developers, Are there any plans to upgrade the http client dependency in solrj from 3.x to 4.x? I'd certainly be for moving to 4.x (and I think everyone else would too). The issue is that it's not a drop-in replacement, so someone needs to do the work. -Yonik http://www.lucidimagination.com Found this https://issues.apache.org/jira/browse/SOLR-861 ticket - judging by comments in it the upgrade might help fix the issue. I have a project in jar hell, getting different versions of http client as transitive dependencies... Regards, Stevo.
RE: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
This won't actually give you the number of distinct facet values, but will give you the number of documents matching your conditions. It's more equivalent to SQL without the distinct. There is no way in Solr 1.4 to get the number of distinct facet values. I am not sure about the new features in trunk. From: Peter Karich [peat...@yahoo.de] Sent: Wednesday, December 22, 2010 6:10 AM To: solr-user@lucene.apache.org Subject: Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria facet=true&facet.field=field // SELECT count(distinct(field)) fq=field:[* TO *] // WHERE length(field) > 0 q=other_criteriaA&fq=other_criteriaB // AND other_criteria Advantage: you can look into several fields at a time by adding another facet.field. Disadvantage: you get the counts split by the values of that field; fix this via field collapsing / result grouping http://wiki.apache.org/solr/FieldCollapsing or use deduplication: http://wiki.apache.org/solr/Deduplication Regards, Peter. Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan -- http://jetwick.com open twitter search
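One workaround worth noting: if the full value/count list is requested (facet.limit=-1), the distinct count can be derived client-side from the facet response. A sketch, assuming a parsed wt=json response whose facet_fields entry is the flat [value, count, value, count, ...] list Solr returns; the field name and values are invented for illustration:

```python
# Hypothetical fragment of a parsed wt=json facet response.
facet_response = {
    "facet_counts": {
        "facet_fields": {
            # flat alternating list: value, count, value, count, ...
            "field": ["red", 10, "green", 3, "blue", 0],
        }
    }
}

def distinct_count(response, field):
    """Number of facet values with a non-zero count for `field`."""
    flat = response["facet_counts"]["facet_fields"][field]
    counts = flat[1::2]  # every second element is a count
    return sum(1 for c in counts if c > 0)

print(distinct_count(facet_response, "field"))  # -> 2
```

For very high-cardinality fields this means shipping every value over the wire, so it only scales so far.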
Re: Transparent redundancy in Solr
Well, SolrCloud is not yet fully specified for the indexing side - more work remains. But my point is that the architecture for this should be ZK-based. I added a new JIRA issue to flesh out a strategy for SolrCloud-controlled distributed indexing in SOLR-2293. Perhaps you should open a JIRA issue for indexer failover as well. The simplest model would be to promote one of the search slaves to master indexer, as each slave will have an (almost up-to-date) copy of the index. The client should then have a means of getting alerted about the failover and of learning from what timestamp it will need to re-feed content (based on the slave index date). In my opinion it is extremely hard to try to solve some kind of always-in-sync instant failover, and most will not need it either. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 19. des. 2010, at 19.29, Upayavira wrote: Jan, I'd appreciate a little more explanation here. I've explored SolrCloud somewhat, but there are some bits of this architecture I don't yet get. You say, next time an indexer slave pings ZK. What is an indexer slave? Is that the external entity that is posting indexing content? If it is the app that posts to Solr, do you imply it must check with ZK before it can do an HTTP post to Solr? Also, once you do this leader election to switch to an alternative master, are you implying that this new master was once a slave of the original master, and thus has a valid index? I find this interesting, but am still not quite sure how it works exactly. Upayavira On Fri, 17 Dec 2010 10:09 +0100, Jan Høydahl / Cominvent jan@cominvent.com wrote: Hi, I believe the way to go is through ZooKeeper[1], not property files or local hacks. We've already started on this route and it makes sense to let ZK do what it is designed for, such as leader election. When a node starts up, it asks ZK what role it should have and fetches the corresponding configuration. Then it polls ZK regularly to know if the world has changed.
So if a master indexer goes down, ZK will register that as a state-change condition, and next time one of the indexer slaves pings ZK, it may be elected as the new master, and the config in ZK is changed correspondingly, causing all adds to flow to the new master... Then, when the slaves cannot contact their old master, they ask ZK for an update, and retrieve a new value for the master URL. Note also that SolrCloud is implementing load-balancing and sharding as part of the architecture, so often we can skip dedicated LBs. [1] : http://wiki.apache.org/solr/SolrCloud -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 15. des. 2010, at 18.50, Tommaso Teofili wrote: Hi all, me, Upayavira and other guys at Sourcesense have collected some Solr architectural views in the presentation at [1]. For sure one can set up an architecture for failover and resiliency on the search side (search slaves with coordinators and distributed search), but I'd like to ask how you would achieve transparent redundancy in Solr on the indexing side. On slide 13 we put 2 slave backup masters, so if one of the main masters goes down you can switch the slaves' replication to the backup master. First question is: how could it be made automatic? In a previous thread [2] I talked about a possible solution: writing the master url of the slaves in a properties file, so when you have to switch you change that url to the backup master and reload the slave's core, but that is not automatic :-) Any more advanced ideas? Second question: when the main master comes back up, how can it automatically be considered the backup master (since hopefully the backup master has received some indexing requests in the meantime)? Also consider that its index should be wiped and replicated from the new master to ensure index integrity. Looking forward to your feedback, Cheers, Tommaso [1] : http://www.slideshare.net/sourcesense/sharded-solr-setup-with-master [2] : http://markmail.org/thread/vjj5jovbg6evpmpp
RE: White space in facet values
The phrase solution works, as does escaping the space with a backslash: fq=Product:Electric\ Guitar ... actually a lot of characters need to be escaped like this (ampersands and parentheses come to mind)... I assume you already have this indexed as string, not text... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Andy [mailto:angelf...@yahoo.com] Sent: Wednesday, December 22, 2010 1:11 AM To: solr-user@lucene.apache.org Subject: White space in facet values How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar?
XInclude in multi core
Hi, In a test setup I have a master and slave in the same JVM but in different cores. Of course I'd like to replicate configuration files and include some via XInclude. The problem is the href path; it can't use properties and is relative to the servlet container. Here's the problem: I also replicate solrconfig.xml, so an include of solr/corename/conf/file.xml will not work in the cores I replicate it to, and I can't embed some corename property in the href to make it generic. Anyone know a trick here? Thanks! Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: White space in facet values
On Wed, Dec 22, 2010 at 9:53 AM, Dyer, James james.d...@ingrambook.com wrote: The phrase solution works as does escaping the space with a backslash: fq=Product:Electric\ Guitar ... actually a lot of characters need to be escaped like this (amperstands and parenthesis come to mind)... One way to avoid escaping is to use the raw or term query parsers: fq={!raw f=Product}Electric Guitar In 4.0-dev, use {!term} since that will work with field types that need to transform the external representation into the internal one (like numeric fields need to do). http://wiki.apache.org/solr/SolrQuerySyntax -Yonik http://www.lucidimagination.com I assume you already have this indexed as string, not text... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Andy [mailto:angelf...@yahoo.com] Sent: Wednesday, December 22, 2010 1:11 AM To: solr-user@lucene.apache.org Subject: White space in facet values How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar?
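For clients that build fq strings by hand, the backslash-escaping approach can be automated. A sketch modeled on SolrJ's ClientUtils.escapeQueryChars; the character set below is my reading of the Lucene query-syntax specials, so verify it against the Solr version in use:

```python
# Characters that are special in the Lucene/Solr query syntax
# (backslash itself first; the trailing space is intentional).
SPECIAL = set('\\+-!():^[]"{}~*?|&;/ ')

def escape_query_chars(value):
    """Backslash-escape query-syntax specials, e.g. for fq=Product:<value>."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)

print("fq=Product:" + escape_query_chars("Electric Guitar"))
# -> fq=Product:Electric\ Guitar
```

Note this escapes for the query parser only; if the value goes into a URL, it still needs URL-encoding on top.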
Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
What you want to ask? When this problem arises.? Is it when you try to index to solr? What are the commands that you are running? Which version of solr( 1.4.1?). On Wed, Dec 22, 2010 at 5:49 PM, Bac Hoang [via Lucene] ml-node+2130906-265633473-146...@n3.nabble.comml-node%2b2130906-265633473-146...@n3.nabble.com wrote: Hello Erick, Could you kindly give a hand on my problem. Any ideas, hints, suggestions are highly appreciated. Many thanks 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2. Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index. I set the read, write right to every and each folder, from opt, dev...to the last one, index (just for sure ;) ) - lockType: - single/ simple: Cannot create directory: /solr/data/index at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397) - native: Cannot create directory: /solr/data/index at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock - the Geronimo log: === 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:07,941 INFO [DirectoryMonitor] Hot deployer notified that an artifact was removed: default/solr2/1293005281314/war 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:14,139 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0' 2010-12-22 15:13:18,795 WARN [TomcatModuleBuilder] Web application . 
does not contain a WEB-INF/geronimo-web.xml deployment plan. This may or may not be a problem, depending on whether you have things like resource references that need to be resolved. You can also give the deployer a separate deployment plan file on the command line.
2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr
2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Solr home set to '/opt/dev/config/solr/'
2010-12-22 15:13:19,051 INFO [SolrDispatchFilter] SolrDispatchFilter.init()
2010-12-22 15:13:19,462 INFO [IndexSchema] default search field is text
2010-12-22 15:13:19,463 INFO [IndexSchema] query parser default operator is OR
2010-12-22 15:13:19,464 INFO [IndexSchema] unique key field: id
2010-12-22 15:13:19,490 INFO [JmxMonitoredMap] JMX monitoring is enabled. Adding Solr mbeans to JMX Server: com.sun.jmx.mbeanserver.jmxmbeanser...@144752d
2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[]}
2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=solr rocks,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]}
2010-12-22 15:13:19,533 WARN [SolrCore] Solr index directory '/solr/data/index' doesn't exist. Creating new index...
2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start SOLR. Check solr/home property
java.lang.RuntimeException: java.io.IOException: Cannot create directory: /solr/data/index
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397)
at org.apache.solr.core.SolrCore.init(SolrCore.java:545)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) ...
2010-12-22 15:13:19,601 INFO [SolrDispatchFilter] SolrDispatchFilter.init() done
2010-12-22 15:13:19,601 INFO [SolrServlet] SolrServlet.init()
2010-12-22 15:13:19,602 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr
2010-12-22 15:13:19,602 INFO [SolrServlet] SolrServlet.init() done
2010-12-22 15:13:19,606 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr
2010-12-22 15:13:19,606 INFO [SolrUpdateServlet] SolrUpdateServlet.init() done
2010-12-22 15:13:19,721 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0'
===
With regards, Bac Hoang -- View message @ http://lucene.472066.n3.nabble.com/Dismax-score-maximu-of-any-one-field-tp2119563p2130906.html
edismax inconsistency -- AND/OR
I'm using SOLR 1.4.1 with SOLR-1553 applied (edismax query parser). I'm experiencing inconsistent behavior with terms grouped in parentheses. Sometimes they are AND'ed and sometimes OR'ed together.
1. q=Title:(life)&defType=edismax -- 285 results
2. q=Title:(hope)&defType=edismax -- 34 results
3. q=Title:(life AND hope)&defType=edismax -- 1 result
4. q=Title:(life OR hope)&defType=edismax -- 318 results
5. q=Title:(life hope)&defType=edismax -- 1 result (life, hope are being AND'ed together)
6. q=Title:(life AND hope) AND Title:(life)&defType=edismax -- 1 result
7. q=Title:(life OR hope) AND Title:(life)&defType=edismax -- 285 results
8. q=Title:(life hope) AND Title:(life)&defType=edismax -- 285 results (life, hope are being OR'ed together)
See how in #5, the two terms get AND'ed, but by adding the additional (nonsense) clause in #8, the first two terms get OR'ed. Is this a feature or a bug? Am I likely doing something wrong? I've tried this both with ...defaultOperator=AND... and ...defaultOperator=OR... I've also tried the two settings with q.op. It seems as if edismax doesn't use these at all. When using the default query parser, I get consistent AND/OR logic as expected. That is, the defaultOperator (or q.op if specified) is always consistently applied. As a workaround, I think I can just always insert the operator (as in examples 6 and 7). However, this is an extra burden on our clients that I'd like to avoid if at all possible. See below for more configuration information. Any ideas are appreciated.
James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

Snippets from schema.xml:

<fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="Title" type="textStemmed" indexed="true" stored="true" multiValued="false" omitNorms="true" omitTermFreqAndPositions="false"/>
...
<solrQueryParser defaultOperator="AND"/>
Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
On Wed, Dec 22, 2010 at 4:55 PM, Bac Hoang bac.ho...@axonactive.vn wrote: Hello Erick, Could you kindly give a hand on my problem. Any ideas, hints, suggestions are highly appreciated. Many thanks 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2. Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index. [...] Shouldn't the dataDir be /opt/dev/config/solr/data? Alternatively, try removing /opt/dev/config/solr/data (please first make sure that you have no critical data there), and restarting Solr. If dataDir is missing, Solr should create it. Regards, Gora
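One clue in the log supports Gora's suggestion: Solr is complaining about '/solr/data/index' (a path from the filesystem root), not the intended /opt/dev/config/solr/data/index. A quick sketch (my own, not from the thread) of a helper that reports where a configured dataDir actually resolves and whether it could be created there:

```python
import os

def diagnose_data_dir(data_dir):
    """Walk up from data_dir to the nearest existing ancestor and report
    whether the index directory exists or could plausibly be created."""
    data_dir = os.path.abspath(data_dir)
    ancestor = data_dir
    while not os.path.exists(ancestor):
        parent = os.path.dirname(ancestor)
        if parent == ancestor:  # reached the filesystem root
            break
        ancestor = parent
    return {
        "data_dir": data_dir,
        "exists": os.path.isdir(data_dir),
        "nearest_existing_ancestor": ancestor,
        "ancestor_writable": os.access(ancestor, os.W_OK),
    }

# On the poster's machine this would likely report '/' as the nearest
# existing ancestor -- i.e. the path is being resolved from the
# filesystem root, not relative to solr.home.
report = diagnose_data_dir("/solr/data/index")
```

If the nearest existing ancestor turns out to be '/', no amount of chmod on /opt/dev/config/solr will help; the dataDir setting itself (relative vs. absolute) is the thing to fix.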
Solr Spellcheker automatically tokenizes on period marks
Hello, My main (full text) index contains the terms www, sometest, com, which is intended and correct. My spellcheck index contains the term www.sometest.com, which is also intended and correct. However, when querying the spellchecker using the query www.sometest.com, I get the suggestion www.www.sometest.com.com, despite the fact that I'm not using a tokenizer that splits on . (period marks) as part of my spellcheck query analyzer. When running the Field Analyzer (in the Solr admin page), I can see that even after the last filter (see below), my term text remains www.sometest.com, which is untokenized, as expected. Any thoughts as to what may be causing this undesired tokenization?

To summarize:
Main index contains: www, sometest, com
Spellcheck index contains: www.sometest.com
Spellcheck query: www.sometest.com
Expected result: (no suggestion)
Actual result: www.www.sometest.com.com

Here is my spellcheck query analyzer:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

Thank you in advance; any suggestions are welcome! Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spellcheker-automatically-tokenizes-on-period-marks-tp2131844p2131844.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr query to get results based on the word length (letter count)
Hi, I have a Solr index that has thousands of records; the title is one of the Solr fields, and I would like to query for title values that are less than 50 characters long. Is there a way to construct the Solr query to provide results based on the character length? thank you very much!
Re: Solr Spellcheker automatically tokenizes on period marks
Check the analyzer of the field you defined for queryAnalyzerFieldType, which is configured in the search component. On Wednesday 22 December 2010 16:32:18 Sebastian M wrote: [...] -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
AW: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
Hello Anurag, The specific problem I faced when starting Solr in Geronimo (http://{server}:{port}/solr) is that /solr/data/index could not be found; Solr then tried to create that folder but failed, even though permission is granted. More detail from the log:

Solr index directory '/solr/data/index' doesn't exist. Creating new index...
2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start SOLR. Check solr/home property
java.lang.RuntimeException: java.io.IOException: Cannot create directory: /solr/data/index

You're right, the Solr I'm using is 1.4.1. Thanks indeed Bac Hoang

-----Original Message----- From: Anurag [mailto:anurag.it.jo...@gmail.com] Sent: Wed 12/22/2010 10:17 PM To: solr-user@lucene.apache.org Subject: Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo

What do you want to ask? When does this problem arise? Is it when you try to index to Solr? What commands are you running? Which version of Solr (1.4.1)? On Wed, Dec 22, 2010 at 5:49 PM, Bac Hoang [via Lucene] ml-node+2130906-265633473-146...@n3.nabble.com wrote: Hello Erick, Could you kindly give a hand on my problem. Any ideas, hints, suggestions are highly appreciated. Many thanks 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2. Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index.
[...]
Re: Duplicate values in multiValued field
In my experience, that should work fine. Facetting in 1.4 works fine on multi-valued fields, and a duplicate value in the multi-valued field shouldn't be a problem. On 12/22/2010 2:31 AM, Andy wrote: If I put duplicate values into a multiValued field, would that cause any issues? For example I have a multiValued field Color. Some of my documents have duplicate values for that field, such as: Green, Red, Blue, Green, Green. Would the above (having 3 duplicate Green) be the same as having the duplicated values of: Green, Red, Blue? Or do I need to clean my data and remove duplicate values before indexing? Thanks.
Re: Solr query to get results based on the word length (letter count)
On Wed, Dec 22, 2010 at 9:06 PM, Giri giriprak...@gmail.com wrote: Hi, I have a Solr index that has thousands of records; the title is one of the Solr fields, and I would like to query for title values that are less than 50 characters long. Is there a way to construct the Solr query to provide results based on the character length? [...] One could write a custom query parser, but if one needed that, would it not be easier to simply index the length of the title value as a separate field? Regards, Gora
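Gora's approach can be done entirely on the client side before documents are posted. A hedged sketch (the field name title_length is my own invention for illustration, and it would need to be declared as an integer field in schema.xml):

```python
def with_title_length(doc):
    """Add a title_length field (character count) to a document dict
    before it is sent to Solr for indexing."""
    enriched = dict(doc)
    enriched["title_length"] = len(doc.get("title", ""))
    return enriched

docs = [
    with_title_length({"id": "1", "title": "Electric Guitar"}),
    with_title_length({"id": "2", "title": "A very long title that goes on well past fifty characters in total"}),
]

# With title_length in the schema, the original question becomes a
# simple range query:
#   q=*:*&fq=title_length:[1 TO 49]
```

The range query then does the "less than 50 characters" filtering at search time with no custom parser.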
Re: Solr Spellcheker automatically tokenizes on period marks
Hi and thanks for your reply, My searchComponent is as such:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  ...
</searchComponent>

And then in my schema.xml, I have:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  ...
</fieldType>

Which is the analyzer I pasted in my original post. So this only confirms that the query term is going through these filters and tokenizer, but none of them splits on period marks. Do you see any possible problems with my setup? Thanks! Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spellcheker-automatically-tokenizes-on-period-marks-tp2131844p2131959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: White space in facet values
Another technique, which works great for facet fq's and avoids the need to worry about escaping, is using the field query parser instead: fq={!field f=Product}Electric Guitar Using the field query parser avoids the need for ANY escaping of your value at all, which is convenient in the faceting case -- you still need to URI-escape (ampersands for instance), but you shouldn't need to escape any Solr special characters like parens or double quotes or anything else, if you've made your string suitable for including in a URI. With the field query parser, there is a lot less to worry about. http://lucene.apache.org/solr/api/org/apache/solr/search/FieldQParserPlugin.html On 12/22/2010 9:53 AM, Dyer, James wrote: The phrase solution works, as does escaping the space with a backslash: fq=Product:Electric\ Guitar ... actually a lot of characters need to be escaped like this (ampersands and parentheses come to mind)... I assume you already have this indexed as string, not text... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Andy [mailto:angelf...@yahoo.com] Sent: Wednesday, December 22, 2010 1:11 AM To: solr-user@lucene.apache.org Subject: White space in facet values How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar?
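If you do go the escaping route, it is easy to get wrong by hand. Here is a small helper (my own sketch, not from the thread) that backslash-escapes the Lucene query-syntax metacharacters James mentions, spaces included:

```python
import re

# Lucene/Solr query-syntax metacharacters ('/' included for safety);
# each occurrence gets a backslash prefix, as in
#   fq=Product:Electric\ Guitar
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_query_value(value):
    """Backslash-escape a raw field value for use inside a Solr query."""
    escaped = _SPECIAL.sub(r"\\\1", value)
    return escaped.replace(" ", "\\ ")

print(escape_query_value("Electric Guitar"))  # Electric\ Guitar
```

The {!field} parser described above sidesteps all of this: fq={!field f=Product}Electric Guitar needs no query-syntax escaping at all, only the normal URL encoding.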
Re: White space in facet values
Huh, does !term in 4.0 mean the same thing as !field in 1.4? What you describe as !term in 4.0-dev is what I understand !field in 1.4 as doing. On 12/22/2010 10:01 AM, Yonik Seeley wrote: On Wed, Dec 22, 2010 at 9:53 AM, Dyer, James james.d...@ingrambook.com wrote: The phrase solution works as does escaping the space with a backslash: fq=Product:Electric\ Guitar ... One way to avoid escaping is to use the raw or term query parsers: fq={!raw f=Product}Electric Guitar In 4.0-dev, use {!term} since that will work with field types that need to transform the external representation into the internal one (like numeric fields need to do). http://wiki.apache.org/solr/SolrQuerySyntax -Yonik http://www.lucidimagination.com [...]
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) 0 AND other_criteria
On Dec 22, 2010, at 09:21 , Jonathan Rochkind wrote: This won't actually give you the number of distinct facet values, but will give you the number of documents matching your conditions. It's more equivalent to SQL without the distinct. There is no way in Solr 1.4 to get the number of distinct facet values. That's not true - the total number of facet values is the distinct number of values in that field. You need to be sure you have facet.limit=-1 (default is 100) to see all values in the response rather than just a page of them though. Erik
Using two request handlers in the same query...
I have two request handlers set up something like this:

<requestHandler name="Keyword_SI" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="qf">Title^130 Features^110 Edition^100 CTBR_SEARCH^90 THEM_SEARCH^80 BSAC_SEARCH1^70</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

<requestHandler name="Title_SI" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="qf">Title^100 Edition^10 Series^1</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

Is there any way to use both of these handlers for different parts of the query? I have a case where a user can search by Title, then later search within their results by keyword. I was trying to see if I could do this with local params, but it doesn't seem that you can specify a qt= like this: q={!qt=Title_SI}life If this had worked (but it didn't), I was hoping I could solve my problem like this: qt=Title_SI&q=(life) AND ( _query_:"{!qt=Keyword_SI}faith" ), using the technique found at http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/ Is there any way to do this? I'm using version 1.4.1 James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311
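As far as I know you indeed can't reference a request handler with qt= inside local params, but the nested-query technique from that blog post can still inline the second handler's parameters directly instead of naming the handler. A hedged sketch (the parameter names kwqf/kw are my own invention, and this assumes that local-param dereferencing with v=$param works with your patched 1.4.1 edismax):

```python
def nested_query_params(title_term, keyword_term):
    """Build request params that AND a title search with a keyword
    search, inlining each handler's qf instead of using qt=."""
    return {
        "defType": "edismax",
        # Title_SI's qf applies to the outer query
        "qf": "Title^100 Edition^10 Series^1",
        # the nested clause pulls its qf and query text from
        # dereferenced params kwqf and kw
        "q": '(%s) AND _query_:"{!edismax qf=$kwqf v=$kw}"' % title_term,
        "kwqf": "Title^130 Features^110 Edition^100 CTBR_SEARCH^90 THEM_SEARCH^80 BSAC_SEARCH1^70",
        "kw": keyword_term,
    }

params = nested_query_params("life", "faith")
```

The duplication of the qf strings on the client is ugly, but it keeps the AND semantics in one request rather than two round trips.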
Configuration option for disableReplication
Hi, I am looking into using a multi-core configuration to allow us to fully rebuild our index while still applying updates. I have two cores: main-core and rebuild-core. I push the whole dataset into the rebuild core, during which time I can happily keep pushing updates into the main-core. Once the rebuild is complete I swap the cores and delete *:* from the rebuild core. This works fine; however there are a couple of edge cases: On server restart Solr needs to remember which core has been swapped in to be the main core. This can be solved by adding the persistent=true attribute to the solr config, however this does require the solr.xml to be writeable. While deploying a new version of our application we overwrite the solr.xml, as the new version could potentially have legitimate changes to the solr.xml that need to be rolled out, again leaving the cores out of sync. My proposed solution is to have the indexing process do some sanity checking at the start of each run, and swap in the correct core if necessary. This works, however there is the potential for the slaves to start replicating the empty index before the correct index is swapped in. To get round this problem I would like to have replication disabled on start up. Removing replicateAfter=startup has this effect, but it would be more future-proof to be able to specify a default for the replicationEnabled field (see SOLR-1175) in the ReplicationHandler, stopping replication until I explicitly turn it on. The change looks fairly simple. What do you think? Francis
RE: solrj http client 4
Stevo, You may be interested in LUCENE-2657 https://issues.apache.org/jira/browse/LUCENE-2657, which provides full POMs for Lucene/Solr trunk. I don't use Eclipse, but I think it can use POMs to bootstrap project configuration. (I know IntelliJ can do this.) Steve -Original Message- From: Stevo Slavić [mailto:ssla...@gmail.com] Sent: Wednesday, December 22, 2010 9:17 AM To: solr-user@lucene.apache.org Subject: Re: solrj http client 4 Tried to check out lucene/solr and set up projects and classpath in Eclipse - there seems to be a circular dependency between modules - this is not possible/allowed in a maven-built project, and would require refactoring. Regards, Stevo. On Wed, Dec 8, 2010 at 1:42 PM, Stevo Slavić ssla...@gmail.com wrote: OK, thanks. Can't promise anything, but would love to contribute. First impression on the source code - ant is used as build tool, wish it was maven. If it was maven then https://issues.apache.org/jira/browse/SOLR-1218 would be trivial or wouldn't exist in the first place. Regards, Stevo. On Wed, Dec 8, 2010 at 10:25 AM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: SOLR-2020 addresses upgrading to HttpComponents (from HttpClient). I have had no time to work more on it yet, though. I also don't have that much experience with the new version, so any help is much appreciated. Cheers, Chantal On Tue, 2010-12-07 at 18:35 +0100, Yonik Seeley wrote: On Tue, Dec 7, 2010 at 12:32 PM, Stevo Slavić ssla...@gmail.com wrote: Hello solr users and developers, Are there any plans to upgrade the http client dependency in solrj from 3.x to 4.x? I'd certainly be for moving to 4.x (and I think everyone else would too). The issue is that it's not a drop-in replacement, so someone needs to do the work. -Yonik http://www.lucidimagination.com Found this https://issues.apache.org/jira/browse/SOLR-861 ticket - judging by comments in it upgrade might help fix the issue.
I have a project in jar hell, getting different versions of http client as transitive dependency... Regards, Stevo.
Re: Solr query to get results based on the word length (letter count)
No good way. At indexing time, I'd just store the number of chars in the title in a field of its own. You can possibly do that solely in schema.xml with clever use of analyzers and copyField. Solr isn't an rdbms. Best to de-normalize at index time so what you're going to want to query is in the index. On 12/22/2010 10:36 AM, Giri wrote: [...]
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) 0 AND other_criteria
Well, that's true -- you can get the total number of facet values if you ALSO are willing to get back every facet value in the response. If you've got a hundred thousand or so unique facet values, and what you really want is just the _count_ without ALSO getting back a very large response (and waiting for Solr to construct the very large response), then you're out of luck. But if you're willing to get back all the values in the response too, that'll work, true. On 12/22/2010 11:23 AM, Erik Hatcher wrote: [...]
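To make Erik's and Jonathan's points concrete, here is a sketch (mine, not from the thread) of the client-side counting you would do after requesting facet.field=Product&facet.limit=-1 with Solr's flat JSON facet format; setting facet.mincount=1 as well shrinks the response, otherwise the count > 0 guard below does the same filtering:

```python
def count_distinct_values(facet_counts, field):
    """Count distinct facet values for a field from Solr's flat
    facet_fields response, which alternates value, count, value, count..."""
    pairs = facet_counts["facet_fields"][field]
    counts = pairs[1::2]
    return sum(1 for c in counts if c > 0)

# A response shaped like the one Solr returns (values abridged):
response = {"facet_fields": {"Product": ["Electric Guitar", 10, "Drums", 3, "Kazoo", 0]}}
print(count_distinct_values(response, "Product"))  # 2
```

As Jonathan notes, this still pays the cost of shipping every value over the wire; Solr 1.4 has no server-side distinct count.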
Re: full text search in multiple fields
Hi guys, There's one more thing to get this code to work as I need, I just found out... I'm now using: q=title_search:hort*&defType=lucene as iorixxx suggested. It works well BUT, this query doesn't find results if the title in the DB is Hortus supremus. I tried adding some tokenizers and filters to solve this, what I think is a casing issue, but no luck... below is my code... what am I missing here? Thanks again!

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text_ws" indexed="true" stored="true"/>
<field name="title_search" type="text" indexed="true" stored="true"/>
<copyField source="title" dest="title_search"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2132659.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: White space in facet values
: Huh, does !term in 4.0 mean the same thing as !field in 1.4? What you : describe as !term in 4.0 dev is what I understand as !field in 1.4 doing. There is a subtle distinction between {!field}, {!raw}, and {!term} which I attempted to explain on slides 26 and 43 in this presentation... http://people.apache.org/~hossman/apachecon2010/facets/ (you can use the HTML controls or print preview to view the notes I had when giving it) The nutshell explanation... when building filter queries from facet constraint values:
* {!field} works in a lot of situations, but if you are using an analyzer on your facet field, there are some edge cases where it won't do what you expect.
* {!raw} is truly raw terms, which works in almost all cases where you are likely using facet.field -- but it's too raw for some field types that use binary term values (like Trie).
* {!term} does exactly what you would expect/want in all cases when your input is a facet constraint. It builds a term query from the human-readable string representation (even if the internal representation is binary).
-Hoss
Re: Duplicate values in multiValued field
: If I put duplicate values into a multiValued field, would that cause any issues? : : For example I have a multiValued field Color. Some of my documents : have duplicate values for that field, such as: Green, Red, Blue, Green, : Green. : : Would the above (having 3 duplicate Green) be the same as having the : duplicated values of: Green, Red, Blue?

they won't be exactly the same: the doc with dup values will have a higher length, so its lengthNorm will be lower; and it will have a higher term frequency for the terms that are duplicated. in short, those documents won't score the same when searching the Color field for any color.

-Hoss
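If the duplicates carry no meaning, one option is to deduplicate the values client-side before indexing; a minimal sketch (the helper is hypothetical, not a Solr feature):

```python
def dedupe_preserve_order(values):
    # drop repeated multiValued entries while keeping first-seen order,
    # so lengthNorm and term frequency are unaffected by duplicates
    seen, out = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

print(dedupe_preserve_order(["Green", "Red", "Blue", "Green", "Green"]))
# ['Green', 'Red', 'Blue']
```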
Re: full text search in multiple fields
Did you reindex after you changed your analyzers?

On 12/22/2010 12:57 PM, PeterKerk wrote: [...]
Re: Recap on derived objects in Solr Index, 'schema in a can'
No, one cannot ignore the schema. If you try to add a field not in the schema you get an error. One could, however, use any arbitrary subset of the fields defined in the schema for any particular *document* in the index. Say your schema had fields f1, f2, f3...f10. You could have fields f1-f5 in one doc, fields f6-f10 in another, and f1, f4, f9 in another, and so on. The only field(s) that *must* be in a document are the required=true fields. There's no real penalty for omitting fields from particular documents.

This allows you to store special documents that aren't part of normal searches. You could, for instance, use a document to store meta-information about your index that had whatever meaning you wanted in a field(s) that *no* other document had. Your app could then read that special document and make use of that info. Searches on normal documents wouldn't return that doc, etc. You could effectively have N indexes contained in one index where a document in each logical sub-index had fields disjoint from the other logical sub-indexes. Why you'd do something like that rather than use cores is a very good question, but you *could* do it that way... All this is much different from a database, where there are penalties for defining a large number of unused fields. Whether doing this is wise or not given the particular problem you're trying to solve is another discussion <G>..

Best
Erick

On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Based on more searches and manual consolidation, I've put together a summary below of the ideas already suggested for this. The last item in the summary seems to be an interesting, low-technical-cost way of doing it. Basically, it treats the index like a 'BigTable', a la NoSQL.

Erick Erickson pointed out: "...but there's absolutely no requirement that all documents in SOLR have the same fields..." I guess I don't have the right understanding of what goes into a Document in Solr. Is it just a set of fields, each with its own independent field type declaration/id, its name, and its content? So even though there's a schema for an index, one could ignore it and just throw any other named fields, types, and content at document addition time? So if I wanted to search on a base set, all documents having it, I could then additionally filter based on the (might be wrong use of this term) dynamic fields?

Original thread that I started: http://lucene.472066.n3.nabble.com/A-schema-inside-a-Solr-Schema-Schema-in-a-can-tt2103260.html

Repeat of the problem (not actual ratios/numbers, i.e. could be WORSE!):
1/ Base object of some kind, x number of fields
2/ Derived objects representing divisions in a company, different customer bases, etc., each having 2 additional, unique fields.
3/ Assume 1000 such derived object types
4/ A 'flattened' index would have the x base object fields, and 2000 additional fields

Solutions posited:
A/ First thought, multi-value columns as key pairs.
1/ Difficult to access individual items of more than one 'word' length for querying in multivalued fields.
2/ All sorts of statistical stuff probably wouldn't apply?
3/ (James Dyer said:) "There's also one gotcha we've experienced when searching across multi-valued fields: SOLR will match across field occurrences. In the example below, if you were to search q=contrib_name:(james AND smith), you will get this record back. It matches one name from one contributor and another name from a different contributor. This is not what our users want. As a work-around, I am converting these to phrase queries with slop: james smith~50 ... Just use a slop # smaller than your positionIncrementGap and bigger than the # of terms entered. This will prevent the cross-field matches yet allow the words to occur in any order." The problem with this approach is that Lucene doesn't support wildcards in phrases.
B/ Dynamic fields were suggested, but I am not sure exactly how they work, and the person who suggested it was not sure it would work, either.
C/ Different field naming conventions were suggested where field types were similar. I can't predict that.
D/ Found this old thread, and it had other suggestions:
1/ Use multiple cores, one for each record type/schema, and aggregate them during the query.
2/ Use a fixed number of additional fields X 2. Each additional field is actually a pair of fields. The first
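For option B, a sketch of how derived-object attributes could be mapped onto dynamic-field names by type suffix (this assumes the stock *_s/*_i/*_f/*_b dynamicField patterns from the example schema; the helper itself is hypothetical):

```python
# map Python types to the conventional Solr dynamic-field suffixes
SUFFIX = {str: "_s", int: "_i", float: "_f", bool: "_b"}

def to_dynamic_fields(base_doc, extra_attrs):
    # copy the fixed base-object fields, then add each derived-object
    # attribute under a dynamic-field name chosen by its value's type
    doc = dict(base_doc)
    for name, value in extra_attrs.items():
        doc[name + SUFFIX[type(value)]] = value
    return doc

print(to_dynamic_fields({"id": "1"}, {"division": "EMEA", "seats": 40}))
# {'id': '1', 'division_s': 'EMEA', 'seats_i': 40}
```

Documents built this way share the base fields but can each carry a different set of derived fields, without any schema change per derived type.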
Re: Consequences for using multivalued on all fields
PositionIncrementGap for multiValued fields is, perhaps, the most interesting difference. One of the drivers here is, say, indexing across some boundary that you don't want phrases or near clauses to match. For instance, say you have text with sentences, and your requirement is that phrases don't match across sentence boundaries. One way to handle that is to add successive sentences to a multivalued field and define that field with a large increment gap. But otherwise, as far as I know, there's no difference worth mentioning between indexing a bunch of stuff as one long string or breaking it up into multiple segments in a multivalued field with the increment gap set to 1, except for edge cases like the sorting thing Geert-Jan mentions.

Best
Erick

On Tue, Dec 21, 2010 at 12:49 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Thank you for the input. You might have seen my posts about doing a flexible schema for derived objects. Sounds like dynamic fields might be the ticket. We'll be ready to test the idea in about a month, maybe 3 weeks. I'll post a comment about it when it gets there. I don't know if I would gain anything, but I think that ALL booleans that were NOT in the base object but were in the derived objects could be put into one field as textually positioned key:pairs, at least for search purposes. Since the derived object would have its own additional methods, one of those methods could be to 'unserialize' the 'boolean column'. In fact, that could be a base object function - empty boolean column values just end up not populating any extra base object attributes.

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
- Original Message From: kenf_nc ken.fos...@realestate.com To: solr-user@lucene.apache.org Sent: Tue, December 21, 2010 6:07:51 AM Subject: Re: Consequences for using multivalued on all fields I have about 30 million documents and with the exception of the Unique ID, Type and a couple of date fields, every document is made of dynamic fields. Now, I only have maybe 1 in 5 being multi-value, but search and facet performance doesn't look appreciably different from a fixed schema solution. I don't do some of the fancier things, highlighting, spell check, etc. And I use a lot more string or lowercase field types than I do Text (so not as many fully tokenized fields), that probably helps with performance. The only disadvantage I know of is dealing with field names at runtime. Depending on your architecture, you don't really know what your document looks like until you have it in a result set. For what I'm doing, that isn't a problem. -- View this message in context: http://lucene.472066.n3.nabble.com/Consequences-for-using-multivalued-on-all-fields-tp2125867p2126120.html Sent from the Solr - User mailing list archive at Nabble.com.
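Erick's sentence-boundary example can be illustrated with a rough model of how token positions are assigned across multiValued entries (simplified: one position per whitespace token, with the gap applied at each value boundary):

```python
def token_positions(values, gap):
    # simplified model of Lucene position assignment: within a value each
    # token advances by 1; the first token of each later value instead
    # advances by the field's positionIncrementGap
    out, pos, first = [], 0, True
    for value in values:
        for j, tok in enumerate(value.split()):
            if first:
                first = False
            elif j == 0:
                pos += gap
            else:
                pos += 1
            out.append((tok, pos))
    return out

print(token_positions(["I saw red", "She ran fast"], 100))
# [('I', 0), ('saw', 1), ('red', 2), ('She', 102), ('ran', 103), ('fast', 104)]
```

With a gap of 100, a phrase or near query spanning "red She" would need a slop of at least 100 to match, which is why a large gap keeps phrases from matching across sentence boundaries.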
Re: Recap on derived objects in Solr Index, 'schema in a can'
I'm open to cores, if it's the faster (indexing/querying/keeping mentally straight) way to do things. But from what you say below, the eventual goal of the site would mean either 100 extra 'generic' fields, or 1,000-100,000's of cores. Probably cores are easier to administer for security and do more accurate querying? What is the relationship between dynamic fields and the schema?

Dennis Gearon

- Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, December 22, 2010 10:44:27 AM Subject: Re: Recap on derived objects in Solr Index, 'schema in a can' [...]
Re: full text search in multiple fields
Certainly did! Why, are you saying this code is correct as-is? -- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2133022.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Case Insensitive sorting while preserving case during faceted search
: Hoss, I think the use case being asked about is specifically doing a : facet.sort though, for cases where you actually do want to sort facet values : with facet.sort, not sort records -- while still presenting the facet values : with original case, but sorting them case insensitively.

Ah yes ... thank you, i did in fact misunderstand the question.

: Because I'm pretty sure there isn't really any good solution for this, Solr : just won't do that, just how it goes.

correct. the facet constraint values come from indexed terms, and the terms are what get sorted by facet.sort -- if you want to collapse some terms down so they are equivalent (ie: Foo and foo are treated identically) then that's what you get back. if your goal is just to have pretty values, you can use things like the CapitalizationFilter, but if you need a particularly complex analyzer for your values in order for them to sort a certain way, you can't then get back the original pre-analyzed values. One way people deal with this type of situation is to index identifiers for their facet constraints, and then their UI uses those ids to look up the display value (ie: index categoryId, display categoryName) ... this has the added benefit of allowing you to change category names w/o re-indexing.

-Hoss
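Hoss's id-to-label approach might look like this on the front end (the lookup table and helper are illustrative, not a Solr API):

```python
# hypothetical categoryId -> display-name lookup maintained by the app
CATEGORY_LABELS = {"cat1": "Rock", "cat2": "Jazz"}

def label_facet_counts(facet_counts, labels):
    # facet_counts: (indexed id, count) pairs as returned for a facet.field;
    # map each id to its display label, falling back to the raw id
    return [(labels.get(term, term), count) for term, count in facet_counts]

print(label_facet_counts([("cat1", 12), ("cat2", 7)], CATEGORY_LABELS))
# [('Rock', 12), ('Jazz', 7)]
```

Sorting and renaming then happen entirely in the UI layer, so the indexed terms never need to change.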
Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
The problem may be that the index folder could not be created. Try checking the conf folder where solrconfig.xml and schema.xml reside. Also, you may try to index using $ java -jar post.jar *.xml. You may try a different version like 1.3.0 or 1.4.0 to test what is wrong. It sometimes happens that the downloaded Solr may have something missing.

On Wed, Dec 22, 2010 at 9:18 PM, Bac Hoang [via Lucene] ml-node+2131930-846132511-146...@n3.nabble.com wrote:

Hello Anurag, The specific problem I faced when starting Solr in Geronimo (http://{server}:{port}/solr) is that /solr/data/index could not be found; Solr then tried to create that folder but failed, even though permission is granted. More detail from the log:

Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start SOLR. Check solr/home property java.lang.RuntimeException: java.io.IOException: Cannot create directory: /solr/data/index

You're right, I'm using Solr 1.4.1. Thanks indeed. Bac Hoang

-Original Message- From: Anurag [hidden email] Sent: Wed 12/22/2010 10:17 PM To: [hidden email] Subject: Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo

What do you want to ask? When does this problem arise? Is it when you try to index to Solr? What are the commands that you are running? Which version of Solr (1.4.1?).

On Wed, Dec 22, 2010 at 5:49 PM, Bac Hoang [via Lucene] [hidden email] wrote:

Hello Erick, Could you kindly give a hand with my problem? Any ideas, hints, suggestions are highly appreciated. Many thanks. 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2.
Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index. I set the read, write right to every and each folder, from opt, dev...to the last one, index (just for sure ;) ) - lockType: - single/ simple: Cannot create directory: /solr/data/index at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397) - native: Cannot create directory: /solr/data/index at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock - the Geronimo log: === 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:07,941 INFO [DirectoryMonitor] Hot deployer notified that an artifact was removed: default/solr2/1293005281314/war 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:14,139 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0' 2010-12-22 15:13:18,795 WARN [TomcatModuleBuilder] Web application . does not contain a WEB-INF/geronimo-web.xml deployment plan. This may or may not be a problem, depending on whether you have things like resource references that need to be resolved. You can also give the deployer a separate deployment plan file on the command line. 
2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr 2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Solr home set to '/opt/dev/config/solr/' 2010-12-22 15:13:19,051 INFO [SolrDispatchFilter] SolrDispatchFilter.init() 2010-12-22 15:13:19,462 INFO [IndexSchema] default search field is text 2010-12-22 15:13:19,463 INFO [IndexSchema] query parser default operator is OR 2010-12-22 15:13:19,464 INFO [IndexSchema] unique key field: id 2010-12-22 15:13:19,490 INFO [JmxMonitoredMap] JMX monitoring is enabled. Adding Solr mbeans to JMX Server: com.sun.jmx.mbeanserver.jmxmbeanser...@144752d 2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[]} 2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=solr rocks,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]} 2010-12-22 15:13:19,533 WARN [SolrCore] Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2010-12-22 15:13:19,599
Re: Configuration option for disableReplication
I've just done a bit of playing here, because I've spent a lot of time reading the SolrReplication wiki page[1], and have often wondered how some features interact. Unfortunately, if you specify str name=enablefalse/str in your replication request handler for your master, you cannot re-enable it with a call to /solr/replication?command=enablereplication Therefore, it would seem your best bet is to call /solr/replication?command=disablepolling on all of your slaves prior to upgrading. Then, when you're sure everything is right, call /solr/replication?command=enablepolling on each slave, and you should be good to go. I tried this, watching the request log on my master, and the incoming replication requests did actually stop due to the disablepolling command, so you should be fine with this approach. Does this get you to where you want to be? Upayavira On Wed, 22 Dec 2010 17:10 +, Francis Rhys-Jones francis.rhys-jo...@guardian.co.uk wrote: Hi, I am looking into using a multi core configuration to allow us to fully rebuild our index while still applying updates. I have two cores main-core and rebuild-core. I push the whole dataset into the rebuild core, during which time I can happily keep pushing updates into the main-core. Once the rebuild is complete I swap the cores and delete *:* from the rebuild core. This works fine however there are a couple of edge cases: On server restart solr needs to remember which core has been swapped in to be the main core, this can be solved by adding the persistent=true attribute to the solr config, however this does require the solr.xml to be writeable. While deploying a new version of our application we overwrite the solr.xml, as the new version could potentially have legitimate changes to the solr.xml that need to be rolled out, again leaving the cores out of sync. My proposed solution is to have the indexing process do some sanity checking at the start of each run, and swap in the correct core if necessary. 
This works; however, there is the potential for the slaves to start replicating the empty index before the correct index is swapped in. To get round this problem I would like to have replication disabled on startup. Removing replicateAfter=startup has this effect, but it would be more future-proof to be able to specify a default for the replicationEnabled field (see SOLR-1175) in the ReplicationHandler, stopping replication until I explicitly turn it on. The change looks fairly simple.

--- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
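Upayavira's disablepolling/enablepolling sequence could be scripted; a rough sketch that only builds the ReplicationHandler command URLs (the helper and slave host names are made up; actually issuing the requests is left to urllib or curl):

```python
def replication_url(host, command):
    # ReplicationHandler accepts commands such as disablepolling / enablepolling
    return "http://%s/solr/replication?command=%s" % (host, command)

slaves = ["slave1:8983", "slave2:8983"]  # hypothetical slave hosts
for slave in slaves:
    print(replication_url(slave, "disablepolling"))
# ...upgrade the master here, then issue enablepolling the same way
```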
Re: Recap on derived objects in Solr Index, 'schema in a can'
A dynamic field just means that the schema allows any field with a name matching the wildcard. That's all. There is no support for referring to all of the existing fields in the wildcard. That is, there is no support for *_en:word as a field search. Nor is there any kind of grouping for facets. The feature for addressing a particular field in some of the parameters does not support wildcards. If you add wildcard fields, you have to remember what they are.

On Wed, Dec 22, 2010 at 11:04 AM, Dennis Gearon gear...@sbcglobal.net wrote: [...]
Re: edismax inconsistency -- AND/OR
On 12/22/2010 8:25 AM, Dyer, James wrote:

I'm using SOLR 1.4.1 with SOLR-1553 applied (edismax query parser). I'm experiencing inconsistent behavior with terms grouped in parentheses. Sometimes they are AND'ed and sometimes OR'ed together.

1. q=Title:(life)&defType=edismax -- 285 results
2. q=Title:(hope)&defType=edismax -- 34 results
3. q=Title:(life AND hope)&defType=edismax -- 1 result
4. q=Title:(life OR hope)&defType=edismax -- 318 results
5. q=Title:(life hope)&defType=edismax -- 1 result (life, hope are being AND'ed together)
6. q=Title:(life AND hope) AND Title:(life)&defType=edismax -- 1 result
7. q=Title:(life OR hope) AND Title:(life)&defType=edismax -- 285 results
8. q=Title:(life hope) AND Title:(life)&defType=edismax -- 285 results (life, hope are being OR'ed together)

See how in #5 the two terms get AND'ed, but by adding the additional (nonsense) clause in #8, the first two terms get OR'ed. Is this a feature or a bug? Am I likely doing something wrong?

The dismax parser doesn't pay any attention to the default query operator, and in the absence of these values in the actual query, edismax likely doesn't either. What matters is the value of the mm parameter, also known as minimum 'should' match. If your mm value is 50%, which is a common value to see in dismax examples, I believe it would behave exactly as you are seeing. This is a complex little beast. Just a couple of weeks ago, Chris Hostetter said that although he wrote the code and the syntax for mm, the explanation of the parameter in the Smiley and Pugh Solr book (pages 138-140) is the clearest he's ever seen. Here's some detailed documentation on it; I can't find my copy of the book right now, so I don't know if this is as good as what's in it: http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html

Hopefully this is applicable to you, and not something you already thought of!

Shawn
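A toy illustration of how an mm value collapses to a required-clause count (this handles only the simple integer and percentage forms; the real mm spec also supports negative and conditional expressions):

```python
def min_should_match(mm, num_clauses):
    # "2" -> at least 2 optional clauses must match;
    # "50%" -> at least half of them (rounded down)
    if mm.endswith("%"):
        return num_clauses * int(mm[:-1]) // 100
    return int(mm)

# with mm=100%, two optional clauses behave like AND; with mm=0%, like OR
print(min_should_match("100%", 2), min_should_match("0%", 2))
# 2 0
```

This is why adding an extra clause can flip apparent AND behavior to OR: a percentage mm requires a different absolute number of matches as the clause count changes.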
Re: hole RAM using by solr during Optimize
On 12/22/2010 2:56 AM, stockii wrote:

Hello. I have a RAM problem during an optimize. When I start a delta or full import, Solr uses only the RAM I allocate to it, e.g.: java -Xmx2g -jar start.jar. While Solr is fetching the rows from the database the RAM usage is okay. But when Solr begins to optimize, Solr wants all of the available RAM?! Why is that? The used RAM jumps into the sky and only 40 MB of RAM is free, out of 8 GB!!! How can I limit this?

Is it Solr that's using all the RAM, or the OS disk cache? I have found other messages from you that say you're on Linux, so going with that assumption, you can see everything if you run the 'top' command and press shift-M to sort it by memory usage. Solr (java) should be at the top of the list, and the RES (or maybe RSS, depending on flavor) column will tell you how much RAM it's using. Having only 40MB free memory is typical for a Linux system. Above the process list are a bunch of indicators that give you the overall RAM usage. The number on the bottom right is "cached". This refers to the OS disk cache, and it probably has the bulk of your usage. Below is what my screen looks like. Solr is using 1.4GB of RAM (out of 2.5GB possible), the disk cache is using 7.5GB, and I have less than 30MB free.

top - 15:20:04 up 34 days, 16 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 68 total, 2 running, 66 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 9437184k total, 9407424k used, 29760k free, 165464k buffers
Swap: 1048568k total, 68k used, 1048500k free, 7527788k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
22928 ncindex  20  0 2574m 1.4g   9m S  0.0 15.1 432:36.91 java
21319 root     15  0 90136 3424 2668 R  0.0  0.0  0:00.01 sshd

If it's your disk cache that's using up most of your memory, it's perfectly normal. Solr is not to blame for it, and you do not want to change it.
If you're worried about memory usage because you have performance issues, I can try to narrow it down for you. That will require more information, starting with your 'top' output, total index size, and if you're using distributed search, how big each shard is. I am likely to ask for further information beyond that. Shawn
Re: full text search in multiple fields
Certainly did! Why, are you saying this code is correct as-is?

Yes, the query q=title_search:hort*&defType=lucene should return documents having "Hortus supremus" in their title field with the configuration you sent us. It should exist somewhere in the result set, if not in the top 10.

Try a few things to make sure your document is indexed:

q=title_search:"Hortus supremus"&defType=lucene&fl=title,title_search
q=title:"Hortus supremus"&defType=lucene&fl=title,title_search

Are they returning that document? Or find that document's unique id and query for it.
Re: DIH for taxonomy faceting in Lucid webcast
: 1) My categories are stored in database as coded numbers instead of
: fully spelled out names. For example I would have a category of 2/7
: and a lookup dictionary to convert 2/7 into NonFic/Science. How do I
: do such lookup in DIH?

My advice: don't. I thought i mentioned this in that webcast, but if you've already got unique identifiers for your category names, keep using them in your index/facets, and then have your front end application resolve them into pretty category names. it's usually just as easy to apply the labels at query time as at index time, and if you do it at query time you can tweak the labels w/o reindexing.

: 2) Once I have the fully spelled out category path such as
: NonFic/Science, how do I turn that into 0/NonFic and
: 1/NonFic/Science using the DIH?

I don't have any specific suggestions for you -- i've never tried it in DIH myself. the ScriptTransformer might be able to help you out, but i'm not sure.

: 3) Some of my categories are multi-words containing whitespaces, such as
: Computer Science and Functional Programming, so I'd have facet
: values such as 2/NonFic/Computer Science/Functional Programming. How
: do I handle whitespaces in this case? Would filtering by fq still work?

a) it should if you use the {!raw} qparser
b) if you follow my advice in #1, it won't matter.

-Hoss
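For reference, the depth-prefixed path encoding asked about in #2 (turning NonFic/Science into 0/NonFic and 1/NonFic/Science) takes only a few lines outside of DIH, e.g. in a ScriptTransformer or in the indexing client. A minimal sketch (the function name is illustrative, not from the webcast):

```python
def facet_paths(category_path, sep="/"):
    """Expand a category path into depth-prefixed facet tokens.

    "NonFic/Science" -> ["0/NonFic", "1/NonFic/Science"]
    """
    parts = category_path.split(sep)
    return ["%d%s%s" % (depth, sep, sep.join(parts[:depth + 1]))
            for depth in range(len(parts))]

print(facet_paths("NonFic/Science"))
# ['0/NonFic', '1/NonFic/Science']
```

The multi-word case from #3 falls out of the same function: facet_paths("NonFic/Computer Science/Functional Programming") yields tokens containing spaces, which is why the {!raw} qparser (or careful escaping) matters when filtering on them.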
Re: full text search in multiple fields
Ok, I was trying to hide the actual name of the location, because I don't want it to get indexed by search engines AND it's a bit of a weird name :p

The name of the location in the database is: Museumrestaurant De Pappegay

Anyway, here it is. I executed the queries you gave me, and this is the result:

DOC FOUND:
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title_search:%22pappegay%22&defType=lucene&fl=title,title_search
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title_search:%22Pappegay%22&defType=lucene&fl=title,title_search
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title:%22Pappegay%22&defType=lucene&fl=title,title_search

NO DOC FOUND:
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title:%22pappegay%22&defType=lucene&fl=title,title_search

-- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2133915.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Different Results..
--- On Wed, 12/22/10, satya swaroop satya.yada...@gmail.com wrote:

From: satya swaroop satya.yada...@gmail.com
Subject: Different Results..
To: solr-user@lucene.apache.org
Date: Wednesday, December 22, 2010, 10:44 AM

My query here is, does Solr consider both the queries differently, and what does it consider for !, / and all other escape characters?

First of all, ! has a special meaning: it means NOT. It is part of the query syntax and is equivalent to the minus (-) operator.

q=erlang!ericson is parsed into:

defaultSearchField:erlang -defaultSearchField:ericson

You can see this by appending debugQuery=on to your search URL. So you need to escape the ! in your case: q=erlang\!ericson will return the same result set as q=erlang/ericson.

You can see the complete list of special characters here:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping Special Characters
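Client-side escaping of those characters can be done with a small helper. A sketch (the character set follows the Lucene query parser syntax page; the two-character operators && and || are not handled here, so this is an approximation rather than a complete escaper):

```python
import re

# Single-character specials per the Lucene query parser syntax page:
# + - ! ( ) { } [ ] ^ " ~ * ? : \
_LUCENE_SPECIALS = re.compile(r'([+\-!(){}\[\]^"~*?:\\])')

def escape_query(term):
    """Backslash-escape Lucene query syntax characters in a user term."""
    return _LUCENE_SPECIALS.sub(r'\\\1', term)

print(escape_query("erlang!ericson"))  # erlang\!ericson
print(escape_query("(1+1):2"))         # \(1\+1\)\:2
```

So q=erlang!ericson becomes q=erlang\!ericson before the request is sent, and the ! is searched as a literal character instead of being parsed as NOT.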
Re: full text search in multiple fields
The name of the location in the database is: Museumrestaurant De Pappegay What was the wildcard query for this?
Sorting results on MULTIPLE fields, not showing expected order
I want to sort results as follows:

- highest membervalue (float) on top
- within those results, I want to sort the items that share the same position on the user rating (integer), once again highest rating on top
- and within those results, I want to sort the items that share the same position on whether they have a photo (bit)

Now I have this:

fq=themes:%22Boat%20and%20Water%22&sort=hasphoto%20desc&q=*:*&fl=id,title

I see the correct item on top. But when I have the full query:

fq=themes:%22Boat%20and%20Water%22&sort=membervalue%20desc&sort=location_rating%20desc&sort=hasphoto%20desc&q=*:*&fl=id,title

An item appears on top that has: membervalue=0.00, location_rating=0, hasphoto=false. There are other locations that have either a higher membervalue, a location_rating, or a photo. This location should NOT be on top. Why is this happening?

-- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-results-on-MULTIPLE-fields-not-showing-expected-order-tp2133959p2133959.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: edismax inconsistency -- AND/OR
Shawn,

Thank you for the reply. The URL you gave was helpful and Smiley & Pugh even more so. On Smiley & Pugh page 140, they indicate that mm=100% using dismax is analogous to the standard parser's q.op=AND. This is exactly what I need. However, testing with these queries and edismax, I get different numbers of results:

q=Title:(life hope) AND Title:(life)&q.op=AND (standard q.p.) - 1 result
q=Title:(life AND hope) AND Title:(life)&defType=edismax - 1 result
q=Title:(life hope) AND Title:(life)&defType=edismax&mm=100% - 285 results (uh-oh, looks like the first two get OR'ed)

The dismax parser seems to behave as documented:

q=life hope life&defType=dismax&rows=0&qf=Title&mm=0% - 285 results (results are OR'ed as expected)
q=life hope life&defType=dismax&rows=0&qf=Title&mm=100% - 1 result (results are AND'ed as expected)

Unfortunately I need to be able to combine the use of pf with key:value syntax, wildcards, etc., so I think I need to use edismax. With a quick glance at ExtendedDismaxQParserPlugin, I'm finding:

- mm is ignored if the query contains any of the operators OR, NOT, +, - ... but AND is ok (line 227)
- mm is ignored if the parse method did not return a BooleanQuery instance (line 244)
- mm is used after all regardless of operators in the query, so long as it's a BooleanQuery (line 286)
- the default mm value is 100% if not specified in the query parameters (lines 241, 283)

Given the apparent contradiction here, my very quick analysis is surely missing something! But if this is accurate, then the trick is to formulate the query in such a way that parse returns an instance of BooleanQuery, right?

Any more advice anyone can give is appreciated! For the client I'm responsible for, I'm just inserting explicit operators between all of the user's terms. But for the client I'm not responsible for, I would love to have a workaround for the other developers; I think they'd appreciate it...
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Wednesday, December 22, 2010 4:08 PM
To: solr-user@lucene.apache.org
Subject: Re: edismax inconsistency -- AND/OR
Any way to tie corresponding values together in different multiValued fields?
I have products, each has a specific Product ID. For certain products such as Shirts, there are also extra fields such as Size and Color. Right now I define both Size and Color as multiValued fields. And when I have a Shirt of Size M and Color white, I just put M in Size and white in Color. Now if I have another shirt with the same Product ID but Size L and Color blue, I add L to Size and blue to Color. This causes a problem during faceting. If a user filters on M for Size and blue for Color, he'd get a match. But in reality there isn't a shirt with Size M and Color blue. Is there any way to encode the data to tie Size M to Color white, and to tie Size L to Color blue so that the filtering would come out right? How should I handle this use case? Thanks.
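One encoding that is often used for this problem (an assumption on my part, not something confirmed in this thread) is to index each valid size/color pair as a single value in one combined multiValued field, and then facet and filter on that field rather than on Size and Color independently. A sketch with hypothetical field names:

```python
def combined_variants(variants, sep="_"):
    """Encode (size, color) pairs as single tokens for one multiValued field."""
    return ["%s%s%s" % (size, sep, color) for size, color in variants]

# A shirt available as M/white and L/blue:
doc = {
    "product_id": "SHIRT-1",  # hypothetical field names
    "size_color": combined_variants([("M", "white"), ("L", "blue")]),
}

# fq=size_color:M_white matches this document; fq=size_color:M_blue does
# not, because "M_blue" was never indexed as a valid pair.
print(doc["size_color"])  # ['M_white', 'L_blue']
```

The trade-off is that you can no longer facet on Size or Color alone from this field, so many setups keep the separate Size and Color fields for display and add the combined field only for pair-accurate filtering.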
Re: Sorting results on MULTIPLE fields, not showing expected order
--- On Thu, 12/23/10, PeterKerk vettepa...@hotmail.com wrote:

From: PeterKerk vettepa...@hotmail.com
Subject: Sorting results on MULTIPLE fields, not showing expected order
To: solr-user@lucene.apache.org
Date: Thursday, December 23, 2010, 1:01 AM

An item appears on top that has membervalue=0.00, location_rating=0, hasphoto=false ... Why is this happening?

Multiple sort orderings can be separated by a comma, i.e.:

sort=field name+direction[,field name+direction]... [1]

[1] http://wiki.apache.org/solr/CommonQueryParameters#sort
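So the three orderings from the question belong in one comma-separated sort parameter rather than three separate sort= parameters. A sketch of building such a URL (the host and core path are placeholders; the parameter shapes follow the wiki page above):

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "fq": 'themes:"Boat and Water"',
    # One sort parameter; later fields only break ties left by earlier ones.
    "sort": "membervalue desc,location_rating desc,hasphoto desc",
    "fl": "id,title",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With repeated sort= parameters, only one ordering takes effect, which is why the item with membervalue=0.00 could float to the top in the original query.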
Re: full text search in multiple fields
Mmmm, this is strange. When I do:

q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title

nothing is found. But if I do:

q=title_search:Pappegay&defType=lucene&q=*:*&fl=id,title

the location IS found. I do need a wildcard though, since users may also search on parts of the title (as described earlier in this post). But this looks almost as if the location is not found if the wildcard is on the end and the searched string is no longer than the position of the wildcard (if that makes sense :)

-- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2133991.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting results on MULTIPLE fields, not showing expected order
Wow, you're fast :) But that indeed did the trick, thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-results-on-MULTIPLE-fields-not-showing-expected-order-tp2133959p2134000.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH for taxonomy faceting in Lucid webcast
--- On Wed, 12/22/10, Chris Hostetter hossman_luc...@fucit.org wrote: : 2) Once I have the fully spelled out category path such as : NonFic/Science, how do I turn that into 0/NonFic : 1/NonFic/Science using the DIH? I don't have any specific suggestions for you -- i've never tried it in DIH myself. the ScriptTransformer might be able to help you out, but i'm not sure. Thanks Chris. What did you use to generate those encodings if not DIH?
Re: full text search in multiple fields
When I do:

q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title

nothing is found, but if I do:

q=title_search:Pappegay&defType=lucene&q=*:*&fl=id,title

the location IS found. I do need a wildcard though, since users may also search on parts of the title (as described earlier in this post). But this looks almost as if the location is not found if the wildcard is on the end and the searched string is no longer than the position of the wildcard (if that makes sense :)

Why are you using two q parameters in your search URL?

q=*:*&q=title_search:Pappegay*
Re: full text search in multiple fields
When I do:

q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title

nothing is found.

This is expected, since you have a lowercase filter in your index analyzer. Wildcard searches are not analyzed, so you need to lowercase your query on the client side:

q=title_search:pappegay*&defType=lucene&fl=id,title
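Since wildcard terms bypass the analyzer, the client has to reproduce whatever the index-time analyzer did to the stored tokens. A minimal sketch, assuming a lowercase filter is the only index-time transformation that matters (other filters, e.g. stemming or ASCII folding, would also need mirroring):

```python
def prefix_query(field, user_text):
    """Build a wildcard query, lowercased to match a lowercase-filtered index."""
    return "%s:%s*" % (field, user_text.lower())

print(prefix_query("title_search", "Pappegay"))  # title_search:pappegay*
```

This is why q=title_search:Pappegay* finds nothing while q=title_search:pappegay* matches: the indexed token is "pappegay", and the unanalyzed wildcard term "Pappegay*" never equals it.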
Re: full text search in multiple fields
Oops, sloppy, that was a copy-paste error. I now have:

WORKING:
http://localhost:8983/solr/db/select/?indent=on&q=title_search:Pappegay&defType=lucene&fl=id,title

NOT WORKING:
http://localhost:8983/solr/db/select/?indent=on&q=title_search:Pappegay*&defType=lucene&fl=id,title

-- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2134044.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Recap on derived objects in Solr Index, 'schema in a can'
I think my partner and I are just going to have to play with both cores and dynamic fields.

If multiple cores are queried, and the schemas match up in order and position for the base fields, do the 'extra' fields in the different cores just show up in the result set with their field names? The query against different cores, with 'base attributes' and 'extended attributes', has to be tailored for each core, right? I.e., not querying for fields that don't exist? (That could be handled by making the query a server-side language object with inheritance for the extended fields.)

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
EARTH has a Right To Life, otherwise we all die.

----- Original Message -----
From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, December 22, 2010 1:45:04 PM
Subject: Re: Recap on derived objects in Solr Index, 'schema in a can'

A dynamic field just means that the schema allows any field with a name matching the wildcard. That's all. There is no support for referring to all of the existing fields in the wildcard. That is, there is no support for *_en:word as a field search. Nor is there any kind of grouping for facets. The feature for addressing a particular field in some of the parameters does not support wildcards. If you add wildcard fields, you have to remember what they are.

On Wed, Dec 22, 2010 at 11:04 AM, Dennis Gearon gear...@sbcglobal.net wrote:

I'm open to cores, if it's the faster (indexing/querying/keeping mentally straight) way to do things. But from what you say below, the eventual goal of the site would mean either 100 extra 'generic' fields, or 1,000-100,000's of cores. Probably cores is easier to administer for security and does more accurate querying?
What is the relationship between dynamic fields and the schema?

Dennis Gearon

----- Original Message -----
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, December 22, 2010 10:44:27 AM
Subject: Re: Recap on derived objects in Solr Index, 'schema in a can'

No, one cannot ignore the schema. If you try to add a field not in the schema you get an error. One could, however, use any arbitrary subset of the fields defined in the schema for any particular *document* in the index. Say your schema had fields f1, f2, f3...f10. You could have fields f1-f5 in one doc, fields f6-f10 in another, and f1, f4, f9 in another, and so on. The only field(s) that *must* be in a document are the required=true fields. There's no real penalty for omitting fields from particular documents.

This allows you to store special documents that aren't part of normal searches. You could, for instance, use a document to store meta-information about your index that had whatever meaning you wanted, in a field(s) that *no* other document had. Your app could then read that special document and make use of that info. Searches on normal documents wouldn't return that doc, etc. You could effectively have N indexes contained in one index, where a document in each logical sub-index had fields disjoint from the other logical sub-indexes. Why you'd do something like that rather than use cores is a very good question, but you *could* do it that way...

All this is much different from a database, where there are penalties for defining a large number of unused fields. Whether doing this is wise or not given the particular problem you're trying to solve is another discussion <G>...
Best,
Erick

On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Based on more searches and manual consolidation, I've put together a summary below of some of the ideas already suggested for this. The last item in the summary seems to be an interesting, low-technical-cost way of doing it. Basically, it treats the index like a 'BigTable', a la NoSQL.

Erick Erickson pointed out: "...but there's absolutely no requirement that all documents in SOLR have the same fields..."

I guess I don't have the right understanding of what goes into a Document in Solr. Is it just a set of fields, each with its own independent field type declaration/id, its name, and its content? So even though there's a schema for an index, one could ignore it and just throw any other named fields and types and content at document addition time? So if I wanted to search on a base set, all
Re: Query performance issue while using EdgeNGram
Hmmm, find evicted docs? If you mean find out how many docs are deleted, look on the admin schema browser page; the difference between maxDoc and numDocs is the number of deleted documents.

You say for some queries the QTime is more than 8 secs. What happens if you re-run that query a bit later? The reason I ask is that if you're not warming the cache that that particular query uses, you may be seeing cache loading time here. Look at the admin stats page, especially for evictions. It's also possible that your caches are being reclaimed for some queries and you're seeing response-time spikes when the caches are re-loaded.

Best,
Erick

On Wed, Dec 22, 2010 at 7:10 AM, Shanmugavel SRD srdshanmuga...@gmail.com wrote:

1) Thanks for this update. I have to use 'WhiteSpaceTokenizer'.
2) I have to suggest the whole query itself (say name or title).
3) Could you please let me know if there is a way to find the evicted docs?
4) Yes, we are seeing improvement in the response time if we optimize. But still, for some queries QTime is more than 8 secs. It is a 'Blocker' for us. Could you please suggest anything to reduce the QTime to under 1 sec?

-- View this message in context: http://lucene.472066.n3.nabble.com/Query-performance-issue-while-using-EdgeNGram-tp2097056p2130751.html
Sent from the Solr - User mailing list archive at Nabble.com.
Print highlighting descriptions
I want to print the highlighting descriptions:

{"responseHeader":{"status":0,"QTime":2,"params":{"hl.fl":"description","json.wrf":"jsonp1293069622009","wt":"json","q":"target","hl":"true"}},
 "response":{"numFound":7945,"start":0,"maxScore":6.9186745,"docs":[
   {"description":"target","url":"target","id":"269653","score":6.9186745},
   {"description":"Target The Woodlands","url":"Target_The_Woodlands","id":"37277","score":4.3241715},
   {"description":"Target Kent","url":"Target_Kent","id":"37275","score":4.3241715}]},
 "highlighting":{"269653":{"description":["<em>target</em> "]},"37277":{"description":["<em>Target</em> The Woodlands"]},"37275":{"description":["<em>Target</em> Kent"]}}}

I know the descriptions in docs are at response.response.docs[i].description, but I don't know how to print out the highlighting descriptions, such as <em>Target</em> Kent (no need to highlight, just print out).

Thanks
Ruixiang
Re: Print highlighting descriptions
(10/12/23 11:56), Ruixiang Zhang wrote:

I want to print the highlighting descriptions ... I don't know how to print out the highlighting descriptions, such as <em>Target</em> Kent (no need to highlight, just print out).

Ruixiang,

If you meant that you want to get "Target Kent" instead of "<em>Target</em> Kent", you can change the em tags to an empty string by using the hl.simple.pre/hl.simple.post parameters:

http://wiki.apache.org/solr/HighlightingParameters#hl.simple.pre.2BAC8-hl.simple.post

Koji
--
http://www.rondhuit.com/en/
Re: Print highlighting descriptions
Thanks Koji. Actually my question is: We can use response.response.docs[i].description to print the description in docs. What expression should we use to print the description in highlighting?
Re: Print highlighting descriptions
(10/12/23 14:10), Ruixiang Zhang wrote:

Thanks Koji. Actually my question is: we can use response.response.docs[i].description to print the description in docs. What expression should we use to print the description in highlighting?

Ruixiang,

I cannot understand your question. Is it a Solr question? :) You said "No need to highlight, just print out" in your previous mail, then asked the above??? What do you mean by "expression" and "print"?

Koji
--
http://www.rondhuit.com/en/
Re: Print highlighting descriptions
Hi Koji,

I figured it out. I can use

response.highlighting[response.response.docs[0].id].description[0]

to print the description in highlighting. (Actually, it's not a Solr question, sorry for that.)

Thanks
Ruixiang
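Put together, joining docs to their highlighting snippets by unique id looks like this (the response shape mirrors the JSON shown earlier in the thread; the thread's own client was JavaScript via json.wrf, so this Python version is just an illustration):

```python
import json

# A trimmed response of the shape shown earlier in the thread.
raw = '''{
  "response": {"numFound": 7945, "start": 0, "docs": [
    {"description": "Target Kent", "url": "Target_Kent", "id": "37275"}]},
  "highlighting": {"37275": {"description": ["<em>Target</em> Kent"]}}
}'''

resp = json.loads(raw)
for doc in resp["response"]["docs"]:
    # The highlighting section is keyed by each document's unique id,
    # so the doc list and the snippets have to be joined manually.
    snippet = resp["highlighting"][doc["id"]]["description"][0]
    print(snippet)  # <em>Target</em> Kent
```

If the plain text is wanted instead, setting hl.simple.pre/hl.simple.post to empty strings (as Koji suggested) avoids having to strip the em tags client-side.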