Re: Restarting tomcat deletes all Solr indexes
Hi,

I know that on startup Solr checks whether the index directory exists, and creates a fresh index if it doesn't. Does that help? If not, the next step I'd take in your case is patching the SolrCore.initIndex method to insert some logging, or running EmbeddedSolrServer under a debugger, etc.

On Mon, May 11, 2009 at 1:25 PM, KK dioxide.softw...@gmail.com wrote:
> Hi,
> I'm facing a silly problem. Every time I restart tomcat all the indexes are lost. I used all the default configurations. I'm pretty sure there must be some basic changes to fix this. I'd highly appreciate if someone could direct me fixing this.
> Thanks,
> KK.

--
Andrew Klochkov
How to deal with Mark invalid?
Good day, people.

We use Solr to search in mailboxes (dovecot). But with some bad messages Solr 1.4-dev generates this error:

    SEVERE: java.io.IOException: Mark invalid
        at java.io.BufferedReader.reset(BufferedReader.java:485)
        at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171

This is the issue known as SOLR-42.

How can I log the field stored in the index (I need the message uid)? How can I ignore such an error and/or message?

Thanks
Custom Servlet Filter, Where to put filter-mappings
Hi folks,

I just wrote a Servlet Filter to handle authentication for our service. Here's what I did:

1. Created a dir in contrib.
2. Put my project in there. I took the dataimporthandler build.xml as an example and modified it to suit my needs. Worked great!
3. ant dist now builds my jar and includes it.

I now need to modify web.xml to add my filter-mapping, init params, etc. How can I do this cleanly? Or do I need to manually open up the archive, edit it, and then re-war it? In common-build I don't see a target for dist-war, so I don't see how it is possible...

Thanks!
Jacob

--
+1 510 277-0891 (o)
+91 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
Re: QueryElevationComponent : hot update of elevate.xml
Hi,

On May 7, 2009, at 6:03, Noble Paul നോബിള് नोब्ळ् wrote:
> Going forward, the Java-based replication is going to be the preferred means of replicating the index. It does not support replicating files in the dataDir; it only supports replicating index files and conf files (files in the conf dir). I was unaware of the fact that it was possible to put the elevate.xml in dataDir.
>
> Reloading on commit is trivial for a search component. It can register itself as an event listener for commit and do a reload of elevate.xml. This can be a configuration parameter:
>
>     <str name="refreshOnCommmit">true</str>

Thanks for these nice tips and recommendations. I attached a new version of this requestHandler here: https://issues.apache.org/jira/browse/SOLR-1147. Would this requestHandler be of any general use, and could it be part of Solr's trunk?

Thanks in advance,
--
Nicolas Pastorino - eZ Labs

> On Wed, May 6, 2009 at 7:08 PM, Nicolas Pastorino n...@ez.no wrote:
> On May 6, 2009, at 15:17, Noble Paul നോബിള് नोब्ळ् wrote:
>> Why would you want to write it to the data dir? Why can't it be in the same place (conf)?
>
> Well, the fact is that the QueryElevationComponent loads the configuration file (elevate.xml) either from the data dir or from the conf dir, which means that existing setups using this component may be using either location. That is the only reason why I judged it necessary to keep supporting this flexibility.
> But this could be simplified by forcing the elevate.xml file to be in the conf dir, and having a system (the one you proposed, or the request handler attached to the issue) to reload the configuration from the conf dir (which is currently not possible, whereas when elevate.xml is stored in the dataDir, triggering a commit reloads it).
> I was just unsure about all the ins and outs of the Elevation system, and so did not remove this flexibility.
> Thanks for your expert eye on this!
>
> On Wed, May 6, 2009 at 6:43 PM, Nicolas Pastorino n...@ez.no wrote:
> Hello,
>
> On May 6, 2009, at 15:02, Noble Paul നോബിള് नोब्ळ् wrote:
>> The elevate.xml is loaded from the conf dir when the core is reloaded. If you post the new XML you will have to reload the core.
>> A simple solution would be to write a RequestHandler which extends QueryElevationComponent, which can be a listener for commit and call super.inform() on that event.
>
> You may want to have a look at this issue: https://issues.apache.org/jira/browse/SOLR-1147
> The proposed solution (a new request handler, attached to the ticket) solves the issue in both cases:
> * when elevate.xml is in the dataDir.
> * when elevate.xml is in the conf dir.
>
> Basically this new request handler receives, as XML, the new configuration, writes it to the right place (some logic was copied from the QueryElevationComponent.inform() code), and then calls the inform() method on the QueryElevationComponent for the current core, as you suggested above, to reload the Elevate configuration.
> --
> Nicolas
>
> On Fri, Apr 10, 2009 at 5:18 PM, Nicolas Pastorino n...@ez.no wrote:
> Hello!
>
> Browsing the mailing list's archives did not help me find the answer, hence the question asked directly here.
>
> Some context first: integrating Solr with a CMS (eZ Publish), we chose to support Elevation. The idea is to be able to 'elevate' any object from the CMS. This can be achieved through eZ Publish's back office, with a dedicated Elevate administration GUI; the configuration is stored in the CMS temporarily, and then synchronized frequently and/or on demand onto Solr. This synchronisation is currently done as follows:
> 1. Generate the elevate.xml based on the stored configuration
> 2. Replace elevate.xml in Solr's dataDir
> 3. Commit.
>
> It appears that when elevate.xml is in Solr's dataDir, and solely in this case, committing triggers a reload of elevate.xml. This does not happen when elevate.xml is stored in Solr's conf dir.
>
> This method has one main issue though: eZ Publish needs to have access to the same filesystem as the one on which Solr's dataDir is stored. This is not always the case, for instance when the CMS is clustered: show stopper :(
>
> Hence the following idea / RFC: how about extending the Query Elevation system with the possibility to push an updated elevate.xml file/XML through HTTP? This would update the file where it is actually located, and trigger a reload of the configuration.
> Not being very knowledgeable about Solr's API (yet!), I cannot figure out whether this would be possible, how this would be achievable (which type of plugin, for instance), or even whether it would be valid.
>
> Thanks a lot in advance for your thoughts,
> --
> Nicolas
>
> --
> Noble Paul | Principal Engineer | AOL | http://aol.com

--
Nicolas Pastorino
Consultant - Trainer - System Developer
Phone : +33 (0)4.78.37.01.34
eZ Systems ( Western Europe ) | http://ez.no
Solr Loggin issue
Hi,

I have Solr implemented in a multi-core scenario and have also applied solr-560-slf4j.patch for logging. The problem I am facing is that the logs go to the stdout.log file, not to the log file that I specified in the log4j.properties file. Can anybody give me a workaround to make the logs go to the logger configured in log4j.properties?

Thanks in advance.

Regards,
Sagar Khetkade
Re: Restarting tomcat deletes all Solr indexes
Thanks for your response @aklochkov. But I again noticed that something is wrong in my Solr/Tomcat config (I spent a lot of time making Solr run), because in the Solr admin page [ http://localhost:8080/solr/admin/ ] what I see is that $CWD is the location from which I restarted Tomcat, and it seems this $CWD gets picked up and used for the index data. Is this the default behavior, or is something wrong on my side? Or maybe I'm asking a stupid question.

Once I was in /etc and restarted Tomcat from there, and when I tried to open the Solr admin page I got an error saying that it cannot create the index directory, some permission issue I think (it gave a directory string like /etc/solr/index ...). I'm pretty sure something is wrong in the configuration. One more thing that confirms this is that I found many Solr index directories here and there (I think these are the locations where I was when I restarted Tomcat).

Earlier I was using JAVA_OPTS to set the Solr home like this:

    export JAVA_OPTS=$JAVA_OPTS -D/usr/local/solr    # in .bashrc

but I commented that out and instead added the JNDI entry in /usr/local/tomcat/webapps/solr/WEB-INF/web.xml like this:

    <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-value>/usr/local/solr</env-entry-value>
      <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>

Even the SolrHome entry in the Solr admin page says that SolrHome is /usr/local/solr, but the index gets created in $CWD. Is it the case that I created entries for SolrHome in multiple places, which is obviously wrong? Can someone point out what the issue is?

Thank you very much.
--KK

On Tue, May 12, 2009 at 2:39 PM, Andrey Klochkov akloch...@griddynamics.com wrote:
> Hi, I know that when starting Solr checks index directory existence, and creates new fresh index if it doesn't exist. Does it help? If no, the next step I'd do in your case is patching SolrCore.initIndex method - insert some logging, or run EmbeddedSolrServer with debugger etc.
Geographical search based on latitude and longitude
Hi all,

I'm new to Solr and want to port a geographical range search from MySQL to Solr.

Currently I'm using some mathematical functions (based on the GRS80 model) directly within MySQL to calculate the actual distance from the locations within the database to a current location (lat and long are known):

    $query = "SELECT street, zip, city, state, country, "
        . $radius . "*ACOS(cos(RADIANS(latitude))*cos(" . $theta . ")*(sin(RADIANS(longitude))*sin(" . $phi . ")+cos(RADIANS(longitude))*cos(" . $phi . "))+sin(RADIANS(latitude))*sin(" . $theta . ")) AS Distance"
        . " FROM ezgis_position"
        . " WHERE " . $radius . "*ACOS(cos(RADIANS(latitude))*cos(" . $theta . ")*(sin(RADIANS(longitude))*sin(" . $phi . ")+cos(RADIANS(longitude))*cos(" . $phi . "))+sin(RADIANS(latitude))*sin(" . $theta . ")) <= " . $range
        . " ORDER BY Distance";

This works fine and is fast. Because we want to include this in our Solr search results, I would like to have an attribute like actual_distance in the result. Is there a way to use functions like these (radians, sin, acos, ...) directly within Solr?

Thanks in advance for any feedback

Norman Leutner
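For readers who want to check the arithmetic outside MySQL, here is a minimal Java sketch of the same expression. It assumes that $theta and $phi in the query hold the reference latitude and longitude in radians, and it uses a mean Earth radius of 6371 km for $radius; neither value is stated in the post, and the class and method names are made up for illustration. Since sin(a)*sin(b) + cos(a)*cos(b) = cos(a - b), the SQL expression is the spherical law of cosines.

```java
/** Illustrative only: the great-circle distance that the SQL expression computes. */
final class GreatCircle {

    /** Assumed mean Earth radius in kilometres (the post does not give its $radius). */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Distance in kilometres between two points given in decimal degrees. */
    static double distanceKm(double lat1Deg, double lon1Deg, double lat2Deg, double lon2Deg) {
        double lat1 = Math.toRadians(lat1Deg);
        double lat2 = Math.toRadians(lat2Deg);
        double deltaLon = Math.toRadians(lon1Deg - lon2Deg);

        // Spherical law of cosines, term for term the same as the ACOS(...) argument in the query.
        double cosAngle = Math.sin(lat1) * Math.sin(lat2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.cos(deltaLon);

        // Rounding can push the value slightly outside [-1, 1], where acos returns NaN.
        cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));

        return EARTH_RADIUS_KM * Math.acos(cosAngle);
    }

    public static void main(String[] args) {
        // A quarter of the equator: (0, 0) to (0, 90).
        System.out.println(distanceKm(0.0, 0.0, 0.0, 90.0) + " km");
    }
}
```

The clamp is not in the original query; it is only there so that two identical points give 0 rather than NaN.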
Re: Geographical search based on latitude and longitude
See https://issues.apache.org/jira/browse/SOLR-773. In other words, we're working on it and would love some help!

-Grant

On May 12, 2009, at 7:12 AM, Norman Leutner wrote:
> Is there a way to use those functions like (radians, sin, acos,...) directly within Solr?

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Re: Restarting tomcat deletes all Solr indexes
One more piece of information I would like to add. The entry in the Solr stats page says this:

    readerDir : org.apache.lucene.store.FSDirectory@/home/kk/solr/data/index

when I ran from /home/kk, and this:

    readerDir : org.apache.lucene.store.FSDirectory@/home/kk/junk/solr/data/index

after running from /home/kk/junk. That confirms the problem for me, but what is the solution?

Thanks,
KK.

On Tue, May 12, 2009 at 4:41 PM, KK dioxide.softw...@gmail.com wrote:
> Even the entry SolrHome in solr admin page say that SolrHome is /usr/loca/solr but the index gets created in $CWD.
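The two readerDir values are what one would expect if the data directory were configured as a relative path, because the JVM resolves relative paths against the directory it was started from. The thread does not show the actual solrconfig.xml, so the ./solr/data value below is an assumption, and the class name is hypothetical. A minimal sketch of the effect:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: how a relative directory setting follows the JVM's working directory. */
final class RelativeDataDir {

    /** Returns the location a configured directory actually refers to in this JVM. */
    static Path resolve(String configuredDir) {
        // toAbsolutePath() resolves a relative path against the working directory ("user.dir").
        return Paths.get(configuredDir).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        System.out.println("working directory : " + System.getProperty("user.dir"));
        // Assumed relative setting: the result changes with the directory the JVM is started from.
        System.out.println("./solr/data       : " + resolve("./solr/data"));
        // An absolute setting resolves to the same place wherever the JVM is started.
        System.out.println("absolute setting  : " + resolve("/usr/local/solr/data"));
    }
}
```

Running it from /home/kk and again from /home/kk/junk prints two different locations for the relative setting, which matches the pattern reported above.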
Re: Geographical search based on latitude and longitude
So are you using a bounding box to find results within a given range (km), as mentioned here: http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html ?

Best regards

Norman Leutner
all2e GmbH

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, May 12, 2009 13:18
To: solr-user@lucene.apache.org
Subject: Re: Geographical search based on latitude and longitude

> See https://issues.apache.org/jira/browse/SOLR-773. In other words, we're working on it and would love some help!
>
> -Grant
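As background for the bounding-box question, here is a rough Java sketch of the general idea: compute a latitude/longitude rectangle that encloses the search circle, use it as a cheap range filter, and run the exact distance check only on what is left. This is not taken from SOLR-773 or from the linked whitepaper. The class name, the 6371 km radius, and the decision to ignore the poles and the 180° meridian are assumptions made to keep the example short.

```java
/** Illustrative only: a lat/lon box that encloses a search circle on a sphere. */
final class BoundingBox {

    /** Assumed mean Earth radius in kilometres. */
    static final double EARTH_RADIUS_KM = 6371.0;

    /**
     * Returns {minLat, maxLat, minLon, maxLon} in degrees for a circle of rangeKm
     * around the given point. Circles touching a pole are rejected, and wrapping
     * across the 180 degree meridian is not handled.
     */
    static double[] around(double latDeg, double lonDeg, double rangeKm) {
        double angular = rangeKm / EARTH_RADIUS_KM;   // radius of the circle in radians
        double lat = Math.toRadians(latDeg);
        if (Math.abs(lat) + angular >= Math.PI / 2) {
            throw new IllegalArgumentException("search circle reaches a pole");
        }
        double latDelta = Math.toDegrees(angular);
        // Meridians converge toward the poles, so the same distance spans more degrees of longitude.
        double lonDelta = Math.toDegrees(Math.asin(Math.sin(angular) / Math.cos(lat)));
        return new double[] {
            latDeg - latDelta, latDeg + latDelta,
            lonDeg - lonDelta, lonDeg + lonDelta
        };
    }

    public static void main(String[] args) {
        double[] box = around(50.0, 8.0, 25.0);
        System.out.printf("lat %.4f..%.4f, lon %.4f..%.4f%n", box[0], box[1], box[2], box[3]);
    }
}
```

Points inside the box can still lie outside the circle (the corners), which is why the exact distance is normally computed afterwards.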
Re: fieldType without tokenizer
Hi,

I tried, but I get an error:

    May 12 15:48:51 solr-test jsvc.exec[2583]: May 12, 2009 3:48:51 PM org.apache.solr.common.SolrException log
    SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordTokenizer'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
        at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:804)
        at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
        at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:425)
        at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:443)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:452)
        at org.apache.solr.schema.In

with:

    <fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizer"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Shalin Shekhar Mangar wrote:
> On Mon, May 4, 2009 at 9:28 PM, sunnyfr johanna...@gmail.com wrote:
>> Hi,
>> I would like to create a field without tokenizer but I've an error,
>
> You can use KeywordTokenizer which does not do any tokenization.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
View this message in context: http://www.nabble.com/fieldType-without-tokenizer-tp23371300p23502994.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: fieldType without tokenizer
Use KeywordTokenizerFactory. Pasted from Solr's example schema.xml:

    <tokenizer class="solr.KeywordTokenizerFactory"/>

Erik

On May 12, 2009, at 9:49 AM, sunnyfr wrote:
> SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordTokenizer'
> [...]
> <tokenizer class="solr.KeywordTokenizer"/>
Re: fieldType without tokenizer
It must be KeywordTokenizer*Factory* :)

Koji

sunnyfr wrote:
> SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordTokenizer'
> [...]
> <tokenizer class="solr.KeywordTokenizer"/>
Re: Facet counts for common terms of the searched field
Does anybody have an answer to this post? I have a similar requirement.

Suppose I have a free-text field (say, textfield) and I index it. If I search for textfield:copper, I have to get facet counts for the most common words found in that field. For example, a search for textfield:glass should return facet counts for the common words found in textfield:

    semiconductor (10), iron (20), silicon (25), material (8), thin (25)

and so on. Can this be done using tagging or MLT?

Thanks,
Sachin

Raju444us wrote:
> I have a requirement. If I search for text field let's say metal:glass what i want is to get the facet counts for all the terms related to glass in my search results.
>
> window(100) since a window can be glass.
> plastic(10) plastic is a material just like glass
> Iron(10)
> Paper(15)
>
> Can I use MLT to get this functionality.Please let me know how can I achieve this.If possible an example query.
>
> Thanks,
> Raju

--
View this message in context: http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23503794.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facet counts for common terms of the searched field
You may have to take care of this at index time. You can create a new multivalued field that has minimal processing. Then at index time, index the full contents of textfield as normal, but also split it on whitespace and index each word in the new field you just created. You will then be able to facet on this new field and sort the facet by frequency (the default) to get the most popular words.

Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com

On May 12, 2009, at 7:33 AM, sachin78 wrote:
> If I search for textfield:copper.I have to get facet counts for the most common words found in a textfield.
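To make the expected result concrete, here is a small plain-Java sketch of what faceting on such a word field amounts to: for each word, count how many of the matching documents contain it, then list the words by descending count. It does not use Solr; the class name and the sample documents are invented, and the list passed in stands for the documents that matched the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: per-word document counts, ordered like a facet sorted by frequency. */
final class WordFacetSketch {

    /** Returns up to limit (word, documentCount) pairs, highest count first, ties by word. */
    static List<Map.Entry<String, Integer>> topWords(List<String> matchingDocs, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : matchingDocs) {
            // Split on whitespace; a document counts once per distinct word it contains.
            Set<String> words = new HashSet<>(Arrays.asList(doc.trim().split("\\s+")));
            for (String word : words) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("thin silicon film", "silicon iron", "iron silicon thin");
        for (Map.Entry<String, Integer> e : topWords(docs, 3)) {
            System.out.println(e.getKey() + " (" + e.getValue() + ")");
        }
    }
}
```

The sketch does no lowercasing or stop-word removal, in line with the "minimal processing" suggestion; very common words would therefore dominate the list unless they are filtered out somewhere.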
Re: Facet counts for common terms of the searched field
Thanks Matt for your reply. What do you mean by frequency (the default)? Can you please provide an example of what the schema and query would look like?

--Sachin

Matt Weber-2 wrote:
> Now you will be able to facet on this new field and sort the facet by frequency (the default) to get the most popular words.

--
View this message in context: http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23504241.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to deal with Mark invalid?
OK. I've applied a dirty hack as a temporary solution: in src/java/org/apache/solr/analysis/HTMLStripReader.java of 1.4-dev, I enclosed in.reset() in a try block.

( * @version $Id: HTMLStripReader.java 646799 2008-04-10 13:36:23Z yonik $)

    private void restoreState() throws IOException {
      try {
        in.reset();
      } catch (Exception e) {
        // deliberately ignored: swallows "Mark invalid" so processing continues
      }
      pushed.setLength(0);
    }

But how can this problem be resolved in a more civilized way?

On Tue, May 12, 2009 at 12:20 PM, Nikolai Derzhak niko...@zapatec.net wrote:
> How i can log field stored in index (i need message uid) ? How to ignore such error and/or message ?
Re: How to deal with Mark invalid?
I just committed a minor patch suggested by Jim Murphy in SOLR-42 to slightly lower the safe read-ahead limit to avoid reading beyond a mark. Could you try out trunk (or wait until the next nightly build)?

-Yonik
http://www.lucidimagination.com

On Tue, May 12, 2009 at 10:57 AM, Nikolai Derzhak niko...@zapatec.net wrote:
> But how to resolve this problem more civilized ?
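The "Mark invalid" message comes from java.io.BufferedReader itself, and it can be reproduced without Solr. The sketch below is not HTMLStripReader code; the class name, buffer size, and input string are arbitrary choices for the demonstration. It marks the stream with a small read-ahead limit, reads past that limit far enough to force the reader to refill its buffer, and then calls reset().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Illustrative only: reading too far past a mark makes BufferedReader.reset() fail. */
final class MarkInvalidDemo {

    /**
     * Marks the start of the input, reads charsToRead characters, then calls reset().
     * Returns "reset ok" or the message of the IOException that reset() raised.
     */
    static String markReadReset(int bufferSize, int readAheadLimit, int charsToRead) {
        try (BufferedReader in = new BufferedReader(new StringReader("abcdefghijklmnop"), bufferSize)) {
            in.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                in.read();
            }
            // Fails if the mark was dropped while the buffer was being refilled.
            in.reset();
            return "reset ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(markReadReset(4, 2, 2)); // stays within the read-ahead limit
        System.out.println(markReadReset(4, 2, 5)); // reads past the limit and past the buffer
    }
}
```

Staying within the read-ahead limit before calling reset() avoids the exception, which is the direction the committed change takes by lowering the safe read-ahead limit.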
Re: Facet counts for common terms of the searched field
I mean you can sort the facet results by frequency, which happens to be the default behavior. Here is an example field for your schema:

    <field name="textfieldfacet" type="string" indexed="true" stored="true" multiValued="true" />

Here is an example query:

    http://localhost:8983/solr/select?q=textfield:copper&facet=true&facet.field=textfieldfacet&facet.limit=5

This will give you the top 5 words in the textfieldfacet.

Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com

On May 12, 2009, at 7:57 AM, sachin78 wrote:
> What do you mean by frequency(the default)? Can you please provide an example schema and query will look like.
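When a query like the example is assembled in code rather than typed by hand, building it from a parameter map keeps the separators and escaping right. The following is a small sketch using only the JDK; the class name is made up, and the host, port, and field names are simply those from the example query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: assembling the example facet query with proper URL encoding. */
final class FacetQueryUrl {

    /** Appends the parameters to baseUrl as an encoded query string, in map order. */
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "textfield:copper");
        params.put("facet", "true");
        params.put("facet.field", "textfieldfacet");
        params.put("facet.limit", "5");
        System.out.println(build("http://localhost:8983/solr/select", params));
    }
}
```

The colon in the query value comes out percent-encoded as %3A, which is equivalent to the literal colon in the hand-written URL.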
Newbie question
Hi,

We've implemented search in our product here at our very small company, and the developer who integrated Solr has left. I'm picking up the code base and have run into a problem which I imagine is simple to solve.

I have this request that I've been trying out:

    http://localhost:8983/solr/select?start=0&rows=20&qt=dismax&q=copy&hl=true&hl.snippets=4&hl.fragsize=50&facet=true&facet.mincount=1&facet.limit=8&facet.field=type&fq=company-id%3A1&wt=javabin&version=2.2

(I've been using this one to see it rendered in the browser:

    http://localhost:8983/solr/select?indent=on&version=2.2&q=copy&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features&hl=true&hl.fragsize=50

)

I get a good response; however, the hl.fragsize in the request is ignored and the hl.fragsize in solrconfig.xml is ignored. Instead I get back the whole document (10,000 chars!) in the doc txt field. And bizarrely the response header is this:

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
        <lst name="params">
          <str name="explainOther"/>
          <str name="hl.fragsize">50</str>
          <str name="indent">on</str>
          <str name="hl.fl">features</str>
          <str name="wt">standard</str>
          <arr name="hl">
            <str>on</str>
            <str>true</str>
          </arr>
          <str name="version">2.2</str>
          <str name="rows">10</str>
          <str name="fl">*,score</str>
          <str name="start">0</str>
          <str name="q">copy</str>
          <str name="qt">standard</str>
        </lst>
      </lst>
      ...

So it seems that the hl.fragsize was taken into account.

I'm sure I'm being dumb, but I don't know how to solve this. Any ideas?

many thanks

--
View this message in context: http://www.nabble.com/Newbie-question-tp23505802p23505802.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Loggin issue
Usually that means there is another log4j.properties or log4j.xml file in your classpath that is being found before the one you intend to use. Check your classpath for other versions of these files. -Jay On Tue, May 12, 2009 at 3:38 AM, Sagar Khetkade sagar.khetk...@hotmail.com wrote: Hi, I have Solr set up in a multi-core scenario and have also applied solr-560-slf4j.patch for the logging. But the problem I am facing is that the logs are going to the stdout.log file, not the log file that I have specified in the log4j.properties file. Can anybody give me a workaround to make the logs go to the logger specified in the log4j.properties file? Thanks in advance. Regards, Sagar Khetkade
Replication master+slave
For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication says that a node can be both the master and a slave: "A node can act as both master and slave. In that case both the master and slave configuration lists need to be present inside the ReplicationHandler requestHandler in the solrconfig.xml." What does this mean? Does the core then poll itself for updates? I'd like to have a single set of configuration files that are shared by masters and slaves and avoid duplicating configuration details in multiple files (one for master and one for slave) to ease management and failover. Is this possible? When I attempt to set up a multi-server master-slave configuration and include both master and slave replication configuration options, I run into some problems. I'm running a nightly build from May 7.

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master_core01:8983/solr/core01/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

When the replication admin page (http://master_core01:8983/solr/core01/admin/replication/index.jsp) is visited, the severe error shown below appears in the solr log. The server is otherwise idle, so there is no reason all threads should be busy unless the replication code is getting itself into a loop. What's the right way to do this?

May 11, 2009 8:01:22 PM org.apache.tomcat.util.threads.ThreadPool logFull
SEVERE: All threads (150) are currently busy, waiting. Increase maxThreads (150) or check the servlet status
May 11, 2009 8:01:41 PM org.apache.solr.handler.ReplicationHandler getReplicationDetails
WARNING: Exception while invoking a 'details' method on master
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:183)
    at org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:178)
    at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:555)
    at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:147)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
    at org.apache.jsp.admin.replication.index_jsp.executeCommand(index_jsp.java:34)
    at org.apache.jsp.admin.replication.index_jsp._jspService(index_jsp.java:208)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:331)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:679)
    at org.apache.catalina
Re: AW: Geographical search based on latitude and longitude
Yes, that is part of it, but there is more to it. See Yonik's comment about needs further down. On May 12, 2009, at 7:36 AM, Norman Leutner wrote: So are you using a bounding box to find results within a given range (km), as mentioned here: http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html ? Best regards Norman Leutner all2e GmbH -----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, May 12, 2009 13:18 To: solr-user@lucene.apache.org Subject: Re: Geographical search based on latitude and longitude See https://issues.apache.org/jira/browse/SOLR-773. In other words, we're working on it and would love some help! -Grant On May 12, 2009, at 7:12 AM, Norman Leutner wrote: Hi all, I'm new to Solr and want to port a geographical range search from MySQL to Solr. Currently I'm using some mathematical functions (based on the GRS80 model) directly within MySQL to calculate the actual distance from the locations within the database to a current location (lat and long are known):

$query = "SELECT street, zip, city, state, country, " . $radius . "*ACOS(cos(RADIANS(latitude))*cos(" . $theta . ")*(sin(RADIANS(longitude))*sin(" . $phi . ")+cos(RADIANS(longitude))*cos(" . $phi . "))+sin(RADIANS(latitude))*sin(" . $theta . ")) AS Distance FROM ezgis_position WHERE " . $radius . "*ACOS(cos(RADIANS(latitude))*cos(" . $theta . ")*(sin(RADIANS(longitude))*sin(" . $phi . ")+cos(RADIANS(longitude))*cos(" . $phi . "))+sin(RADIANS(latitude))*sin(" . $theta . ")) <= " . $range . " ORDER BY Distance";

This works fine and is fast. Because we want to include this in our Solr search results, I would like to have an attribute like actual_distance within the result. Is there a way to use functions like these (radians, sin, acos, ...) directly within Solr?
Thanks in advance for any feedback Norman Leutner -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
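For readers following the distance expression in Norman's query, here is a minimal Python sketch of the same spherical law of cosines calculation. It is not Solr functionality. The function name, the 6371.0 km radius, and the assumption that $theta and $phi are the reference latitude and longitude already converted to radians are illustrative choices, not details confirmed in the thread.

```python
import math

# Assumption: an approximate mean Earth radius in km; the thread only says
# "$radius" and does not give the value used.
EARTH_RADIUS_KM = 6371.0


def distance_km(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg,
                radius_km=EARTH_RADIUS_KM):
    """Great-circle distance using the same expression as the SQL query."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    theta = math.radians(ref_lat_deg)   # plays the role of $theta
    phi = math.radians(ref_lon_deg)     # plays the role of $phi
    cos_angle = (
        math.cos(lat) * math.cos(theta)
        * (math.sin(lon) * math.sin(phi) + math.cos(lon) * math.cos(phi))
        + math.sin(lat) * math.sin(theta)
    )
    # Guard against rounding pushing the value just outside [-1, 1],
    # which would make acos raise ValueError.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius_km * math.acos(cos_angle)
```

The clamp before acos is the one step the SQL version does not show; without it, two nearly identical points can produce a domain error from floating-point rounding.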
Re: Restarting tomcat deletes all Solr indexes
You can fix the path of the index in your solrconfig.xml On Tue, May 12, 2009 at 4:48 PM, KK dioxide.softw...@gmail.com wrote: One more piece of information I would like to add. The entry in the Solr stats page says this: readerDir : org.apache.lucene.store.FSDirectory@/home/kk/solr/data/index when I ran from /home/kk, and this: readerDir : org.apache.lucene.store.FSDirectory@/home/kk/junk/solr/data/index after running from /home/kk/junk. That confirms the problem for me, but what is the solution? Thanks, KK. On Tue, May 12, 2009 at 4:41 PM, KK dioxide.softw...@gmail.com wrote: Thanks for your response @aklochkov. But I again noticed that something is wrong in my solr/tomcat config [I spent a lot of time making solr run], because in the solr admin page [http://localhost:8080/solr/admin/] what I see is that $CWD is the location from which I restarted tomcat, and it seems this $CWD gets picked up and used for the index data [Is this the default behavior, or something wrong on my side? Or maybe I'm asking a stupid question]. Once I was in /etc and restarted tomcat from there, and when I tried to open the solr admin page I got an error saying that it cannot create the index directory, some permission issue I think [it gave a directory string like /etc/solr/index ...]. I'm pretty sure something is wrong in the configuration. One more thing that confirms this is the fact that I found many solr index directories here and there [these are, I think, the locations where I was each time I restarted tomcat].
Earlier I was using JAVA_OPTS to set the solr home like this: export JAVA_OPTS=$JAVA_OPTS -D/usr/local/solr # in .bashrc but I commented that out and instead added the JNDI entry in /usr/local/tomcat/webapps/solr/WEB-INF/web.xml like this:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/usr/local/solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

Even the SolrHome entry in the solr admin page says that SolrHome is /usr/local/solr, but the index gets created in $CWD. Is it the case that I created entries for SolrHome in multiple places, which is obviously wrong? Can someone point out what the issue is? Thank you very much. --KK On Tue, May 12, 2009 at 2:39 PM, Andrey Klochkov akloch...@griddynamics.com wrote: Hi, I know that on startup Solr checks whether the index directory exists, and creates a fresh index if it doesn't. Does that help? If not, the next step I'd take in your case is patching the SolrCore.initIndex method to insert some logging, or running EmbeddedSolrServer with a debugger, etc. On Mon, May 11, 2009 at 1:25 PM, KK dioxide.softw...@gmail.com wrote: Hi, I'm facing a silly problem. Every time I restart tomcat all the indexes are lost. I used all the default configurations. I'm pretty sure there must be some basic changes to fix this. I'd highly appreciate it if someone could direct me in fixing this. Thanks, KK. -- Andrew Klochkov -- Regards, Shalin Shekhar Mangar.
Re: Newbie question
On Tue, May 12, 2009 at 9:48 PM, Wayne Pope waynemailingli...@gmail.com wrote: I have this request:

http://localhost:8983/solr/select?start=0&rows=20&qt=dismax&q=copy&hl=true&hl.snippets=4&hl.fragsize=50&facet=true&facet.mincount=1&facet.limit=8&facet.field=type&fq=company-id%3A1&wt=javabin&version=2.2

(I've been using this to see it rendered in the browser:

http://localhost:8983/solr/select?indent=on&version=2.2&q=copy&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features&hl=true&hl.fragsize=50 )

that I've been trying out. I get a good response; however, the hl.fragsize in the request is ignored and the hl.fragsize in solrconfig.xml is ignored. Instead I get back the whole document (10,000 chars!) in the doc txt field. And bizarrely the response header is this:

hl.fragsize is relevant only for the snippets created by the highlighter. The returned fields will always have the complete data for a document. Does that answer your question? -- Regards, Shalin Shekhar Mangar.
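To make Shalin's point concrete, here is a small Python sketch that builds a select URL where highlighting is limited to short snippets and the stored fields returned for each document are restricted with fl. The parameter names (hl, hl.fl, hl.fragsize, hl.snippets, fl) all appear in the thread; the helper name and the "id,score" field list are assumptions for illustration, since the thread does not show the schema.

```python
from urllib.parse import urlencode


def build_select_url(base, query, highlight_field,
                     fragsize=50, snippets=4, return_fields="id,score"):
    """Build a /select URL; snippets come back in the highlighting section.

    Assumption: "id" is a stored field in the schema. Leaving the large
    text field out of fl is what keeps the 10,000-char body out of each doc.
    """
    params = [
        ("q", query),
        ("start", 0),
        ("rows", 10),
        ("fl", return_fields),        # controls the stored fields returned
        ("hl", "true"),
        ("hl.fl", highlight_field),   # field the highlighter draws snippets from
        ("hl.fragsize", fragsize),    # affects snippet length only
        ("hl.snippets", snippets),
    ]
    return base + "?" + urlencode(params)
```

For example, build_select_url("http://localhost:8983/solr/select", "copy", "features") yields a URL whose documents carry only the listed fields, while the fragments of roughly hl.fragsize characters are read from the separate highlighting part of the response.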
Re: Replication master+slave
On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot btal...@aeriagames.com wrote: For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication says that a node can be both the master and a slave: "A node can act as both master and slave. In that case both the master and slave configuration lists need to be present inside the ReplicationHandler requestHandler in the solrconfig.xml." What does this mean? Does the core then poll itself for updates? No. This type of configuration is meant for repeaters. Suppose there are slaves in multiple data centers (say data centers A and B). There is always a single master (say in A). One of the slaves in B is used as a master for the other slaves in B. Therefore, this one slave in B is both a master and a slave. I'd like to have a single set of configuration files that are shared by masters and slaves and avoid duplicating configuration details in multiple files (one for master and one for slave) to ease management and failover. Is this possible? You wouldn't want the master to be a slave. So I guess you'd need to have a separate file. Also, it needs to be a separate file so that the slave does not become a master when the solrconfig.xml is replicated. When I attempt to set up a multi-server master-slave configuration and include both master and slave replication configuration options, I run into some problems. I'm running a nightly build from May 7. Not sure what happened. Is that the URL for this Solr (meaning the same Solr URL is master and slave of itself)? If yes, that is not a valid configuration. -- Regards, Shalin Shekhar Mangar.
error when seting queryResultWindowSize to zero
I have seen that if I set the value of queryResultWindowSize to 0 in solrconfig.xml, Solr will return a divide-by-zero error. Checking the source, I have seen it can be fixed in SolrIndexSearcher. At the end of the function getDocListC it's coded:

if (maxDocRequested < queryResultWindowSize) {
  supersetMaxDoc=queryResultWindowSize;
} else {
  supersetMaxDoc = ((maxDocRequested -1)/queryResultWindowSize + 1)*queryResultWindowSize;
  if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
}

I have sorted it out by doing this (just adding parentheses):

if (maxDocRequested < queryResultWindowSize) {
  supersetMaxDoc=queryResultWindowSize;
} else {
  supersetMaxDoc = ((maxDocRequested -1)/(queryResultWindowSize + 1))*queryResultWindowSize;
  if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
}

I have seen this happening in a recent trunk. Is my fix correct? -- View this message in context: http://www.nabble.com/error-when-seting-queryResultWindowSize-to-zero-tp23508478p23508478.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Selective Searches Based on User Identity
Paul -- thanks for the reply, I appreciate it. That's a very practical approach, and is worth taking a closer look at. Actually, taking your idea one step further, perhaps three fields; 1) ownerUid (uid of the document's owner) 2) grantedUid (uid of users who have been granted access), and 3) deniedUid (uid of users specifically denied access to the document). These fields, coupled with some business rules around how they were populated should cover off all possibilities I think. Access to the Solr instance would have to be tightly controlled, but that's something that should be done anyway. You sure wouldn't want end users preparing their own XML and throwing it at Solr -- it would be pretty easy to figure out how to get around the access/denied fields and get at stuff the owner didn't intend. This approach mimics to some degree what is being done in the operating system, but it's still elegant and provides the level of control required. Anybody else have any thoughts in this regard? Has anybody implemented anything similar, and if so, how did it work? Thanks, and best regards... Terence
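The three-field idea above can be sketched as a filter query that the application layer attaches to every search. This is a minimal Python illustration only: the field names ownerUid, grantedUid and deniedUid come from the post, while the helper name, the rule that a deny entry overrides a grant, and the restriction of uids to letters, digits and underscores are assumptions made here.

```python
import re


def build_acl_filter(uid):
    """Return a Lucene-syntax filter: owner or granted, and not denied.

    Assumption: uids are plain tokens. Anything else is rejected so that a
    caller cannot smuggle query syntax into the filter.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", uid):
        raise ValueError("unexpected characters in uid: %r" % uid)
    return "+(ownerUid:%s OR grantedUid:%s) -deniedUid:%s" % (uid, uid, uid)
```

The string would be sent as an fq parameter by trusted server-side code, which matches the point above that end users should never be able to send their own requests straight to Solr.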
Re: error when seting queryResultWindowSize to zero
On Tue, May 12, 2009 at 3:03 PM, Marc Sturlese marc.sturl...@gmail.com wrote: I have seen that if I set the value of queryResultWindowSize to 0 in solrconfig.xml, Solr will return a divide-by-zero error. Seems like a configuration error, since requesting that results be retrieved in 0-size chunks doesn't make a lot of sense. Checking the source, I have seen it can be fixed in SolrIndexSearcher. At the end of the function getDocListC it's coded:

if (maxDocRequested < queryResultWindowSize) {
  supersetMaxDoc=queryResultWindowSize;
} else {
  supersetMaxDoc = ((maxDocRequested -1)/queryResultWindowSize + 1)*queryResultWindowSize;
  if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
}

I have sorted it out by doing this (just adding parentheses):

if (maxDocRequested < queryResultWindowSize) {
  supersetMaxDoc=queryResultWindowSize;
} else {
  supersetMaxDoc = ((maxDocRequested -1)/(queryResultWindowSize + 1))*queryResultWindowSize;
  if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
}

I have seen this happening in a recent trunk. Is my fix correct? The +1 really needs to be after the divide (we're rounding up). If a fix is needed, I imagine it would be at the time that config parameter is read... if it's less than or equal to 0, then set it to 1. -Yonik http://www.lucidimagination.com
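Yonik's two points (the +1 belongs after the divide because the expression rounds up to a whole number of windows, and a non-positive window size should be treated as 1) can be checked with a short Python sketch. This is an illustration of the arithmetic, not the SolrIndexSearcher code; the function name is hypothetical and the Java integer-overflow guard is left out because Python integers do not overflow.

```python
def superset_max_doc(max_doc_requested, window_size):
    """Round the requested doc count up to a multiple of the window size."""
    # Clamp as suggested in the reply: a window of 0 or less becomes 1.
    window_size = max(1, window_size)
    if max_doc_requested < window_size:
        return window_size
    # Integer division, then +1, then scale: rounds up to the next window.
    return ((max_doc_requested - 1) // window_size + 1) * window_size
```

With a window of 50, a request for 51 documents gives 100, as intended. Moving the parentheses as in the proposed patch gives ((51 - 1) // (50 + 1)) * 50, which is 0, so the divide-by-zero goes away but the rounding is lost.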
Re: Selective Searches Based on User Identity
I also work with the FAST Enterprise Search engine and this is exactly how their Security Access Module works. They actually use a modified base-32 encoded value for indexing, but that is because they don't have the luxury of untokenized/un-processed String fields like Solr. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com On May 12, 2009, at 12:26 PM, Terence Gannon wrote: Paul -- thanks for the reply, I appreciate it. That's a very practical approach, and is worth taking a closer look at. Actually, taking your idea one step further, perhaps three fields; 1) ownerUid (uid of the document's owner) 2) grantedUid (uid of users who have been granted access), and 3) deniedUid (uid of users specifically denied access to the document). These fields, coupled with some business rules around how they were populated should cover off all possibilities I think. Access to the Solr instance would have to be tightly controlled, but that's something that should be done anyway. You sure wouldn't want end users preparing their own XML and throwing it at Solr -- it would be pretty easy to figure out how to get around the access/denied fields and get at stuff the owner didn't intend. This approach mimics to some degree what is being done in the operating system, but it's still elegant and provides the level of control required. Anybody else have any thoughts in this regard? Has anybody implemented anything similar, and if so, how did it work? Thanks, and best regards... Terence
Re: Selective Searches Based on User Identity
The only downside would be that you would have to update a document anytime a user was granted or denied access. You would have to query before the update to get the current values for grantedUID and deniedUID, remove/add values, and update the index. If you don't have a lot of changes in the system that wouldn't be a big deal, but if a lot of changes are happening throughout the day you might have to queue requests and batch them. -Jay On Tue, May 12, 2009 at 1:05 PM, Matt Weber m...@mattweber.org wrote: I also work with the FAST Enterprise Search engine and this is exactly how their Security Access Module works. They actually use a modified base-32 encoded value for indexing, but that is because they don't have the luxury of untokenized/un-processed String fields like Solr. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com On May 12, 2009, at 12:26 PM, Terence Gannon wrote: Paul -- thanks for the reply, I appreciate it. That's a very practical approach, and is worth taking a closer look at. Actually, taking your idea one step further, perhaps three fields; 1) ownerUid (uid of the document's owner) 2) grantedUid (uid of users who have been granted access), and 3) deniedUid (uid of users specifically denied access to the document). These fields, coupled with some business rules around how they were populated should cover off all possibilities I think. Access to the Solr instance would have to be tightly controlled, but that's something that should be done anyway. You sure wouldn't want end users preparing their own XML and throwing it at Solr -- it would be pretty easy to figure out how to get around the access/denied fields and get at stuff the owner didn't intend. This approach mimics to some degree what is being done in the operating system, but it's still elegant and provides the level of control required. Anybody else have any thoughts in this regard? Has anybody implemented anything similar, and if so, how did it work? 
Thanks, and best regards... Terence
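Jay's read-modify-write step can be sketched as a pure function that takes the document as currently indexed and returns the version to re-index. This is a hedged Python illustration: the field names follow the thread, but the helper name, the dict representation of a document, and the rule that granting removes a uid from the denied list (and the reverse) are assumptions.

```python
def apply_acl_change(doc, uid, granted):
    """Return a copy of doc with uid moved to the granted or denied list."""
    updated = dict(doc)  # leave the caller's copy untouched
    grants = set(doc.get("grantedUid", []))
    denies = set(doc.get("deniedUid", []))
    if granted:
        grants.add(uid)
        denies.discard(uid)
    else:
        denies.add(uid)
        grants.discard(uid)
    # Sorted lists keep the output stable from one run to the next.
    updated["grantedUid"] = sorted(grants)
    updated["deniedUid"] = sorted(denies)
    return updated
```

Because the whole document has to be sent to the index again after each change, collecting several such changes and applying them in one batch is the natural place to add the queueing Jay mentions.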
RE: Selective Searches Based on User Identity
Thanks for the tip. I went to their website (www.fastsearch.com), and got as far as the second line, top left 'A Microsoft Subsidiary'...at which point, hopes of it being another open source solution quickly faded. ;-) Seriously, though, it looks like an interesting product, but open source is a mandatory requirement for my particular application. But the fact they implemented this functionality would seem to support that it's a valid requirement, and I'll keep plugging away on it. Thank you very much for bringing FAST to my attention...I appreciate it! Best regards... Terence -Original Message- From: Matt Weber [mailto:m...@mattweber.org] Sent: May 12, 2009 14:06 To: solr-user@lucene.apache.org Subject: Re: Selective Searches Based on User Identity I also work with the FAST Enterprise Search engine and this is exactly how their Security Access Module works. They actually use a modified base-32 encoded value for indexing, but that is because they don't have the luxury of untokenized/un-processed String fields like Solr. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com
Re: Selective Searches Based on User Identity
Here is a good presentation on search security from the Infonortics Search Conference that was held a few weeks ago. http://www.infonortics.com/searchengines/sh09/slides/kehoe.pdf The approach you are using is called early-binding. As Jay mentioned, one of the downsides is updating the documents each time you have an ACL change. You could use the late-binding approach that checks each result after the query but before you display to the user. I don't recommend this approach because it will strain your security infrastructure because you will need to check if the user can access each result. Good luck. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com On May 12, 2009, at 1:21 PM, Jay Hill wrote: The only downside would be that you would have to update a document anytime a user was granted or denied access. You would have to query before the update to get the current values for grantedUID and deniedUID, remove/add values, and update the index. If you don't have a lot of changes in the system that wouldn't be a big deal, but if a lot of changes are happening throughout the day you might have to queue requests and batch them. -Jay On Tue, May 12, 2009 at 1:05 PM, Matt Weber m...@mattweber.org wrote: I also work with the FAST Enterprise Search engine and this is exactly how their Security Access Module works. They actually use a modified base-32 encoded value for indexing, but that is because they don't have the luxury of untokenized/un-processed String fields like Solr. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com On May 12, 2009, at 12:26 PM, Terence Gannon wrote: Paul -- thanks for the reply, I appreciate it. That's a very practical approach, and is worth taking a closer look at. Actually, taking your idea one step further, perhaps three fields; 1) ownerUid (uid of the document's owner) 2) grantedUid (uid of users who have been granted access), and 3) deniedUid (uid of users specifically denied access to the document). 
These fields, coupled with some business rules around how they were populated should cover off all possibilities I think. Access to the Solr instance would have to be tightly controlled, but that's something that should be done anyway. You sure wouldn't want end users preparing their own XML and throwing it at Solr -- it would be pretty easy to figure out how to get around the access/denied fields and get at stuff the owner didn't intend. This approach mimics to some degree what is being done in the operating system, but it's still elegant and provides the level of control required. Anybody else have any thoughts in this regard? Has anybody implemented anything similar, and if so, how did it work? Thanks, and best regards... Terence
Who is running 1.4 nightly in production?
We're planning our move to 1.4, and want to run one of our production servers with the new code. Just to feel better about it, is anyone else running 1.4 in production? I'm building 2009-05-11 right now. wuner
Re: Who is running 1.4 nightly in production?
We're using 1.4-dev 749558:749756M that we built on 2009-03-03 13:10:05 for our master/slave production environment using the Java Replication code. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On May 12, 2009, at 2:02 PM, Walter Underwood wrote: We're planning our move to 1.4, and want to run one of our production servers with the new code. Just to feel better about it, is anyone else running 1.4 in production? I'm building 2009-05-11 right now. wuner
camel-casing and dismax troubles
hi all :) I'm having trouble with camel-cased query strings and the dismax handler. a user query LeAnn Rimes isn't matching the indexed term Leann Rimes even though both are lower-cased in the end. furthermore, the analysis tool shows a match. the debug query looks like

"parsedquery":"+((DisjunctionMaxQuery((search-en:\"(leann le) ann\")) DisjunctionMaxQuery((search-en:rimes)))~2) ()",

I have a feeling it's due to how the broken-up tokens are added back into the token stream with preserveOriginal, and some strange interaction between that order and dismax, but I'm not entirely sure. configs follow. thoughts appreciated. --Geoff

<fieldType name="search-en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
  </analyzer>
</fieldType>
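One possible reading of the parsed query above is that the query-side analysis turns LeAnn into a phrase: leann and le share the first position and ann sits at the next one, while the indexed Leann produces only leann. The Python sketch below is a deliberately simplified model of that behavior, written to illustrate the hypothesis. It is not the WordDelimiterFilter implementation, and the function names and the position rules are assumptions.

```python
import re
from collections import defaultdict


def analyze(token, preserve_original=True):
    """Simplified model: split on lower-to-upper case changes, then lower-case.

    Returns (position, term) pairs. When the token splits, the original is
    kept at position 0 alongside the first part.
    """
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", token)
    out = []
    if preserve_original and len(parts) > 1:
        out.append((0, token.lower()))
    for pos, part in enumerate(parts):
        out.append((pos, part.lower()))
    return out


def phrase_matches(query_tokens, index_tokens):
    """True if every query position has at least one term indexed there."""
    indexed = defaultdict(set)
    for pos, term in index_tokens:
        indexed[pos].add(term)
    wanted = defaultdict(set)
    for pos, term in query_tokens:
        wanted[pos].add(term)
    return all(wanted[pos] & indexed[pos] for pos in wanted)
```

Under this model, analyze("LeAnn") yields leann and le at position 0 plus ann at position 1, and the phrase check against analyze("Leann") fails because nothing is indexed at position 1. Whether that is what actually happens inside dismax here would need to be confirmed against the real analyzer output.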
Re: how to manually add data to indexes generated by nutch-1.0 using solr
Tried to add a new record using

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<add> <doc boost="2.5"> <field name="segment">20090512170318</field> <field name="digest">86937aaee8e748ac3007ed8b66477624</field> <field name="boost">0.21189615</field> <field name="url">test.com</field> <field name="title">test test</field> <field name="tstamp">20090513003210909</field> </doc> </add>'

I get

<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">71</int></lst> </response>

and the added records are not found in the search. Any ideas what went wrong? Thanks. Alex. -----Original Message----- From: alx...@aim.com To: solr-user@lucene.apache.org Sent: Mon, 11 May 2009 12:14 pm Subject: how to manually add data to indexes generated by nutch-1.0 using solr Hello, I had Nutch 1.0 crawl, fetch, and index a lot of files. Then I needed to index a few more files. I know the keywords for those files and their locations, and I need to add them manually. I took a look at two tutorials on the wiki, but did not find any info about this issue. Is there a tutorial with a step-by-step procedure for manually adding data to a Nutch index using Solr? Thanks in advance. Alex.
Re: Who is running 1.4 nightly in production?
We run a not too distant trunk (1.4, probably a month or so ago) version of Solr on LucidFind at http://www.lucidimagination.com/search Erik On May 12, 2009, at 5:02 PM, Walter Underwood wrote: We're planning our move to 1.4, and want to run one of our production servers with the new code. Just to feel better about it, is anyone else running 1.4 in production? I'm building 2009-05-11 right now. wuner
Re: how to manually add data to indexes generated by nutch-1.0 using solr
Send a <commit/> request afterwards, or you can add ?commit=true to the /update request with the adds. Erik On May 12, 2009, at 8:57 PM, alx...@aim.com wrote: Tried to add a new record using

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<add> <doc boost="2.5"> <field name="segment">20090512170318</field> <field name="digest">86937aaee8e748ac3007ed8b66477624</field> <field name="boost">0.21189615</field> <field name="url">test.com</field> <field name="title">test test</field> <field name="tstamp">20090513003210909</field> </doc> </add>'

I get

<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">71</int></lst> </response>

and the added records are not found in the search. Any ideas what went wrong? Thanks. Alex. -----Original Message----- From: alx...@aim.com To: solr-user@lucene.apache.org Sent: Mon, 11 May 2009 12:14 pm Subject: how to manually add data to indexes generated by nutch-1.0 using solr Hello, I had Nutch 1.0 crawl, fetch, and index a lot of files. Then I needed to index a few more files. I know the keywords for those files and their locations, and I need to add them manually. I took a look at two tutorials on the wiki, but did not find any info about this issue. Is there a tutorial with a step-by-step procedure for manually adding data to a Nutch index using Solr? Thanks in advance. Alex.
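Erik's second option, adding commit=true to the same /update request that carries the adds, can be sketched in Python as a helper that prepares the URL and XML body. The URL and field names are taken from the thread. The helper name is hypothetical, and nothing is sent over the network here: the function only builds the two strings.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr

# URL as used in the thread; adjust for a different host or core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_request(fields, commit=True):
    """Return (url, xml_body) for an <add> of one document.

    fields is a list of (name, value) string pairs. With commit=True the
    URL carries commit=true, so no separate <commit/> request is needed.
    """
    url = SOLR_UPDATE_URL
    if commit:
        url += "?" + urlencode({"commit": "true"})
    field_xml = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(value))
        for name, value in fields
    )
    return url, "<add><doc>" + field_xml + "</doc></add>"
```

The body would be posted with a Content-Type of text/xml, as in the curl command above. Until a commit happens by one route or the other, the new document is accepted (status 0) but not yet visible to searches, which matches what Alex observed.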
RE: Selective Searches Based on User Identity
In reply to both Matt and Jay's comments, the particular situation I'm dealing with is one where rights will change relatively little once they are established. Typically a document will be loaded and indexed, and a decision will be made on sharing that more-or-less immediately. It might change a couple of times after that, but that will be it. So early-binding seems like the better option. Thanks to both of you for your suggestions and help. Terence PS. I wish I had known about that conference...looks like it would have been very helpful to me right now! -Original Message- From: Matt Weber [mailto:m...@mattweber.org] Sent: May 12, 2009 14:41 To: solr-user@lucene.apache.org Subject: Re: Selective Searches Based on User Identity Here is a good presentation on search security from the Infonortics Search Conference that was held a few weeks ago. http://www.infonortics.com/searchengines/sh09/slides/kehoe.pdf The approach you are using is called early-binding. As Jay mentioned, one of the downsides is updating the documents each time you have an ACL change. You could use the late-binding approach that checks each result after the query but before you display to the user. I don't recommend this approach because it will strain your security infrastructure because you will need to check if the user can access each result. Good luck. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com
Re: Replication master+slave
I was looking at the same problem, and had a discussion with Noble. You can use a hack to achieve what you want, see https://issues.apache.org/jira/browse/SOLR-1154 Thanks, Jianhan On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot btal...@aeriagames.com wrote: So how are people managing solrconfig.xml files which are largely the same other than differences for replication? I don't think it's a good thing to maintain two copies of the same file and I'd like to avoid that. Maybe enabling the XInclude feature in DocumentBuilders would make it possible to modularize configuration files to make this possible? http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean) -Bryan On May 12, 2009, at 11:43 AM, Shalin Shekhar Mangar wrote: On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot btal...@aeriagames.com wrote: For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication says that a node can be both the master and a slave: "A node can act as both master and slave. In that case both the master and slave configuration lists need to be present inside the ReplicationHandler requestHandler in the solrconfig.xml." What does this mean? Does the core then poll itself for updates? No. This type of configuration is meant for repeaters. Suppose there are slaves in multiple data centers (say data centers A and B). There is always a single master (say in A). One of the slaves in B is used as a master for the other slaves in B. Therefore, this one slave in B is both a master and a slave. I'd like to have a single set of configuration files that are shared by masters and slaves and avoid duplicating configuration details in multiple files (one for master and one for slave) to ease management and failover. Is this possible? You wouldn't want the master to be a slave.
So I guess you'd need to have a separate file. Also, it needs to be a separate file so that the slave does not become a master when the solrconfig.xml is replicated. When I attempt to set up a multi-server master-slave configuration and include both master and slave replication configuration options, I run into some problems. I'm running a nightly build from May 7. Not sure what happened. Is that the URL for this Solr instance (meaning the same Solr URL is both master and slave of itself)? If yes, that is not a valid configuration. -- Regards, Shalin Shekhar Mangar.
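To make Bryan's XInclude idea concrete, here is a rough sketch of how a shared solrconfig.xml could pull in a small per-role fragment. It uses Python's standard xml.etree.ElementInclude module rather than the Java DocumentBuilderFactory he links to, purely to show the mechanism; nothing in this thread establishes that Solr's own config loader resolves XInclude. The file name replication-role.xml and the host name are hypothetical, and the files are held in memory so the sketch is self-contained. The `slave` list with `masterUrl` and `pollInterval` follows the format described on the SolrReplication wiki page.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical file contents. The shared file is identical on every node;
# only the small role fragment would differ between master and slave.
FILES = {
    "solrconfig.xml": """<config xmlns:xi="http://www.w3.org/2001/XInclude">
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <xi:include href="replication-role.xml"/>
  </requestHandler>
</config>""",
    "replication-role.xml": """<lst name="slave">
  <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
  <str name="pollInterval">00:00:60</str>
</lst>""",
}


def load(href, parse, encoding=None):
    """Loader used by ElementInclude: serve documents from FILES."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


def resolve(name):
    """Parse a document and expand its xi:include elements in place."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=load)
    return root


config = resolve("solrconfig.xml")
handler = config.find("requestHandler")
roles = [lst.get("name") for lst in handler.findall("lst")]
print(roles)
```

A master would ship a different replication-role.xml containing a `master` list instead, and a repeater of the kind Shalin describes would need both lists. As Shalin points out, the role-specific part still has to live in a file that is not itself replicated, otherwise a slave would pick up the master's role.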
RE: Solr Logging issue
I have only one log4j.properties file in the classpath, and even if I configure a logger for the particular package the Solr exceptions come from, the same issue occurs. I have removed the logger for my application and am using it only for Solr logging. ~Sagar Date: Tue, 12 May 2009 09:59:01 -0700 Subject: Re: Solr Logging issue From: jayallenh...@gmail.com To: solr-user@lucene.apache.org Usually that means there is another log4j.properties or log4j.xml file in your classpath that is being found before the one you are intending to use. Check your classpath for other versions of these files. -Jay On Tue, May 12, 2009 at 3:38 AM, Sagar Khetkade sagar.khetk...@hotmail.com wrote: Hi, I have Solr running in a multi-core setup and have also applied solr-560-slf4j.patch to implement the logging. But the problem I am facing is that the logs are going to the stdout.log file, not the log file that I specified in the log4j.properties file. Can anybody give me a workaround to make the logs go to the logger configured in the log4j.properties file? Thanks in advance. Regards, Sagar Khetkade
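Jay's suggestion to check the classpath for stray configuration files can be done by hand, but a small script makes it less error-prone. The sketch below walks a list of classpath entries in order and reports every directory or jar that has a log4j.properties or log4j.xml at its root. The function name and the example paths in the usage note are hypothetical; it only inspects the entries you pass in and knows nothing about how Tomcat assembles its classloaders.

```python
import os
import zipfile

CONFIG_NAMES = ("log4j.properties", "log4j.xml")


def find_log4j_configs(classpath_entries):
    """Return (entry, config_name) pairs in classpath order.

    Directories are checked for the files directly inside them; jar files
    are checked for root-level entries only. Missing entries are skipped.
    """
    hits = []
    for entry in classpath_entries:
        if os.path.isdir(entry):
            for name in CONFIG_NAMES:
                if os.path.isfile(os.path.join(entry, name)):
                    hits.append((entry, name))
        elif zipfile.is_zipfile(entry):
            with zipfile.ZipFile(entry) as jar:
                names = set(jar.namelist())
            for name in CONFIG_NAMES:
                if name in names:
                    hits.append((entry, name))
    return hits
```

It could be called with something like find_log4j_configs(["webapps/solr/WEB-INF/classes", "webapps/solr/WEB-INF/lib/some-library.jar"]). More than one hit, or a hit in an unexpected jar, would point to the situation Jay describes.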