Re: Boosting non synonyms result
I do the same but do not use the Dismax query, which is far too inflexible. In CurrikiSolr, I have my own QueryComponent which does all sorts of query expansion:
- it expands a simple term query to a query for the text in the stemmed variant and in the unstemmed variant with a higher boost
- it pre-parses to make sure that phrase queries remain phrase queries and thus become unstemmed queries
- it converts prefix queries to queries against the unstemmed field only
- it uses parameters (used in the advanced search) to add queries (e.g. only resources with a given topic)
- it applies some rights protections
- it would be the place to expand across multiple languages if indexing each language in a separate field, as I would do it
- it applies some application-specific quality boosting (higher-ranked resources rank higher)

I find that such a component is something of a best practice, because it lets the server apply business logic (independently of hackers in the client) and gives me Java to perform deep query processing instead of fragile JavaScript string processing. I guess I could find a way to extend DisMax intelligently, but I have not found it.

paul

On 18 May 2011, at 00:52, Jonathan Rochkind wrote:

I do it with two fields exactly as you say, but then use dismax to boost the non-synonymed field higher than the synonymed field. That is a lot easier than trying to use a function query, which I'm not sure how to do either.

On 5/17/2011 6:45 PM, Dmitriy Shvadskiy wrote:

Hello,
Is there a way to boost a result that is an exact match, as opposed to a synonym match, when using query-time synonyms? Given the query "John Smith" and the synonyms Jonathan,Jonathan,John,Jon,Nat,Nathan, I'd like results containing "John Smith" to be ranked higher than "Jonathan Smith". My thinking was to do it by defining two fields, one with query-time synonyms and one without, and sort by a function query on the non-synonym field. Is that even possible? I can't quite figure out the syntax for this. I'm using Solr 3.1.
Thanks, Dmitriy
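For reference, Jonathan's two-field dismax approach can be sketched with request parameters like these (the field names text_nosyn and text_syn are hypothetical; text_syn would include the query-time SynonymFilterFactory in its analyzer and text_nosyn would not):

```
q=John Smith
defType=dismax
qf=text_nosyn^4 text_syn
```

With both fields populated from the same source, an exact match scores in both fields while a synonym-only match scores just in text_syn, so exact matches rank higher.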
solr/home property setting
Hi all, I am wondering how I could set a path for the solr/home property. Our Solr home is inside solr.war, so I don't want an absolute path (we will deploy to different boxes). Currently I hard-code a relative path as the solr/home property in web.xml:

<!-- People who want to hardcode their Solr Home directly into the WAR file can set the JNDI property here... -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>../webapps/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

But this way I have to start Tomcat from under bin/; it seems the root path here is the start path. How can I set the solr/home property so that it does not depend on the Tomcat start directory?

Thanks
Kun
Updating a multi-valued field
I've been using ExternalFileField for external scoring so far, so that the external field gets updated rather than deleted and re-added. Now I have a field which is multivalued. I cannot use ExternalFileField because I need this field in the suggest component too. Is there anything other than ExternalFileField that will help me do this? Thanks!
Re: solr/home property setting
Why have you put your Solr home inside Tomcat's webapps directory? That is not the correct way. Put your Solr home somewhere outside the servlet container and set your solr/home path accordingly:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/opt/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

- Thanx: Grijesh www.gettinhahead.co.in
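If you control how Tomcat is started, an alternative to the JNDI entry is the solr.solr.home system property, set in the JVM options, for example from a setenv.sh-style startup script (the path below is an example):

```shell
# Append the Solr home system property to Tomcat's JVM options,
# e.g. from $CATALINA_HOME/bin/setenv.sh (path is an example).
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr/home"
```

Either mechanism avoids any dependency on the directory Tomcat happens to be started from.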
questions about request logging
Dear list, the poll about Solr logging directed my interest to my log files. Right out of the box the Jetty request logs have all the information needed for GET requests, but only the path of POST requests. Is it possible to have POST requests logged the same way GET requests are logged? If not, maybe with a different logger? The console is redirected to a file and both GET and POST requests are logged there, but they are mixed with all kinds of other log messages, so the request logs are not usable with webalizer or other log analyzers. Is it somehow possible to get a useful log file from the console output? Regards Bernd
Re: filter cache and negative filter query
Mmm... I had wondered whether Solr reused filters this way (not keeping both the positive and negative versions) and I'm glad to see it does indeed reuse them. What I don't like is that it systematically uses the positive version. Sometimes the negative version will give many fewer results (for example, in some cases I filter on documents not having a given field, and there are very few of them). I think it would be much better if Solr performed exactly the query requested and, only if more than 50% of documents match the query, stored the negated one instead. I think (without knowing much about how things are implemented) this shouldn't be a problem. Is there any place where one can post a suggestion for improvement? :)

Anyway, it would be very useful to know exactly how current versions work (I think the info in the message I'm answering is about version 1.1 and could have changed), because knowing it, one can sometimes manage to write, for example, a positive query that in fact returns the negative results. As a simple example, I believe that for a boolean field -field:true is exactly the same as +field:false, but the former is a negative query and the latter is a positive one. So knowing Solr's exact behaviour can help you write optimized filters when you know that one version will give many fewer hits than the other.

On 18/05/2011, at 00:26, Yonik Seeley wrote:

On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma markus.jel...@openindex.io wrote:
I'm not sure. The filter cache uses your filter as a key, and a negation is a different key. You can check this easily in a controlled environment by issuing these queries and watching the filter cache statistics.

Gotta hate crossing emails ;-) Anyway, this goes back to Solr 1.1:

5. SOLR-80: Negative queries are now allowed everywhere. Negative queries are generated and cached as their positive counterpart, speeding generation and generally resulting in smaller sets to cache. Set intersections in SolrIndexSearcher are more efficient, starting with the smallest positive set, subtracting all negative sets, then intersecting with all other positive sets. (yonik)

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco

If I have a query with a filter query such as q=art&fq=history, and then run a second query q=art&fq=-history, will Solr realize that it can use the cached results of the previous filter query "history" (in the filter cache), or will it not realize this and have to actually run a second filter query against the index for "not history"?

Tom
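As a toy illustration of the SOLR-80 behavior quoted above (a sketch, not Solr's actual code): the cache stores only the positive form of a filter, and a negative filter is answered by complementing the cached set against all documents, so both forms share one cache entry:

```python
# Toy model of a filter cache that stores only the positive form
# of a filter and derives the negative form by set complement.
all_docs = set(range(10))   # pretend index with 10 docs
cache = {}                  # positive filter string -> matching doc set

def run_filter(name):
    """Pretend to execute a filter query; 'history' matches 3 docs."""
    return {1, 4, 7} if name == "history" else set()

def get_filter(fq):
    negative = fq.startswith("-")
    key = fq[1:] if negative else fq     # always cache the positive form
    if key not in cache:
        cache[key] = run_filter(key)
    docs = cache[key]
    return (all_docs - docs) if negative else docs

positive = get_filter("history")    # fills the cache
negative = get_filter("-history")   # reuses the same cache entry
print(sorted(positive))   # [1, 4, 7]
print(sorted(negative))   # [0, 2, 3, 5, 6, 8, 9]
print(len(cache))         # 1 -> one cached entry serves both forms
```

This is why, in the question above, fq=-history can be answered from the cached "history" entry.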
Re: How to test Solr Integartion - how to get EmbeddedSolrServer?
Thinking more about it, I can solve my immediate problem by just copy-pasting the classes I need into my own project packages (KISS, like here: https://github.com/Filirom1/solr-test-exemple ). I'd however suggest refactoring the Solr code structure to be much more defaults-compliant, making it easier for external developers to understand, and hopefully easier to maintain for committers (with fewer special-needs configurations). I've done some of those refactorings on my local copy of Solr and would be glad to contribute. For this particular problem the KISS solution would be to create yet one more module for tests, which depends on Solr Core and on the Test Framework. The organizational burden of that extra module is, I believe, outweighed by the ease of build configuration it brings.

On Tue, May 17, 2011 at 7:11 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
http://stackoverflow.com/questions/6034513/can-i-avoid-a-dependency-cycle-with-one-edge-being-a-test-dependency

On Tue, May 17, 2011 at 6:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

On Tue, May 17, 2011 at 3:52 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

On Tue, May 17, 2011 at 3:44 PM, Steven A Rowe sar...@syr.edu wrote:
Hi Gabriele,

On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote:
Solr Core should declare a test dependency on Solr Test Framework.

I agree:
- Solr Core should have a test-scope dependency on Solr Test Framework.
- Solr Test Framework should have a compile-scope dependency on Solr Core.
But Maven views this as a circular dependency.

I've seen that, but adding it with <scope>test</scope> works. The logic: the src is compiled first and then re-used (I'm assuming Maven does something smart about not including the full jar).

Not quite. I've tried a demo and the reactor complains. I'll try to see if Maven could become 'smarter', or if the 2-build-phase solution will work.

The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='com.mysimpatico:TestFramework:1.0-SNAPSHOT'}' and 'Vertex{label='org.apache:DummyCore:1.0-SNAPSHOT'}' introduces to cycle in the graph org.apache:DummyCore:1.0-SNAPSHOT -- com.mysimpatico:TestFramework:1.0-SNAPSHOT -- org.apache:DummyCore:1.0-SNAPSHOT - [Help 1]

The workaround: Solr Core includes the source of Solr Test Framework as part of its test source code. It's not pretty, but it works. I'd be happy to entertain other (functional) approaches.

In dp4j.com's pom.xml I build in 2 phases to compile with the same annotations in the project itself (but I don't think we need that here).

Steve

--
Regards,
K. Gabriele
--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
How to list/see all the indexed terms of a particular field in a document?
Hi, I'm using Apache Solr v3.1. How do I list/see all the indexed terms of a particular field in a document (by passing the unique key ID of the document)? For example, I have the following field definitions in schema.xml:

<field name="mydocumentid" type="string" indexed="true" stored="true" required="true" />
<field name="mytextcontent" type="text" indexed="true" stored="true" required="true" />

In this case, I want to list/see all the indexed terms of a particular document (mydocumentid:x) for the document field mytextcontent.

Regards, Gnanam
Re: How to list/see all the indexed terms of a particular field in a document?
ant luke?

On Wed, May 18, 2011 at 11:47 AM, Gnanakumar gna...@zoniac.com wrote:
[quoted text snipped]
Using solandra
I've recently switched from solr+cassandra to Solandra. When I try to run Solandra using java -jar start.jar in solandra-app, it gives me the following error:

java.lang.ExceptionInInitializerError
 at lucandra.CassandraUtils.startupServer(CassandraUtils.java:249)
 at solandra.SolandraInitializer.initialize(SolandraInitializer.java:45)
 at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
 at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
 at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
 at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
 at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
 at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
 at org.mortbay.jetty.Server.doStart(Server.java:224)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.mortbay.start.Main.invokeMain(Main.java:183)
 at org.mortbay.start.Main.start(Main.java:497)
 at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
 at org.apache.cassandra.service.AbstractCassandraDaemon.<clinit>(AbstractCassandraDaemon.java:75)

How exactly do I configure the log4j configuration?

Karan
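For what it's worth, Cassandra daemons of that era locate their log4j configuration via the log4j.configuration system property. A minimal sketch (the file name, location, and appender choice are assumptions; check whether your Solandra distribution already ships a log4j properties file):

```
# log4j.properties (minimal example)
log4j.rootLogger=INFO,stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n
```

started with something like: java -Dlog4j.configuration=file:///path/to/log4j.properties -jar start.jar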
Re: Using solandra
Karan, following the Readme (https://github.com/tjake/Solandra#readme) it's:

From the Solandra base directory:
$ mkdir /tmp/cassandra-data
$ ant
$ cd solandra-app
$ ./start-solandra.sh

Regards
Stefan

On Wed, May 18, 2011 at 12:40 PM, karanveer singh karan.korn...@gmail.com wrote:
I've recently switched from solr+cassandra to solandra. When I try to run solandra using java -jar start.jar in solandra-app, it gives me the following error:
[stack trace snipped]
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
How exactly do I configure the log4j configuration? Karan
sorting on date field in facet query
Hello list, Is it possible to sort on a date field in a facet query in Solr 3.1? -- Regards, Dmitry Kan
Re: Using solandra
Thanks Stefan! I got it started. Also, is there a way to import XML documents? When I run 2-import-data.sh with only XML documents in the data directory, it gives me the following:

Loading data to solandra, note: this importer uses a slow xml parser
Exception in thread main java.lang.RuntimeException: Directory doesn't contain sgml files!
 at org.apache.solr.solrjs.sgml.reuters.ReutersService.readDirectory(ReutersService.java:207)
 at org.apache.solr.solrjs.sgml.reuters.ReutersService.main(ReutersService.java:64)
Data loaded, now open ./website/index.html in your favorite browser!

On Wed, May 18, 2011 at 4:20 PM, Stefan Matheis matheis.ste...@googlemail.com wrote:
Karan, following the Readme (https://github.com/tjake/Solandra#readme) it's:
From the Solandra base directory:
$ mkdir /tmp/cassandra-data
$ ant
$ cd solandra-app
$ ./start-solandra.sh
Regards Stefan

On Wed, May 18, 2011 at 12:40 PM, karanveer singh karan.korn...@gmail.com wrote:
I've recently switched from solr+cassandra to solandra.
When I try to run solandra using java -jar start.jar in solandra-app, it gives me the following error:
[stack trace snipped]
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
How exactly do I configure the log4j configuration? Karan
RE: How to list/see all the indexed terms of a particular field in a document?
So this cannot be queried/listed using Apache Solr?

-----Original Message-----
From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
Sent: Wednesday, May 18, 2011 3:36 PM
To: solr-user@lucene.apache.org; gna...@zoniac.com
Subject: Re: How to list/see all the indexed terms of a particular field in a document?

ant luke?

On Wed, May 18, 2011 at 11:47 AM, Gnanakumar gna...@zoniac.com wrote:
[quoted text snipped]
Re: Using solandra
Karan, this data-import script is made especially for importing the demo data. To index XML documents (as you'd normally do with Solr), use for example http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/exampledocs/post.sh - and don't forget to adjust the URL according to your Solandra setup.

Regards
Stefan

On Wed, May 18, 2011 at 1:25 PM, karanveer singh karan.korn...@gmail.com wrote:
Thanks Stefan! I got it started. Also, is there a way to import xml documents? When I run 2-import-data.sh with only xml documents in the data directory, it gives me the following:
Loading data to solandra, note: this importer uses a slow xml parser
Exception in thread main java.lang.RuntimeException: Directory doesn't contain sgml files! at org.apache.solr.solrjs.sgml.reuters.ReutersService.readDirectory(ReutersService.java:207) at org.apache.solr.solrjs.sgml.reuters.ReutersService.main(ReutersService.java:64)
Data loaded, now open ./website/index.html in your favorite browser!

On Wed, May 18, 2011 at 4:20 PM, Stefan Matheis matheis.ste...@googlemail.com wrote:
[quoted text snipped]

On Wed, May 18, 2011 at 12:40 PM, karanveer singh karan.korn...@gmail.com wrote:
I've recently switched from solr+cassandra to solandra.
When I try to run solandra using java -jar start.jar in solandra-app, it gives me the following error:
[stack trace snipped]
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
How exactly do I configure the log4j configuration? Karan
Re: How to list/see all the indexed terms of a particular field in a document?
Gnanam, have a look at http://wiki.apache.org/solr/LukeRequestHandler

Regards
Stefan

On Wed, May 18, 2011 at 1:30 PM, Gnanakumar gna...@zoniac.com wrote:
So this cannot be queried/listed using Apache Solr?
[quoted text snipped]
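For a concrete starting point, a Luke request scoped to one document looks something like this (the core URL, the uniqueKey value x, and the parameter values are illustrative; what Luke can report per field also depends on how the field is indexed and stored):

```
http://localhost:8983/solr/admin/luke?id=x&fl=mytextcontent&numTerms=50
```

The id parameter selects the document by its uniqueKey field, fl restricts the response to the field of interest, and numTerms bounds how many terms are reported.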
how to work cache and improve performance phrase query included wildcard
Hi all, I have two questions.

First, I'm wondering how the filterCache, queryResultCache, and documentCache are applied. After searching "query1 OR query2 OR query3 ...", I searched "query0 OR query2 OR query3 ...". Only query1 and query0 differ, but the query time was not faster. When are the caches applied?

Second, I have 5 or more phrase queries containing wildcards per query, such as "query1* query2*"~2 OR "query3* query4*"~2 ... In the worst case there are more than 30 wildcard phrase queries in one query, and QTime is more than 60 seconds. Please give me any ideas to improve performance. I have a full-text index of 2.5 million documents, running as 10 shards on 1 Tomcat.

Thanks, Jason
I need to improve highlighting
Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get:

<arr name="all_text">
  <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str>
</arr>

What I need to do is either:
1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never get beyond the default for the field (I can go below 100 but not above).
2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str>

thanks, Bryan Rasmussen
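The relevant highlighter parameters, for reference (behavior varies somewhat by version; in particular hl.maxAnalyzedChars caps how much of the field the highlighter looks at, which can make hl.fragsize appear to have no effect on long fields - the default cap in Solr of this era is 51200 characters):

```
q=kongeriget&hl=true&hl.fl=all_text
  &hl.fragsize=0                # 0 = return the whole field as one fragment
  &hl.snippets=10               # allow up to 10 snippets per field
  &hl.maxAnalyzedChars=1000000  # raise the analysis cap
```

Counting the <em> occurrences in the returned snippet would then give the number of highlighted instances client-side.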
Re: How to test Solr Integartion - how to get EmbeddedSolrServer?
You've probably seen this page: http://wiki.apache.org/solr/HowToContribute, but here it is for reference Go ahead and open a JIRA at https://issues.apache.org/jira/browse/SOLR (you need to create an account) and attach your changes as a patch. That gets it into the system and folks can start commenting on what they think the implications are. One of the committers needs to pick it up, but you can prompt G... Yonik's law of patches reads: A half-baked patch in Jira, with no documentation, no tests and no backwards compatibility is better than no patch at all. So don't worry about a completely polished patch for the first cut, it's often helpful for people to see the early stages to help steer the effort. Best Erick On Wed, May 18, 2011 at 5:41 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Thinking more about it, I can solve my immediate problem by just copy-pasting the classes I need into my own project packages (KISS like herehttps://github.com/Filirom1/solr-test-exemple ). I'd however suggest to refactor Solr code structure to be much more defaults-compliant making it easier for external developers to understand, and hopefully easier to maintain for committers (with fewer special-needs configurations). I've done some of those refactorings on my local copy of Solr and would be glad to contribute. For this particular problem the KISS solution would be to create yet one more module for Tests which depend on Solr Core and on the Test Framework. The org burden of that extra module, versus the ease of building configuration, I believe, outweights. 
On Tue, May 17, 2011 at 7:11 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: http://stackoverflow.com/questions/6034513/can-i-avoid-a-dependency-cycle-with-one-edge-being-a-test-dependency On Tue, May 17, 2011 at 6:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: On Tue, May 17, 2011 at 3:52 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: On Tue, May 17, 2011 at 3:44 PM, Steven A Rowe sar...@syr.edu wrote: Hi Gabriele, On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote: Solr Core should declare a test dependency on Solr Test Framework. I agree: - Solr Core should have a test-scope dependency on Solr Test Framework. - Solr Test Framework should have a compile-scope dependency on Solr Core. But Maven views this as a circular dependency. I've seen that, but adding it with <scope>test</scope> works. The logic: the src is compiled first and then re-used (I'm assuming Maven does something smart about not including the full jar). Not quite. I've tried a demo and the reactor complains. I'll try to see if Maven could become 'smarter', or if the 2-phase build solution will work. The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='com.mysimpatico:TestFramework:1.0-SNAPSHOT'}' and 'Vertex{label='org.apache:DummyCore:1.0-SNAPSHOT'}' introduces to cycle in the graph org.apache:DummyCore:1.0-SNAPSHOT -- com.mysimpatico:TestFramework:1.0-SNAPSHOT -- org.apache:DummyCore:1.0-SNAPSHOT - [Help 1] The workaround: Solr Core includes the source of Solr Test Framework as part of its test source code. It's not pretty, but it works. I'd be happy to entertain other (functional) approaches. In the dp4j.com pom.xml I build in 2 phases to compile with the same annotations in the project itself (but I don't think we need that here) Steve -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x.
(x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: sorting on date field in facet query
Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
Re: I need to improve highlighting
Bryan, on Q2 - what about using xpath like 'str/em'? Regards Stefan On Wed, May 18, 2011 at 2:25 PM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get <arr name="all_text"> <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str> </arr> </lst> What I need to do is either 1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never go beyond the default for the field (I can go less than 100 but not more) 2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str> thanks, Bryan Rasmussen
Re: how to work cache and improve performance phrase query included wildcard
See below: On Wed, May 18, 2011 at 8:15 AM, Jason, Kim hialo...@gmail.com wrote: Hi, all I have two questions. First, I'm wondering how filterCache, queryResultCache, documentCache are applied. After searching query1 OR query2 OR query3 ..., I searched query0 OR query2 OR query3 ... . Just query1 and query0 are different. But query time was not fast. When are the caches applied? Caches don't really count here. You're not using filter queries so filterCache isn't germane. documentCache is only for holding the document read off disk; it probably isn't doing much in your example that would impact differences in search time unless you're returning massive numbers of documents. queryResultCache isn't getting re-used. Think of this as a list of document IDs keyed by the *entire* query. By making any changes to the query you're not going to use the cache. To understand this, consider that the clauses aren't really separate. Any additional clause could easily change the scoring of a document that matched both queries. So re-using the cache on a by-clause basis wouldn't produce correct results. In other words, caches aren't going to help you here. Second, I have 5 or more phrase queries including wildcards per query, such as "query1* query2*"~2 OR "query3* query4*"~2 ... In the worst case, phrase queries including wildcards in one query are more than 30. QTime is more than 60 seconds. Can we see the results of attaching debugQuery=on to the URL? Your pseudo-code may well be hiding the issue. We don't know what query parser you're using. Wildcards aren't usually analyzed for phrase queries, for instance, so on the face of it there's not much that can be said... Additionally, the field type and field definitions from your schema.xml would be helpful for the fields you're searching on. Best Erick Please give any idea to improve performance. I have a 2.5 million full text index. That is running 10 shards on 1 tomcat.
Thanks, Jason -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-work-cache-and-improve-performance-phrase-query-included-wildcard-tp2956671p2956671.html Sent from the Solr - User mailing list archive at Nabble.com.
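For later readers: assuming Jason's queries were meant as sloppy phrase queries containing wildcard terms (the quoting in the email is ambiguous), the practical first step Erick suggests is re-running one query with debugQuery=on and inspecting the parsed query. A minimal sketch of building such a request URL; the host, core, and terms are illustrative, not from a real deployment:

```python
from urllib.parse import quote_plus

# Assumed intent: sloppy phrase queries over wildcard terms, e.g.
# "query1* query2*"~2 -- whether the parser actually expands wildcards
# inside a phrase depends on the query parser in use, which is part of
# Erick's point.
q = '"query1* query2*"~2 OR "query3* query4*"~2'
url = "http://localhost:8983/solr/select?debugQuery=on&q=" + quote_plus(q)
print(url)
```

Comparing the parsedquery sections of two such requests makes it obvious whether the slow clauses are being rewritten into large multi-term queries.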
Re: I need to improve highlighting
Bryan, on Q2 - what about using xpath like 'str/em'? How do I do that? The highlighting result, at least in the Solr installation I have (3.something), returns the em as escaped markup. Is there an xpath parameter or configuration I can set for highlighting, or a way to change the em elements to be actual elements (hl.formatter maybe?) Thanks, Bryan Rasmussen On Wed, May 18, 2011 at 2:25 PM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get <arr name="all_text"> <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str> </arr> </lst> What I need to do is either 1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never go beyond the default for the field (I can go less than 100 but not more) 2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str> thanks, Bryan Rasmussen
Re: I need to improve highlighting
Just checking, but have you tried setting hl.fragsize=<very large number> as suggested here: http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ? If that's not the problem, please show us the results of attaching debugQuery=on to the request, that may shed some light on the problem. Best Erick On Wed, May 18, 2011 at 8:25 AM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get <arr name="all_text"> <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str> </arr> </lst> What I need to do is either 1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never go beyond the default for the field (I can go less than 100 but not more) 2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str> thanks, Bryan Rasmussen
Re: I need to improve highlighting
Yeah, but you just got me to check again. What I thought was Solr ignoring my hl.fragsize setting and always using the default turned out to be a smaller field being returned because it ranked higher; when I set it to 1000 and saw the same thing I saw with 100, it was just the off chance that there were only 100 characters to see in the first 10 results. Funny. thanks, Bryan Rasmussen On Wed, May 18, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.com wrote: Just checking, but have you tried setting hl.fragsize=<very large number> as suggested here: http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ? If that's not the problem, please show us the results of attaching debugQuery=on to the request, that may shed some light on the problem. Best Erick On Wed, May 18, 2011 at 8:25 AM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get <arr name="all_text"> <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str> </arr> </lst> What I need to do is either 1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never go beyond the default for the field (I can go less than 100 but not more) 2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str> thanks, Bryan Rasmussen
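Closing the loop on Bryan's second question (counting highlighted instances): Solr 3.x doesn't return a per-document hit count directly, but a client can derive one by counting the <em> markers in the highlighting section. A sketch assuming a wt=json response shaped like the snippet in this thread; the document key "doc1" is illustrative:

```python
import json

# Hypothetical Solr response fragment (wt=json), shaped like the one in
# the thread; the document id is made up for illustration.
response = json.loads("""
{
  "highlighting": {
    "doc1": {
      "all_text": ["Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige"]
    }
  }
}
""")

def count_highlights(resp, field, tag="<em>"):
    """Count highlighter hit markers per document for one field."""
    counts = {}
    for doc_id, fields in resp.get("highlighting", {}).items():
        snippets = fields.get(field, [])
        counts[doc_id] = sum(s.count(tag) for s in snippets)
    return counts

print(count_highlights(response, "all_text"))  # {'doc1': 2}
```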
Re: Anyone having these Replication issues as well?
Thanks Markus, for your patience with getting the response in, as well as the comments. This is my Dev environment; I'm actually going to be setting up a new master-slave configuration in a different environment today. I'll see if it's environment-specific or not. One thing I didn't mention, as I wasn't sure it was germane, is that these servers are in Amazon EC2. Also, the master is currently on a 32-bit OS and the slaves are on 64-bit OSes. Just the order in which the servers are getting upgraded in dev. The master has autoCommit turned on at 30-second intervals. Even if nothing is getting indexed, could an autoCommit occurring during a replication request cause a failed replication? Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Anyone-having-these-Replication-issues-as-well-tp2954365p2957127.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: I need an available solr lucene consultant
I am interested in hearing more about this opportunity. Feel free to contact me at b...@csrinstitute.net. Thanks Bill -- View this message in context: http://lucene.472066.n3.nabble.com/I-need-an-available-solr-lucene-consultant-tp2954023p2957137.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exact match
There's a JIRA issue assigned to this feature: https://issues.apache.org/jira/browse/SOLR-1980 However, it's not yet implemented. Anyone? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 17. mai 2011, at 15.51, Alex Grilo wrote: Hi, Can I make a query that returns only exact match or do I have to change the fields to achieve that? Thanks in advance Alex Grilo
Does every Solr request-response require a running server?
Hello, I'm wondering if the Solr Test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with Solr, i.e. no web server -- no Solr), or whether the tests interact without one, calling the underlying methods directly? The latter seems to be the case from trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. -- Regards, K. Gabriele
Re: UIMA analysisEngine path
2011/5/17 chamara chama...@gmail.com Hi My Solr version is 3.1.0. I actually figured out what my problem was. I used the guide https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt and it seems that I had placed the code snippet inside another xml element, not under <config>. One more thing: you are using Solr 3.1.0 but reading the README from trunk (4.0-SNAPSHOT); you should use this one instead: https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/README.txt Will UIMA work with Solr version 1.4.1 as well? The UpdateRequestProcessorChain API has changed from 1.4.1 to 3.1.0 so, although it should be easy to back-port, it's not compatible with Solr 1.4.1 out of the box. Tommaso Thanks again On Tue, May 17, 2011 at 12:13 PM, Tommaso Teofili [via Lucene] ml-node+2952043-2093755785-399...@n3.nabble.com wrote: Hi again Chamara, 2011/5/17 chamara [hidden email] Thanks Tommaso, yes this occurred after copying the .jar files to the lib folder. When I do not copy them from contrib/uima/lib and have the solrconfig.xml point to those libs I get the following error. I am a bit confused why a classpath was chosen to get the analysis engine descriptor. I think it'd be nice if you could tell which version of Solr you're using, and how you configured the Solr-UIMA module in solrconfig.xml. The error is prompted when the /update request handler is called; this looks like it is related to the classpath (/org/apache/uima/desc/). SEVERE: Error in xpath: java.lang.RuntimeException: solrconfig.xml missing /config/uimaConfig/analysisEngine This seems to be related to a missing /config/uimaConfig/analysisEngine element inside solrconfig.xml.
Regards, Tommaso On Mon, May 16, 2011 at 6:19 PM, Tommaso Teofili [via Lucene] [hidden email] wrote: The error you pasted doesn't seem to be related to a (class)path issue but more likely to be related to a Solr instance at 1.4.1/3.1.0 and a Solr-UIMA module at 3.1.0/4.0-SNAPSHOT (trunk); it seems that the error arises from the changed UpdateRequestProcessorFactory API. Hope this helps, Tommaso On 16 May 2011, at 18.54, chamara wrote: Hi Tommaso, Thanks for the quick reply. I had copied the lib files and followed the instructions on http://wiki.apache.org/solr/SolrUIMA#Installation. However I get this error. The AnalysisEngine has the default classpath, which is /org/apache/uima/desc/. SEVERE: org.apache.solr.common.SolrException: Error Instantiating UpdateRequestProcessorFactory, org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory Regards, Chamara On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] [hidden email] wrote: Hello, if you want to take the descriptor from a jar, provided that you configured the jar inside a <lib> element in solrconfig, then you just need to write the correct classpath in the <analysisEngine> element. For example, if your descriptor resides in the com/something/desc/ path inside the jar then you should set the <analysisEngine> element to /com/something/desc/descriptorname.xml If you instead need to get the descriptor from the filesystem, try the patch in SOLR-2501 [1]. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2501 2011/5/13 chamara [hidden email] Hi, Does this code at line 57 need to be changed to the location where the jar files (library files) reside? URL url = this.getClass().getResource(location of the jar files); I did change it but no luck so far.
Let me know what I am doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html Sent from the Solr - User mailing list archive at Nabble.com. --- Chamara -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html
Re: lucene parser, negative OR operands
On 5/17/2011 8:00 PM, Yonik Seeley wrote: This doesn't have to do with Solr's support of pure-negative top-level queries, but does have to do with a long-standing confusion about how the lucene queryparser works with some of the operators (i.e. not really boolean logic). In a Lucene BooleanQuery, clauses are mandatory, optional, or prohibited. -foo OR -bar actually parses to a boolean query with two prohibited clauses... essentially the same as -foo AND -bar. You can see this by adding debugQuery=true to the request. Thanks Yonik. I recall hearing about this before, but was vague on the details; thanks for supplying some and refreshing my memory. So I guess there is no such thing as an optional prohibited clause, which is what makes -one OR -two the same thing as -one AND -two. Actually, yeah, an optional prohibited clause doesn't really even make sense. Hmm. If I want to understand more about how the lucene query parser does its thing, can anyone suggest the source files I should be looking at? If I really do want actual boolean logic behavior, what are my options? I guess one is trying to write my own query parser. Hmm, for that particular query, what about using parens to force a sub-query? (-one) OR (-two) Ha, nope, that runs into a different problem (or is it the same problem?), and always returns 0 hits. It looks like the lucene query parser can't handle a pure-negative sub-query like that separated by OR? Not sure why, can anyone explain that one? For that particular pattern, this crazy refactoring of the query does work and gets the actual boolean logic result of (not 'one') OR (not 'two'): (*:* AND -one) OR (*:* AND -two) Phew, crazy stuff. So that's a weird solution to getting actual boolean logic behavior for that pattern, but in general, I'm kind of wanting a parser that will give actual boolean logic behavior. Maybe someday I can find time to write it in Java (not the quickest thing for me, not familiar with the code at all). Jonathan
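The rewrite Jonathan lands on - anchoring each pure-negative sub-query to the full index with *:* so it can participate in an OR - can be automated on the client side. A small sketch (plain string assembly, not a Solr API) that applies the transformation he describes:

```python
def fix_pure_negative(clause):
    """If a clause contains only prohibited (-) terms, anchor it to the
    full index with *:* so it behaves as a true boolean NOT inside an OR."""
    terms = clause.split()
    if terms and all(t.startswith("-") for t in terms):
        return "(*:* AND " + " ".join(terms) + ")"
    return "(" + clause + ")"

def boolean_or(*clauses):
    """OR together clauses, rewriting pure-negative ones first."""
    return " OR ".join(fix_pure_negative(c) for c in clauses)

print(boolean_or("-one", "-two"))
# (*:* AND -one) OR (*:* AND -two)
```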
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I'm wondering if Solr Test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with solr, i.e. no web server -- no solr) or in the tests they interact without one, calling directly the under line methods? The latter seems to be the case trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. Solr does either, depending on the test. Most tests start only an embedded solr server w/ no web server, but others use an embedded jetty server so one can talk HTTP to it. JettySolrRunner is used for the latter. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 5:09 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I'm wondering if Solr Test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with solr, i.e. no web server -- no solr) or in the tests they interact without one, calling directly the underlying methods? The latter seems to be the case trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. Solr does either, depending on the test. Most tests start only an embedded solr server w/ no web server, What is confusing me is the solr server. Is it SolrCore? In what aspects is it a 'server'? In my understanding it's the core of the Solr Web application which makes up the servlets interface, i.e. it's under the servlets, not on top of them. but others use an embedded jetty server so one can talk HTTP to it. JettySolrRunner is used for the latter. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco -- Regards, K. Gabriele
Re: Set operations on multiple queries with different qf parameters
Don't know of any other documentation. There might be some minimal page on the wiki somewhere, but I can never find it either; although I have some memory of seeing it once, it didn't have anything that the blog post didn't. I think 'mm' _should_ work as a LocalParam in a nested query, I use it myself in code and it seems to work. But not too surprised that 'fq' doesn't (although I haven't verified that myself). If indeed it doesn't, here would be a hacky way to get the same semantics, although it won't use the filter cache for the fq. If this doesn't work: defType=lucene&q=_query_:"{!edismax qf='p,q,r' fq='field1:xyz'}abc def" AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk" Then this should; we can just put it in our top-level lucene query as an additional condition. defType=lucene&q=(_query_:"{!edismax qf='p,q,r'}abc def" AND field1:xyz) AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk" Yeah, this starts to get painful, agreed, with unclear performance implications. On 5/17/2011 10:44 PM, Nikhil Chhaochharia wrote: Thanks, this looks good. mm and fq don't seem to be working for a nested query, but I should be able to work around it. I was unable to find much documentation on the Wiki, API docs or in the Solr book - please let me know if you are aware of any other documentation for this feature apart from the mentioned blog post. Thanks, Nikhil - Original Message - From: Jonathan Rochkind rochk...@jhu.edu To: solr-user@lucene.apache.org; Nikhil Chhaochharia nikhil...@yahoo.com Cc: Sent: Tuesday, 17 May 2011 8:52 PM Subject: Re: Set operations on multiple queries with different qf parameters One way to do it might be to use the Solr 'nested query' functionality. http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/ Not entirely sure this will work exactly as I've written it, but it should give you some ideas of what a nested query can do.
Note not fully URL-encoded for clarity: defType=lucene&q=_query_:"{!edismax qf='p,q,r' fq='field1:xyz'}abc def" AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk" On 5/17/2011 2:55 AM, Nikhil Chhaochharia wrote: Hi, I am using Solr 3.1 with edismax. My frontend allows the user to create arbitrarily complex queries by modifying q, fq, qf and mm (only 1 and 100% are allowed) parameters. The queries can then be saved by the user. The user should be able to perform set operations on the saved searches. For example, the user may want to see all documents which are returned both by saved search 1 and saved search 2 (equivalent to the intersection of the two). If the saved searches contain q, fq and/or mm, then I can combine the saved searches to create a new query which will be equivalent to their intersection. However, I can't figure out how to handle qf. For example, Query 1 = q=abc def&fq=field1:xyz&mm=1&qf=p,q,r Query 2 = q=jkl&mm=100%&qf=q,r,s How do I get the list of common documents which are present in the result set of both queries? Thanks, Nikhil
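Since the thread's examples are deliberately not URL-encoded, here is a sketch of actually encoding the combined nested query from a client. The quoting of the _query_ values follows the nested-query blog post linked above; the host, core, and field names are the thread's illustrative ones, not a real deployment:

```python
from urllib.parse import urlencode

# Combined query from the thread: edismax clause 1 intersected with an
# fq-like condition, ANDed with edismax clause 2 at mm=100%.
q = ('(_query_:"{!edismax qf=\'p,q,r\'}abc def" AND field1:xyz)'
     ' AND _query_:"{!edismax mm=100% qf=\'q,r,s\'}jlk"')
url = "http://localhost:8983/solr/select?" + urlencode(
    {"defType": "lucene", "q": q})
print(url)
```

urlencode takes care of escaping the braces, quotes, and the % in mm=100%, which is easy to get wrong by hand.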
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 11:14 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: On Wed, May 18, 2011 at 5:09 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I'm wondering if Solr Test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with solr, i.e. no web server -- no solr) or in the tests they interact without one, calling directly the underlying methods? The latter seems to be the case trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. Solr does either, depending on the test. Most tests start only an embedded solr server w/ no web server, What is confusing me is the solr server. Is it SolrCore? In what aspects is it a 'server'? In my understanding it's the core of the Solr Web application which makes up the servlets interface, i.e. it's under the servlets, not on top of them. Look at TestHarness - it instantiates a CoreContainer. When running as a webapp in a Jetty server, a DispatchFilter is registered that instantiates the CoreContainer. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco but others use an embedded jetty server so one can talk HTTP to it. JettySolrRunner is used for the latter. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco -- Regards, K. Gabriele
Re: [POLL] How do you (like to) do logging with Solr
On 5/17/2011 10:00 AM, Chris Hostetter wrote: : If I understand what you've said above correctly, removing the binding in : solr.war would make it inherit the binding in jetty/tomcat/whatever, is that : right? That sounds like an awesome plan to me. The example jetty server can : be configured instead of solr.war. Once you've answered this, I can submit my : vote. no, removing the bindings in solr.war would result in solr not logging *anything* unless you manually added a jar (defining the bindings you want) to the jetty (or tomcat) system classloader. What I'd want to have is the ability to download Solr source code, not modify anything, create a .war, and drop it into an existing system that has my preferred logging already set up, which from what you are saying would also require that the example have a jar with the JDK bindings, and that everyone who sets up a more custom system create their own jar and put it somewhere it can be found. What's involved in creating that jar? Is it something that a novice could get done? Is it something that could be prepackaged for the most common choices, or possibly already available on the Internet? Thanks, Shawn
JSON delete error with latest branch_3x
I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false
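While the bug stood, one easy thing to rule out was a malformed request body; generating the JSON delete commands programmatically guarantees well-formed input. A minimal sketch matching the format of the curl command in this thread (the document id is illustrative):

```python
import json

# Bodies for delete-by-query and delete-by-id against /solr/update/json,
# matching the format used in the curl command in this thread.
delete_by_query = json.dumps({"delete": {"query": "*:*"}})
delete_by_id = json.dumps({"delete": {"id": "doc1"}})  # "doc1" is illustrative
print(delete_by_query)
```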
Re: JSON delete error with latest branch_3x
On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote: I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Hmmm, looks like unit tests must be inadequate for the JSON format. I'll look into it. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: Anyone familiar with Solandra or Lucandra?
This will be possible once triggers are finished for cassandra, then we can hook into CF inserts and auto index in solandra. On Tue, May 17, 2011 at 5:10 PM, kenf_nc ken.fos...@realestate.com wrote: Ah. I see. That reduces its usefulness to me some. The multi-master aspect is still a big draw of course. But I was hoping this also added an integrated persistence layer to Solr as well. -- View this message in context: http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2954320.html Sent from the Solr - User mailing list archive at Nabble.com. -- http://twitter.com/tjake
Re: JSON delete error with latest branch_3x
OK, I just fixed this on branch_3x. Trunk is fine (it was an error in the 3x backport that wasn't caught because the test doesn't go through the complete solr stack to the update handler). -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco On Wed, May 18, 2011 at 1:29 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote: I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Hmmm, looks like unit tests must be inadequate for the JSON format. I'll look into it. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: JSON delete error with latest branch_3x
Thanks Yonik, all my app's test cases now pass again. --Paul On Wed, May 18, 2011 at 2:04 PM, Yonik Seeley yo...@lucidimagination.com wrote: OK, I just fixed this on branch_3x. Trunk is fine (it was an error in the 3x backport that wasn't caught because the test doesn't go through the complete solr stack to the update handler). -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco On Wed, May 18, 2011 at 1:29 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote: I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Hmmm, looks like unit tests must be inadequate for the JSON format. I'll look into it. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: Field collapsing on multiple fields and/or ranges?
bump -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958029.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field collapsing on multiple fields and/or ranges?
As far as I know this is not possible today with either Solr's 4.0 grouping impl or the new grouping module (soon to be grouping in Solr 3.x). I'm not sure about the patch on SOLR-236 though. But it's an interesting use case; it's a compound group key, right? You want to group by a tuple (X, Y). Can you open a Lucene issue for this? I'm not sure we can fix it today but I think the use case is reasonable so we can at least discuss it on an issue... Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 2:23 PM, arian487 akarb...@tagged.com wrote: bump -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958029.html Sent from the Solr - User mailing list archive at Nabble.com.
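Until grouping supports a compound key natively, the tuple grouping described above can be emulated client-side; a toy Python sketch (the field names are invented) of grouping hits by an (X, Y) pair:

```python
from collections import defaultdict

# Client-side sketch of grouping by a compound (tuple) key; 'docs' and
# its fields are made-up stand-ins for search hits.
docs = [
    {"id": 1, "city": "NYC", "price_band": "low"},
    {"id": 2, "city": "NYC", "price_band": "low"},
    {"id": 3, "city": "NYC", "price_band": "high"},
    {"id": 4, "city": "SF",  "price_band": "low"},
]

groups = defaultdict(list)
for doc in docs:
    # The tuple (city, price_band) acts as the compound group key.
    groups[(doc["city"], doc["price_band"])].append(doc["id"])

assert len(groups) == 3                    # three distinct (city, band) groups
assert groups[("NYC", "low")] == [1, 2]
```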
Re: Field collapsing on multiple fields and/or ranges?
Thanks for the reply! How exactly do I open an issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958277.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Disable IDF scoring on certain fields
I believe I have applied the patch correctly. However, I cannot seem to figure out where the similarity class I create should reside. Any tips on that? Thanks, Brian Lamb On Tue, May 17, 2011 at 4:00 PM, Brian Lamb brian.l...@journalexperts.comwrote: Thank you Robert for pointing this out. This is not being used for autocomplete. I already have another core set up for that :-) The idea is like I outlined above. I just want a multivalued field that treats every term in the field the same so that the only way documents separate themselves is by an unrelated boost and/or matching on multiple terms in that field. On Tue, May 17, 2011 at 3:55 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if you're experimental you can try trunk as Robert points out it has been fixed there. If not, I guess you're stuck with creating another core. Is this fieldType specifically used for auto-completion? If so, another core, preferably on another machine, is in my opinion the way to go. Auto-completion is tough in terms of performance. Thanks Robert for pointing to the Jira ticket. Cheers Hi Markus, I was just looking at overriding DefaultSimilarity so your email was well timed. The problem I have with it is, as you mentioned, it does not seem possible to do it on a field-by-field basis. Has anyone had any luck with doing some of the similarity functions on a field-by-field basis? I need to do more than one of them and from what I can find, it seems that only computeNorm accounts for the name of the field. Thanks, Brian Lamb On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hi, Although you can configure per-field TF (by omitTermFreqAndPositions) you can't do this for IDF. If your index is only used for this specific purpose (seems like an auto-complete index) then you can override DefaultSimilarity and return a static value for IDF.
If you still want IDF for other fields then I think you have a problem because Solr doesn't yet support per-field similarity. http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java?view=markup Cheers, Hi all, I have a field defined in my schema.xml file as

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
</fieldType>

<field name="myfield" multiValued="true" type="edgengram" indexed="true" stored="true" required="false" omitNorms="true" />

I would like to disable IDF scoring on this field. I am not interested in how rare the term is, I only care whether the term is present or not. The idea is that if a user does a search for myfield:dog OR myfield:pony, any document containing dog or pony would be scored identically. In the case that both showed up, that record would be moved to the top, but all the records where they both showed up would have the same score. So long story short, how can I disable the IDF score for this particular field? Thanks, Brian Lamb
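A toy numeric illustration (this is not Lucene's actual formula) of what overriding the Similarity's idf() to return a constant achieves: rare and common terms stop separating documents by rarity, which is exactly the behavior asked for above:

```python
import math

# Illustrative IDF-like weight (not Lucene's exact formula).
def idf(doc_freq, num_docs):
    return 1 + math.log(num_docs / (doc_freq + 1))

# What a custom Similarity returning a static IDF value achieves.
def flat_idf(doc_freq, num_docs):
    return 1.0

num_docs = 10_000
rare, common = 3, 5_000

# Normally the rare term gets a much larger weight...
assert idf(rare, num_docs) > idf(common, num_docs)
# ...but with a constant IDF, both terms contribute identically.
assert flat_idf(rare, num_docs) == flat_idf(common, num_docs)
```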
Re: Field collapsing on multiple fields and/or ranges?
Start here: https://issues.apache.org/jira/browse/LUCENE Create an account (it's free), open an issue and set the component to modules/grouping, fill in the fields, and submit it :) Then maybe make a patch and attach it! Genericizing the per-doc grouping key is important; we have an issue open for this already: https://issues.apache.org/jira/browse/LUCENE-3099 So in theory if we had LUCENE-3099 done, a sub-class could create a compound group key. Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 3:34 PM, arian487 akarb...@tagged.com wrote: Thanks for the reply! How exactly do I open an issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958277.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field collapsing on multiple fields and/or ranges?
https://issues.apache.org/jira/browse/SOLR-2526 modules/grouping was not a valid component so I just put it in search. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958408.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field collapsing on multiple fields and/or ranges?
Ahh, that's because you opened a Solr not a Lucene issue ;) The modules (incl. new grouping module) are under Lucene. That's fine, we can leave it as a Solr issue. Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 4:10 PM, arian487 akarb...@tagged.com wrote: https://issues.apache.org/jira/browse/SOLR-2526 modules/grouping was not a valid component so I just put it in search. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958408.html Sent from the Solr - User mailing list archive at Nabble.com.
Storing, indexing and searching XML documents in Solr
Hi, I'm new to solr so apologies if the solution is already documented. I have installed and populated a solr index using the examples as a template with a version of the data below. I have XML in the form of

<entity>
  <resource>
    <guid>123898-2092099098982</guid>
    <media_format>Blu-Ray</media_format>
    <updated>2011-05-05T11:25:35+0500</updated>
  </resource>
  <price currency="usd">3.99</price>
  <discounts>
    <discount type="percentage" rate="30" start="2011-05-03T00:00:00" end="2011-05-10T00:00:00" />
    <discount type="decimal" amount="1.99" coupon="1" />
    ...
  </discounts>
  <aspect_ratio>16:9</aspect_ratio>
  <duration>1620</duration>
  <categories>
    <category id="drama" />
    <category id="horror" />
  </categories>
  <rating>
    <rate id="D1">contains some scenes which some viewers may find upsetting</rate>
  </rating>
  ...
  <media_type>Video</media_type>
</entity>

Can I populate solr directly with this document (like I believe MarkLogic does)? If yes: can I search on any attribute (i.e. find all records where /entity/resource/media_format equals blu-ray)? If no: what is the best practice to import the attributes above into solr (i.e. patterns for subdividing / flattening the document)? Does solr support attached documents and if so is this advised (how does it affect performance)? Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. Thanks again OJ
Re: Replication Clarification Please
Alexander, sorry for the delay in replying. I wanted to test out a few hunches that I had before getting back to you. Hurray!!! I was able to resolve the issue. The problem was with the cache settings in the solrconfig.xml. It was taking almost 15-20 minutes to warm up the caches on each commit; as we are commit-heavy (every 5 minutes) the replication was screaming for the new searcher to be warmed and it would never get a chance to finish, so it was perennially backed up. We reduced the cache and autowarm counts and now the replication is happily finishing within 20 seconds!! Thank you again for all your support. Thanks, Ravi Kiran Bhaskar The Washington Post 1150 15th St. NW Washington, DC 20071 On Sun, May 15, 2011 at 3:12 AM, Alexander Kanarsky alexan...@trulia.com wrote: Ravi, what is the replication configuration on both master and slave? Also could you list the files in the index folder on master and slave before and after the replication? -Alexander On Fri, 2011-05-13 at 18:34 -0400, Ravi Solr wrote: Sorry guys, spoke too soon I guess. The replication still remains very slow even after upgrading to 3.1 and setting the compression off. Now I am totally clueless. I have tried everything that I know of to increase the speed of replication but failed. If anybody faced the same issue, can you please tell me how you solved it. Ravi Kiran Bhaskar On Thu, May 12, 2011 at 6:42 PM, Ravi Solr ravis...@gmail.com wrote: Thank you Mr. Bell and Mr. Kanarsky, as per your advice we have moved from 1.4.1 to 3.1 and have made several changes to the configuration. The configuration changes have worked nicely till now and the replication is finishing within the interval and not backing up. The changes we made are as follows: 1. Increased the mergeFactor from 10 to 15 2. Increased ramBufferSizeMB to 1024 3. Changed lockType to single (previously it was simple) 4. Set maxCommitsToKeep to 1 in the deletionPolicy 5. Set maxPendingDeletes to 0 6. Changed caches from LRUCache to FastLRUCache, as we had hit ratios well over 75%, to increase warming speed 7. Increased the poll interval to 6 minutes and re-indexed all content. Thanks, Ravi Kiran Bhaskar On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky alexan...@trulia.com wrote: Ravi, if you have what looks like a full replication each time even if the master generation is greater than the slave's, try to watch the index on both master and slave at the same time to see what files are getting replicated. You probably may need to adjust your merge factor, as Bill mentioned. -Alexander On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote: Hello Mr. Kanarsky, Thank you very much for the detailed explanation, probably the best explanation I found regarding replication. Just to be sure, I wanted to test solr 3.1 to see if it alleviates the problems...I don't think it helped. The master index version and generation are greater than the slave's, still the slave replicates the entire index from master (see replication admin screen output below). Any idea why it would get the whole index every time even in 3.1, or am I misinterpreting the output? However I must admit that 3.1 finished the replication, unlike 1.4.1 which would hang and be backed up for ever.
Master http://masterurl:post/solr-admin/searchcore/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1296217097572, Generation: 12726
Poll Interval 00:03:00
Local Index
Index Version: 1296217097569, Generation: 12725
Location: /data/solr/core/search-data/index
Size: 944.32 MB
Times Replicated Since Startup: 148
Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
Current Replication Status Start Time: Tue May 10 12:32:41 EDT 2011
Files Downloaded: 18 / 108
Downloaded: 317.48 KB / 436.24 MB [0.0%]
Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s
Thanks, Ravi Kiran Bhaskar On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky alexan...@trulia.com wrote: Ravi, as far as I remember, this is how the replication logic works (see SnapPuller class, fetchLatestIndex method): 1. Does the Slave get the whole index every time during replication or just the delta since the last replication happened? It looks at the index version AND the index generation. If both the slave's version and generation are the same as on the master, nothing gets replicated. If the master's generation is greater than on the slave, the slave fetches the delta files only (even if the
Re: Storing, indexing and searching XML documents in Solr
On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Re: Storing, indexing and searching XML documents in Solr
The data is being imported directly from mysql. The document is however indeed a good starting place. Thanks 2011/5/18 Yury Kats yuryk...@yahoo.com On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Re: Storing, indexing and searching XML documents in Solr
Great document. I can see how to import the data directly from the database. However it seems as though I need to write XPaths in the config to extract the fields that I wish to transform into a Solr document. So it seems that there is no way of storing the document structure in Solr as is? 2011/5/18 Yury Kats yuryk...@yahoo.com On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Re: [POLL] How do you (like to) do logging with Solr
Hi, If you've set up your Tomcat with log4j logging, and want to add Solr within the same logging config, you need to: #1. Remove slf4j-jdk14-1.6.1.jar from solr.war (unpack, remove, repack) #2. Download slf4j-log4j12-1.6.1.jar (from slf4j.org) and place it in e.g. tomcat/shared/lib If solr.war shipped without a pre-packaged binding, you could skip #1. The binding jar you deploy to the appserver lib would also take effect for any other webapp using slf4j deployed to the same app-server. An alternative to manually repackaging solr.war as in #1 is Hoss' suggestion in SOLR-2487 of a new ANT option to build Solr artifacts without the JUL binding. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 18. mai 2011, at 18.33, Shawn Heisey wrote: On 5/17/2011 10:00 AM, Chris Hostetter wrote: : If I understand what you've said above correctly, removing the binding in : solr.war would make it inherit the binding in jetty/tomcat/whatever, is that : right? That sounds like an awesome plan to me. The example jetty server can : be configured instead of solr.war. Once you've answered this, I can submit my : vote. No, removing the bindings in solr.war would result in solr not logging *anything* unless you manually added a jar (defining the bindings you want) to the jetty (or tomcat) system classloader. What I'd want to have is the ability to download Solr source code, not modify anything, create a .war, and drop it into an existing system that has my preferred logging already set up, which from what you are saying would also require that the example have a jar with the JDK bindings, and that everyone who sets up a more custom system create their own jar and put it somewhere it can be found. What's involved in creating that jar? Is it something that a novice could get done? Is it something that could be prepackaged for the most common choices, or possibly already available on the Internet? Thanks, Shawn
Using Boost fields for a sum total score.
I have a sizable index with a main content field, and 5 defined boost fields (boost_low, boost_med, boost_high, boost_max, and boost_neg). The idea and hope was to allow searches on the content field to be influenced/boosted by the boosting fields if the search term was present. I had set up a dismax query with a 'qf' setting that boosted the content field significantly, and the 5 boost fields with descending values (e.g. content^5.0 boost_max^1.2 boost_high^1.0 etc...). After some testing and reading, I'm of the understanding that this setup will search the fields (content and boost fields), apply the boost to each, and then choose the field with the highest score as the score for that result (essentially taking the MAX() score from the various fields, and not the SUM() of the fields' scores). If this is the case, is there an alternate setup, config item, or means of combining these scores to return a SUM() score instead? Any direction or help would be most appreciated. Ron -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2958968.html Sent from the Solr - User mailing list archive at Nabble.com.
Two XPathEntityProcessor questions
Hi, Can anyone tell me if the XPathEntityProcessor handles expressions like this: xpath=/a/b[c='value']/d/e That is, return a node that has a predecessor with a given text value? I would like to map various XPath expressions of that form to the same document in the index (I have a unique key constraint). Also, is it possible to assign a value to a unique key from an HTTP parameter? Something like this: <field column="id">${dataimporter.request.id}</field> I'm using a ContentStreamDataSource to fetch data from a POST. Thanks, Jeff
Re: [POLL] How do you (like to) do logging with Solr
I usually build solr using 'ant test dist' to run tests and build the .war and other jars, in particular the dataimporthandler. Having an alternate ant option to build without the binding would work for me. If/when I get around to changing logging mechanisms, I wouldn't be able to use the binary distribution, but with 3.1 I am already including selected patches from branch_3x and building it myself. I can see that there is a lot of resistance to just removing the binding entirely. I think that's a better option, but I know it's important to take care of the complete novices and their initial experience with the software. [ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [X] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! On 5/18/2011 3:31 PM, Jan Høydahl wrote: Hi, If you've setup your Tomcat with log4j logging, and want to add Solr, within the same logging config, you need to: #1. Remove slf4j-jdk14-1.6.1.jar from solr.war (unpack, remove, repack) #2. Download slf4j-log4j12-1.6.1.jar (from slf4j.org) and place it in e.g. tomcat/shared/lib If solr.war shipped without a pre-packaged binding, you could skip #1. The binding jar you deploy to appserver lib would also take effect for any other webapp using slf4j deployed to the same app-server. An alternative to manually repackage solr.war as in #1, is Hoss' suggestion in SOLR-2487 of a new ANT option to build Solr artifacts without the JUL binding.
Re: [POLL] How do you (like to) do logging with Solr
: An alternative to manually repackage solr.war as in #1, is Hoss' : suggestion in SOLR-2487 of a new ANT option to build Solr artifacts : without the JUL binding. More specifically, I'm advocating a new ANT property that would let you specify (by path) whatever SLF4J binding jar you want to include, or that you don't want any SLF4J binding jar included (by specifying a path to a jar that doesn't exist).

I want the default...
  ant dist

I don't want a binding in solr.war...
  ant -Dslf4j.jar.path=BOGUS_FILE_PATH dist

I want a specific binding in solr.war...
  ant -Dslf4j.jar.path=/my/lib/slf4j-jcl-*.jar dist

-Hoss
Re: Storing, indexing and searching XML documents in Solr
You're right, you can't store an XML document directly in Solr. You have to pull it apart and index it such that you can get whatever information back you need. How you flatten data depends entirely upon your needs. The high-level idea is that you want to create fields such that text searches work. The moment you start thinking about how can I express a relationship in the query, back up and try to flatten the data so you can just *search*. This is vague, I know. But so much depends on how you want to use the data that specifics are hard to give. You've gotta take off your DB hat and not worry about duplicating data. De-normalize lots and lots and lots first... Best Erick On Wed, May 18, 2011 at 5:27 PM, Judioo cont...@judioo.com wrote: Great document. I can see how to import the data direct from the database. However it seems as though I need to write xpath's in the config to extract the fields that I wish to transform into an solr document. So it seems that there is no way of storing the document structure in solr as is? 2011/5/18 Yury Kats yuryk...@yahoo.com On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
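As a concrete sketch of the flattening described above, here is one way (in Python, purely illustrative) to pull a trimmed version of the XML from earlier in the thread apart into flat, searchable fields, with repeating elements becoming a multiValued field:

```python
import xml.etree.ElementTree as ET

# A trimmed version of the nested record from earlier in the thread.
xml_doc = """
<entity>
  <resource>
    <guid>123898-2092099098982</guid>
    <media_format>Blu-Ray</media_format>
  </resource>
  <categories>
    <category id="drama"/>
    <category id="horror"/>
  </categories>
</entity>
"""

root = ET.fromstring(xml_doc)

# Flatten into one Solr-style document: nested paths become flat fields,
# repeated elements become a multiValued field.
solr_doc = {
    "guid": root.findtext("resource/guid"),
    "media_format": root.findtext("resource/media_format"),
    "category": [c.get("id") for c in root.findall("categories/category")],
}

assert solr_doc["media_format"] == "Blu-Ray"
assert solr_doc["category"] == ["drama", "horror"]
```

A simple text search on `media_format` now needs no path expression at all, which is the point of denormalizing up front.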
Re: Using Boost fields for a sum total score.
You might look at edismax on the 3.1 and trunk, it calculates scores a bit differently. You could always just form the query yourself in the app and not use dismax I think. Best Erick On Wed, May 18, 2011 at 6:06 PM, ronveenstra ron-s...@agathongroup.com wrote: I have a sizable index with a main content field, and 5 defined boost fields (boost_low, boost_med, boost_high, boost_max, and boost_neg). The idea and hope was to allow searches on the content field to be influenced/boosted by the boosting fields if the search term was present. I had set up a dismax query with a qf' setting that boosted the content field significantly, and the 5 boost fields with descending values. (e.g. content^5.0 boost_max^1.2 boost_high^1.0 etc...) After some testing and reading, I'm of the understanding that this setup will result search the fields (content and boost fields), and apply the boost to each, then choose the field with the highest score as the score for that result (essentially taking the MAX() score from the various fields, and not the SUM() of the fields' scores.) If this is the case, is there an alternate setup, config item, or means of combining these scores to return a SUM() score instead? Any direction or help would be most appreciated. Ron -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2958968.html Sent from the Solr - User mailing list archive at Nabble.com.
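One relevant knob here is dismax's tie parameter, which blends the non-maximum field scores back into the combined score; a toy model (invented numbers, real scores come from Lucene) of how tie moves the combination from a pure max toward a sum:

```python
# DisMax-style combination of per-field scores: the best field wins, and
# the 'tie' parameter blends in the remaining fields. tie=0 is a pure
# max; tie=1 behaves like a sum. (Toy numbers, not real Lucene scores.)
def dismax_score(field_scores, tie=0.0):
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

scores = [4.0, 1.5, 0.5]   # e.g. content, boost_max, boost_high matches

assert dismax_score(scores, tie=0.0) == 4.0          # pure MAX()
assert dismax_score(scores, tie=1.0) == 6.0          # behaves like SUM()
assert dismax_score(scores, tie=0.5) == 5.0          # somewhere in between
```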
Fuzzy search and solr 4.0
Hi, I want to do a fuzzy search that compares a phrase to a field in solr. For example: abc company ltda will be compared to abc comp, abc corporation, def company ltda (nothing to match here). The thing is that it always has to return documents sorted by score. I've found some good algorithms to do that, like StrikeAMatch[1] and JaroWinkler. Using JaroWinkler with strdist() I can do exactly that. But I'd rather use StrikeAMatch, which had a patch in the Lucene JIRA that was never committed. So, I contacted the author of that patch and he told me that I should use solr 4.0, which now has some pretty good new fuzzy search enhancements that make StrikeAMatch seem like toys for kids. Anyone know how I can achieve that using solr 4.0? [1] http://www.catalysoft.com/articles/StrikeAMatch.html
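For reference, the StrikeAMatch measure from the linked article is simple to sketch: break each word into adjacent letter pairs and take the Dice coefficient of the two pair lists. An illustrative Python version (my reading of the article, not the JIRA patch):

```python
# Letter-pair (bigram) similarity as described in the StrikeAMatch article:
# score = 2 * |shared pairs| / (|pairs(a)| + |pairs(b)|), in [0, 1].
def letter_pairs(word):
    return [word[i:i + 2] for i in range(len(word) - 1)]

def word_pairs(phrase):
    pairs = []
    for word in phrase.upper().split():
        pairs.extend(letter_pairs(word))
    return pairs

def strike_a_match(a, b):
    pairs_a, pairs_b = word_pairs(a), word_pairs(b)
    union = len(pairs_a) + len(pairs_b)
    hits, remaining = 0, list(pairs_b)
    for p in pairs_a:
        if p in remaining:       # count each shared pair only once
            remaining.remove(p)
            hits += 1
    return 2.0 * hits / union if union else 0.0

assert strike_a_match("abc", "abc") == 1.0
assert strike_a_match("abc company ltda", "abc comp") == 0.625
```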
RE: Solr Range Facets
: Thanks for explaining the point system, please find below the complete

Sorry .. that part was meant to be a joke, I think I was really tired when I wrote that. The key takeaway: details matter.

: <int name="2011-05-02T05:30:00Z">4</int>
: <int name="2011-05-03T05:30:00Z">63</int>
: <int name="2011-05-04T05:30:00Z">0</int>
: <int name="2011-05-05T05:30:00Z">0</int>
...
: Now if you notice that the response show 4 records for the 2nd of May 2011
: which will fall in the IST timezone (+330MINUTES), but when I try to get the

right.

: results I see that there is only 1 result for the 5th why is this happening.

Why do you say that? According to those facet results, there are 0 docs between 2011-05-05T05:30:00Z and 2011-05-05T05:30:00Z+1DAY (which is what I assume you mean by the 5th ... ie: May 5th, in that timezone offset). Not only that, but the query you posted isn't attempting to filter on the 5th by any possible definition of the concept...

: <str name="fq">createdOnGMTDate:[2011-05-01T00:00:00Z+330MINUTES TO *]</str>

...that's saying you want all docs with a date on or after the 1st.

: If I don't apply the offset the results match with the facet count, is there
: something wrong in my query?

It looks like your query is just plain wrong. If your goal was to drill down and show only documents from the 5th, it should have been something like...

fq = createdOnGMTDate:[2011-05-05T00:00:00Z+330MINUTES TO 2011-05-05T00:00:00Z+330MINUTES+1DAY]

...but note also that there is the question of edge inclusion and when you want to use [A TO B] vs [A TO B}. The facet.range.include option is how you control whether the edges are used in the facet counts... http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.include -Hoss
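The offset arithmetic in the corrected filter can be checked with plain date math; a small illustrative sketch mirroring the +330MINUTES bucket boundaries from the facet response above:

```python
from datetime import datetime, timedelta

# Mirror Solr's "2011-05-05T00:00:00Z+330MINUTES" date math: shift UTC
# midnight by the +05:30 offset, then take a one-day bucket.
offset = timedelta(minutes=330)
start = datetime(2011, 5, 5) + offset
end = start + timedelta(days=1)

assert start.isoformat() == "2011-05-05T05:30:00"
assert end.isoformat() == "2011-05-06T05:30:00"

# A doc stamped 03:00Z on May 5 falls outside this bucket (it belongs
# to the previous local day in that offset).
doc = datetime(2011, 5, 5, 3, 0)
assert not (start <= doc < end)
```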
Re: Field collapsing on multiple fields and/or ranges?
Ah, my mistake. Thanks a lot, this would be a really cool feature :) For now I'm resorting to making more than one query and cross-referencing the two separate queries. -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2959439.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: lucene parser, negative OR operands
: Thanks Yonik. I recall hearing about this before, but was vague on the : details, thanks for supplying some and refreshing my memory. Matching in Lucene is additive ... queries must match *something*; a clause of a boolean query can be the negation of a query, but that only defines how documents should be removed from the set matched by the other queries in that boolean. To put it another way: imagine modeling the list of documents matching a query as a bitset. You can set bits to true, and you can set bits to false, but the bitset starts out with all bits as false, so if all you do is set bits to false, your bitset will *end* with all bits as false. : If I want to understand more about how the lucene query parser does its : thing, can anyone suggest the source files I should be looking at? The QueryParser.jj is the grammar for parsing, but the crux is to understand that the BooleanQuery class supports three types of clauses: PROHIBITED, MANDATORY, and OPTIONAL. The QueryParser implements those as -, + and the default behavior when neither +/- is present. The QueryParser also jumps through some hoops to support AND, OR, NOT but not all permutations of those are viable. : If I really do want actual boolean logic behavior, what are my options? I : guess one is trying to write my own query parser. Boolean logic generally is defined in some form relative to the universe .. so a pure negative query like -red really means all things IN THE UNIVERSE that are not 'red' ... you can express that using *:* -red What solr does (and how this thread started) is pointing out that for top level queries (like q=-red or fq=-red) solr adds the *:* to the boolean query for you. : Hmm, for that particular query, what about using parens to force a sub-query? : : (-one) OR (-two) : : Ha, nope, that runs into a different problem (or is it the same problem?), and : always returns 0 hits. It looks like the lucene query parser can't handle a : pure-negative sub-query like that separated by OR?
Not sure why, can anyone : explain that one? The query parser can handle it, and it produces a valid query object, but that query object doesn't match anything. -one matches nothing, -two matches nothing ... nothing union nothing is still nothing. : For that particular pattern, this crazy refactoring of the query does work and : get the actual boolean logic result of (not 'one') OR (not 'two'): : : (*:* AND -one) OR (*:* AND -two) Correct -- that is you formally saying: give me all docs IN THE UNIVERSE that are not 'one', and union that with all docs IN THE UNIVERSE that are not 'two'. : behavior for that pattern, but in general, I'm kind of wanting a parser that : will give actual boolean logic behavior. Maybe someday I can find time to : write it in Java (not the quickest thing for me, not familiar with the code at : all). You could implement a parser like that relatively easily -- just make sure you put a MatchAllDocsQuery in every BooleanQuery object that you construct, and only ever use the PROHIBITED and MANDATORY clause types (never OPTIONAL) ... the thing is, a parser like that isn't as useful as you think it might be when dealing with search results. OPTIONAL clauses are where most of the useful factors of scoring documents come into play. -Hoss
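The bitset explanation above can be replayed with ordinary sets; a toy sketch (made-up doc ids) showing why pure-negative clauses match nothing and why seeding each branch with *:* fixes it:

```python
# Bitset model of boolean matching: the result starts empty, so a purely
# negative clause can only remove documents that were never added.
universe = {0, 1, 2, 3, 4}   # *:* — all doc ids (made up)
one = {0, 1}                 # docs matching 'one'
two = {1, 2}                 # docs matching 'two'

# "-one": subtracting from the empty starting set yields nothing.
assert set() - one == set()

# "(-one) OR (-two)": the union of two empty sets is still empty.
assert (set() - one) | (set() - two) == set()

# "(*:* AND -one) OR (*:* AND -two)": seed each branch with the universe.
result = (universe - one) | (universe - two)
assert result == {0, 2, 3, 4}   # everything except docs in both sets
```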
Too slow indexing while using 2 different data sources
Is it normal to observe slow speed while using an URL datasource and also a DB? It was something around 30 seconds with only the DB source, but when I add the URL datasource too, it takes 24-25 mins to index exactly the same amount of docs. Is there any way of overcoming this, or do I have to suffer? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Too-slow-indexing-while-using-2-different-data-sources-tp2959551p2959551.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [X ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [x ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! 2011/5/19 Chris Hostetter hossman_luc...@fucit.org : An alternative to manually repackage solr.war as in #1, is Hoss' : suggestion in SOLR-2487 of a new ANT option to build Solr artifacts : without the JUL binding. More specificly, i'm advocating a new ANT property that would let you specify (by path) whatever SLF4J binding jar you want to include, or that you don't want any SLF4J binding jar included (by specifying a path to a jar that doesn't exist) I want the default... ant dist I don't want a binding in solr.war... ant -Dslf4j.jar.path=BOGUS_FILE_PATH dist I want a specific binding in solr.war... ant -Dslf4j.jar.path=/my/lib/slf4j-jcl-*.jar dist -Hoss
Re: Too slow indexing while using 2 different data sources
Some details? Well, I think it's clear, but still, here is the part of my solrconfig:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dbconfig.xml</str>
    <lst name="datasource">
      <str name="name">database</str>
      <str name="type">JdbcDataSource</str>
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://abcd/efgh</str>
      <str name="user">some</str>
      <str name="password">some</str>
    </lst>
    <lst name="datasource">
      <str name="name">url_data</str>
      <str name="type">URLDataSource</str>
      <str name="processor">XPathEntityProcessor</str>
    </lst>
  </lst>
</requestHandler>

and my dbconfig:

/* Fields from DB */ /* Fields from DB */ /* Fields from DB */ /* Fields from DB */ ... ... ..

<entity name="universal" dataSource="url_data"
        url="http://..com/fddgtr.php/${sa.somevalue}"
        processor="XPathEntityProcessor" forEach="/some/somefield">
  <field column="info" xpath="/some/somefield/info" />
</entity>

- Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Too-slow-indexing-while-using-2-different-data-sources-tp2959551p2959626.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too slow indexing while using 2 different data sources
On Thu, May 19, 2011 at 6:59 AM, deniz denizdurmu...@gmail.com wrote: Is it normal to observe slow speed while using an URL datasource and also a DB? it was something around 30 seconds with only DB source, but when I add URL datasource too, then it takes 24 - 25 mins to index exactly the same amount of docs [...] What is the time for indexing just the URL data source? Is it possible that your URL data source is slow in serving data? Regards, Gora
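A quick way to answer Gora's question about whether the URL data source itself is slow is to time a single fetch outside of Solr. A minimal sketch in plain Java (not DIH code; the endpoint is passed on the command line, since the poster's real URL is not known):

```java
import java.io.InputStream;
import java.net.URL;

// Times one fetch; pass the URL-datasource endpoint as args[0] to see how
// long a single HTTP round trip takes, independent of Solr/DIH.
public class FetchTimer {
    // Runs 'fetch' once and returns elapsed wall-clock milliseconds.
    static long timeMillis(Runnable fetch) {
        long t0 = System.nanoTime();
        fetch.run();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        Runnable fetch;
        if (args.length > 0) {
            final String url = args[0];
            fetch = () -> {
                try (InputStream in = new URL(url).openStream()) {
                    in.readAllBytes(); // drain the response, as DIH would
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            };
        } else {
            fetch = () -> { }; // no URL given: just time a no-op
        }
        System.out.println("one fetch took " + timeMillis(fetch) + " ms");
    }
}
```

If a single fetch is already slow, the jump from ~30 seconds to ~25 minutes is explained by the per-row HTTP round trips rather than by Solr itself.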
Re: filter cache and negative filter query
: What I don't like is that it systematically uses the positive version.
: Sometimes the negative version will give many fewer results (for example,
: in some cases I filter by documents not having a given field, and there
: are very few of them). I think it would be much better that solr

the positive version of the filter is the only one that can be executed, so it's the one that gets cached today, but the principle you are describing is still sound -- in fact I'm pretty sure there is a note in the code about this exact idea as a possible performance enhancement: if the cardinality of a filter is very large (regardless of whether the query was positive or negative), its negation relative to the set of all docs could be cached in its place to save space...

...but...

...the complication comes later when doing lookups -- for cache lookups to work with an arbitrary query, you would either need to change the cache structure from Query=>DocSet to a mapping of Query=>[DocSet,inversionBit] and store the same cache value under two keys -- both the positive and the negative; or you keep the current cache structure, store whichever Query=>DocSet pair has the smallest cardinality, but then every logical cache lookup requires a second actual cache lookup under the covers (for the negation of the query) if the first one doesn't match anything.

it would require some benchmarking and hard decisions about whether the (hypothetical) memory savings are worth the (hypothetical) CPU cost.

: query that in fact returns the negative results. As a simple example,
: I believe that, for a boolean field, -field:true is exactly the same as
: +field:false, but the former is a negative query and the latter is a

that's not strictly true in all cases...

* if the field is multiValued=true, a doc may contain both false and true in the field, in which case it would match +field:false but it would not match -field:true

* if the field is multiValued=false and required=false, a doc may not contain any value at all, in which case it would match -field:true but it would not match +field:false

-Hoss
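The Query=>[DocSet,inversionBit] idea from the message above can be sketched in plain Java, with BitSet standing in for Lucene's DocSet; this is an illustration of the technique, not actual Solr code:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustration: cache whichever of a filter's doc set or its complement has
// the smaller cardinality, plus an inversion bit to undo it on lookup.
public class InvertibleFilterCache {
    private static final class Entry {
        final BitSet bits;      // stored set (possibly the complement)
        final boolean inverted; // true if 'bits' is the complement
        Entry(BitSet bits, boolean inverted) { this.bits = bits; this.inverted = inverted; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final int maxDoc; // total number of docs in the index

    InvertibleFilterCache(int maxDoc) { this.maxDoc = maxDoc; }

    void put(String filterQuery, BitSet matches) {
        if (matches.cardinality() * 2 > maxDoc) {
            // the complement is smaller: store it and remember that we inverted
            BitSet complement = (BitSet) matches.clone();
            complement.flip(0, maxDoc);
            cache.put(filterQuery, new Entry(complement, true));
        } else {
            cache.put(filterQuery, new Entry((BitSet) matches.clone(), false));
        }
    }

    BitSet get(String filterQuery) {
        Entry e = cache.get(filterQuery);
        if (e == null) return null;
        BitSet result = (BitSet) e.bits.clone();
        if (e.inverted) result.flip(0, maxDoc); // undo the inversion
        return result;
    }
}
```

The trade-off shows up directly in put/get: memory saved by storing the smaller set is paid for with an extra flip on every lookup of an inverted entry. The second variant described above (keeping Query=>DocSet and probing the negated query on a miss) trades this per-entry bit for a possible second map lookup instead.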
Re: K-Stemmer for Solr 3.1
I see KStem being mentioned lately. It's been 5+ years since I looked at the original KStem stuff, but I recall there being a license issue with the *original* KStem. I think it was under some flavour of GPL, and that was the reason why we didn't include it in Lucene/Solr back then. I say this now because I saw people saying KStem was released under a BSD license, which doesn't match what I saw 5+ years ago.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message 
From: Smiley, David W. dsmi...@mitre.org
To: solr-user@lucene.apache.org
Sent: Mon, May 16, 2011 5:33:00 PM
Subject: Re: K-Stemmer for Solr 3.1

Lucid's KStemmer is LGPL, and the Solr committers have shown that they don't want LGPL libraries shipping with Solr. If you are intent on releasing your changes, I suggest attaching both the modified source and the compiled jar onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed.

~ David Smiley

On May 16, 2011, at 2:24 AM, Bernd Fehling wrote:

I don't know if it is allowed to modify Lucid code and add it to jira. If someone from Lucid would give me the permission and the Solr developers have nothing against it, I wouldn't mind adding the Lucid KStemmer to jira for Solr 3.x and 4.x. There are several Lucid KStemmer users, which I can see from the many requests I got. Also, the Lucid KStemmer is faster than the standard KStemmer.

Bernd

On 16.05.2011 06:33, Bill Bell wrote:

Did you upload the code to Jira?

On 5/13/11 12:28 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

I backported a Lucid KStemmer version from solr 4.0 which I found somewhere. Just changed from

import org.apache.lucene.analysis.util.CharArraySet;  // solr4.0

to

import org.apache.lucene.analysis.CharArraySet;  // solr3.1

Bernd

On 12.05.2011 16:32, Mark wrote:

java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z

Would you mind explaining your modifications? Thanks

On 5/11/11 11:14 PM, Bernd Fehling wrote:

On 12.05.2011 02:05, Mark wrote:

It appears that the older version of the Lucid Works KStemmer is incompatible with Solr 3.1. Has anyone been able to get this to work? If not, what are you using as an alternative? Thanks

Lucid KStemmer works nicely with Solr 3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have?

Bernd

--
*
Bernd Fehling                    Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)               Universitätsstr. 25
Tel. +49 521 106-4060            Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de   33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*
Re: K-Stemmer for Solr 3.1
Hm, maybe I was wrong. I don't see any mention of *GPL on the KStem download page. I only see http://ciir.cs.umass.edu/downloads/agreements/general.html.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message 
From: Otis Gospodnetic otis_gospodne...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, May 18, 2011 11:35:32 PM
Subject: Re: K-Stemmer for Solr 3.1

[...]
Re: indexing directed graph
Maybe Gora was referring to Siren: http://search-lucene.com/?q=siren+-sami Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: dani.b.angelov dani.b.ange...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, May 17, 2011 2:44:55 AM Subject: Re: indexing directed graph Gora, thank you for your reply! Could you point me a link regarding There was a discussion earlier on this topic -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2951418.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Range Facets
Hi Chris,

I made a mistake in explaining the second part of my question. If you look at the faceted result, you will notice there are 4 results for the 2nd of May 2011, but when I query for the 2nd of May I should get only 1 result, since after applying the offset all the remaining results should be shifted to the 3rd of May. But I think I got the reason for this: I guess the offset is applied only to the edges and not to the actual results. I mean, when we facet with an offset of +330MINUTES, what Solr actually does is just move the facet edges by +330MINUTES, but not each and every document.

Regards,
Rohit

From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Sent: Thu, 19 May, 2011 6:16:53 AM
Subject: RE: Solr Range Facets

: Thanks for explaining the point system, please find below the complete

Sorry .. that part was meant to be a joke, I think I was really tired when I wrote that. The key take-away: details matter.

: <int name="2011-05-02T05:30:00Z">4</int>
: <int name="2011-05-03T05:30:00Z">63</int>
: <int name="2011-05-04T05:30:00Z">0</int>
: <int name="2011-05-05T05:30:00Z">0</int>
...
: Now if you notice that the response shows 4 records for the 2nd of May 2011
: which will fall in the IST timezone (+330MINUTES),

right.

: but when I try to get the results I see that there is only 1 result for the
: 5th; why is this happening?

Why do you say that? According to those facet results, there are 0 docs between 2011-05-05T05:30:00Z and 2011-05-05T05:30:00Z+1DAY (which is what I assume you mean by "the 5th" ... ie: May 5th, in that timezone offset).

Not only that, but the query you posted isn't attempting to filter on the 5th by any possible definition of the concept...

: <str name="fq">createdOnGMTDate:[2011-05-01T00:00:00Z+330MINUTES TO *]</str>

...that's saying you want all docs with a date on or after the 1st.

: If I don't apply the offset the results match with the facet count, is there
: something wrong in my query?

it looks like your query is just plain wrong. if your goal was to drill down and show only documents from the 5th, it should have been something like...

fq = createdOnGMTDate:[2011-05-05T00:00:00Z+330MINUTES TO 2011-05-05T00:00:00Z+330MINUTES+1DAY]

...but note also that there is the question of edge inclusion, and when you want to use [A TO B] vs [A TO B}. The facet.range.include option is how you control whether the edges are used in the facet counts...

http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.include

-Hoss
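To see what those date-math edges actually resolve to, here is a small java.time sketch (plain Java, not Solr's own date-math parser; the field name is taken from the thread):

```java
import java.time.Duration;
import java.time.Instant;

// Sanity check of what the Solr date-math expression
// "2011-05-05T00:00:00Z+330MINUTES" resolves to, plus the +1DAY upper edge.
public class DateMathCheck {
    // Equivalent of Solr's "<instant>+<n>MINUTES" for a literal instant.
    static Instant solrPlusMinutes(String isoInstant, long minutes) {
        return Instant.parse(isoInstant).plus(Duration.ofMinutes(minutes));
    }

    public static void main(String[] args) {
        Instant lower = solrPlusMinutes("2011-05-05T00:00:00Z", 330);
        Instant upper = lower.plus(Duration.ofDays(1));
        // The suggested fq, with both edges fully resolved:
        System.out.println("fq = createdOnGMTDate:[" + lower + " TO " + upper + "]");
    }
}
```

This makes it easy to verify that the drill-down range lines up exactly with the facet bucket edges (2011-05-05T05:30:00Z in the results above) before worrying about [A TO B] vs [A TO B} inclusion.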
Re: indexing directed graph
On Thu, May 19, 2011 at 9:12 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Maybe Gora was referring to Siren: http://search-lucene.com/?q=siren+-sami [...] That does look interesting, but is not what I was referring to. I seem to remember a discussion on this list some 3-4 months ago about someone wanting to make a customised Lucene index, specifically for graphs. I believe that he even wrote up a Wiki (?) page on it. Sorry, Dani, I have been busy, and so far my Google-fu has been unable to turn up the thread, or the Wiki page. Will let you know if I come across it. Regards, Gora
Re: K-Stemmer for Solr 3.1
Hi Otis,

In conclusion: if we check that the license agreement is included in all source files and as a separate license file, then we are clear about KStem itself. What about the modifications from Lucid -- do you know if they publish under GPL?

Bernd

-
BASE - Bielefeld Academic Search Engine
http://www.base-search.net/

On 19.05.2011 05:39, Otis Gospodnetic wrote:

Hm, maybe I was wrong. I don't see any mention of *GPL on the KStem download page. I only see http://ciir.cs.umass.edu/downloads/agreements/general.html.

[...]

--
*
Bernd Fehling                    Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)               Universitätsstr. 25
Tel. +49 521 106-4060            Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de   33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*