Solr not returning all documents?
Hi, as part of our application I have written a reindex task that runs through all documents in a core one by one (using *:*, a start offset, and a row limit of 1) and adds them to a new core (potentially with a new schema). While this approach works well for small sets, it somehow does not seem to work for larger data sets. The reindex task counts its offset into the old core; this count stops at about 118,000 and no more documents are returned. However, numDocs says there are around 582,000 documents in the old core. Am I making a wrong assumption in believing I should get all documents like this? Thanks, Adrian
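For reference, a minimal SolrJ sketch of the paging loop described above (a sketch only: the core URLs are placeholders, the 1.4-era SolrJ API is assumed, and stored fields are copied verbatim, which only works if every field you need is stored):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;
    import org.apache.solr.common.SolrInputDocument;

    public class ReindexTask {
        public static void main(String[] args) throws Exception {
            SolrServer source = new CommonsHttpSolrServer("http://localhost:8983/solr/oldCore");
            SolrServer target = new CommonsHttpSolrServer("http://localhost:8983/solr/newCore");
            int start = 0;
            while (true) {
                SolrQuery q = new SolrQuery("*:*");
                q.setStart(start);   // offset into the old core
                q.setRows(1);        // one document per request, as described above
                SolrDocumentList page = source.query(q).getResults();
                if (page.isEmpty()) break;   // stop when no more documents come back
                for (SolrDocument doc : page) {
                    SolrInputDocument in = new SolrInputDocument();
                    for (String field : doc.getFieldNames()) {
                        in.addField(field, doc.getFieldValue(field));
                    }
                    target.add(in);
                }
                start += page.size();        // advance the offset by what was returned
            }
            target.commit();
        }
    }

One thing worth checking for the early-stop symptom: without an explicit sort the walk order is not guaranteed to stay stable if the index changes between requests, so adding a sort on the unique key field (e.g. &sort=id+asc) makes an offset-based walk deterministic.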
RE: Perfect Match
Awesome, Ahmet. Thanks for the reply. It seems to work now. Thanks a ton. From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Tue 3/23/2010 2:35 PM To: solr-user@lucene.apache.org Subject: RE: Perfect Match Thank you, Ahmet. You were right, artist_s:Dora is bringing results. But I need artist_s:Dora the explorer to bring only those results which contain "Dora the explorer". I tried artist_s:"Dora the explorer" (phrase search); that is working. But artist_s:Dora the explorer is not working. Any way to make artist_s:Dora the explorer return results that contain this in them? I learned this from Chris Hostetter's message[1]. You can use q={!field f=artist_s}Dora the explorer instead of q=artist_s:"Dora the explorer". [1] http://search-lucene.com/m/rrHVV1ZhO4j/this+is+what+the+%22field%22+QParserPlugin+was+invented+for
field QParserPlugin - Help needed
Hello Experts, Could anyone please help me by directing me to some link where I can get more details on Solr's field QParserPlugin? I would be really grateful. Thank you all, Manas
Experiences with SOLR-1797 ?
Hello, does anyone have some experience with this patch from SOLR-1797 (http://issues.apache.org/jira/browse/SOLR-1797)? Best Regards, Daniel Nowak Senior Developer Rocket Internet GmbH | Saarbrücker Straße 20/21 | 10405 Berlin | Deutschland
Re: How to use Payloads with Solr?
On Mar 27, 2010, at 5:31 AM, MitchK wrote: Hello community, since I searched for a solution to get term positions in Solr, I became more aware of the payload features, so I decided to learn more about payloads. The wiki does not say much about them, so I will ask here on the mailing list. It seems like payloads are some extra information for tokens, which I can customize in any way. For example, I could write a payload filter that gives the highest scoring factor to the first token and the lowest to the last one. I could also say "oh, this word is a noun" and add this as payload information: noun. However: how do I use this information at query time? How can I influence the scoring in Solr? I mean, I could write a payload interpreter (am I right to do so with AveragePayloadFunction from Lucene 2.9.1?) for scoring. If I do so, I can switch the scoring of all nouns without reindexing the payloads, by setting their scoring factor in the schema.xml (of course this will need some more modifications). Unfortunately, there is no query-time support for this, other than a custom query parser that is posted in JIRA by Erik Hatcher. Can anybody tell me more about how to use payloads with Solr? For all the others who want to learn some basic information about payloads, I would suggest reading this article from Grant Ingersoll: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ It is a really good tutorial and introduction to this topic. Unfortunately, it seems like he has not written anything about how to integrate this into Solr (I haven't found anything more). Yeah, this is unfortunate. Would be nice to have support for both payloads and spans in Solr. -Grant
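For concreteness, here is a minimal Lucene-level sketch of query-time payload scoring using the 2.9-era classes mentioned above (the field name and term are made up; as Grant notes, Solr itself has no built-in query-time support, so wiring this in requires a custom QParserPlugin like the one Erik Hatcher posted in JIRA):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.payloads.AveragePayloadFunction;
    import org.apache.lucene.search.payloads.PayloadTermQuery;

    // Matches like a TermQuery, but folds the payloads of the matched
    // occurrences into the score. How the raw payload bytes become a float
    // is decided by Similarity.scorePayload(), so this is normally paired
    // with a custom Similarity.
    PayloadTermQuery query = new PayloadTermQuery(
        new Term("body", "introduction"),     // hypothetical field and term
        new AveragePayloadFunction());        // average payload value across matches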
Getting solr response in HTML format : HTMLResponseWriter
Hello everybody, I am using Nutch with Solr, and as you know the result of a Solr search is in XML format. Because I want an HTML format for the response (like the result of a Nutch search), I tried to attach an XSLT stylesheet to the Solr response by passing these two parameters: wt=xslt&tr=example.xsl, where example.xsl is a stylesheet included with Solr; but the response in HTML wasn't very good. So I read on the net that we can write an extension of the QueryResponseWriter class, like XMLResponseWriter (the default), and I am trying to build that. I am proceeding like XMLResponseWriter to create HTMLResponseWriter, and I have added this line in solrconfig.xml:

<queryResponseWriter name="html" class="org.apache.solr.request.HTMLResponseWriter"/>

I get an error like this: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.request.HTMLResponseWriter' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at org.apache.solr.core.SolrCore.initWriters(SolrCore.java:1408) at org.apache.solr.core.SolrCore.init(SolrCore.java:547) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117) at org.mortbay.jetty.Server.doStart(Server.java:210) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.mortbay.start.Main.invokeMain(Main.java:183) at org.mortbay.start.Main.start(Main.java:497) at org.mortbay.start.Main.main(Main.java:115) Caused by: java.lang.ClassNotFoundException: org.apache.solr.request.HTMLResponseWriter at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) ... 33 more

It appears that the class loader can't find the class HTMLResponseWriter. Does anyone know where additional information about the class HTMLResponseWriter must be added to remove this error? Thanks for all.
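For what it's worth, a minimal outline of such a writer is sketched below (this follows the Solr 1.4-era QueryResponseWriter interface; exact package locations differ between Solr versions, so treat it as an outline, not the definitive API). Note that the fully-qualified name in solrconfig.xml has to match the class's actual package, and the usual cause of the ClassNotFoundException above is simply that the compiled class is not on Solr's classpath, e.g. not packaged into a jar under the core's lib/ directory.

    import java.io.IOException;
    import java.io.Writer;

    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.QueryResponseWriter;
    import org.apache.solr.response.SolrQueryResponse;

    public class HTMLResponseWriter implements QueryResponseWriter {

        public void init(NamedList args) {
            // no configuration needed for this sketch
        }

        public String getContentType(SolrQueryRequest request, SolrQueryResponse response) {
            return "text/html;charset=UTF-8";
        }

        public void write(Writer writer, SolrQueryRequest request, SolrQueryResponse response)
                throws IOException {
            writer.write("<html><body>");
            // walk response.getValues() and render it as HTML here
            writer.write("</body></html>");
        }
    }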
Re: Getting solr response in HTML format : HTMLResponseWriter
2010/3/29 Arnaud Garcia arnaud1...@gmail.com: [original message and stack trace quoted in full; snipped]
Delete id from a specific core
Hey All, From the docs, deleting from an index is pretty simple: java -Ddata=args -Dcommit=no -jar post.jar "<delete><id>SP2514N</id></delete>" How about from a specific core? Say I wanted to delete id=12344 from core 1. Hope this makes sense and is easy to answer! Regards, Lee
RE: One item, multiple fields, and range queries
Sorry, I intended to design my post so that one wouldn't have to read the thread for context, but it seems I failed to do that. Don't bother reading the thread. The use case I'm pondering modifying Lucene/Solr to solve is the one-to-many problem. Imagine a document that contains multiple addresses, where each part of an address (like street, state, zip code) goes in a different multi-valued field. The main difficulty is considering how Lucene might be modified to have query results across different fields be intersected by a matching term position offset (which is designed in these fields to refer to a known value offset). Following the link you gave is interesting, though the general case I'm talking about doesn't have a hierarchy. And I find the use of a single multi-valued field unpalatable for a variety of reasons. ~ David Smiley - Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
Re: Delete id from a specific core
Lee - Use the url parameter. ~/dev/solr/example/exampledocs: java -jar post.jar -help SimplePostTool: version 1.2 This is a simple command line tool for POSTing raw XML to a Solr port. XML data can be read from files specified as commandline args; as raw commandline arg strings; or via STDIN. Examples: java -Ddata=files -jar post.jar *.xml java -Ddata=args -jar post.jar '<delete><id>42</id></delete>' java -Ddata=stdin -jar post.jar < hd.xml Other options controlled by System Properties include the Solr URL to POST to, and whether a commit should be executed. These are the defaults for all System Properties: -Ddata=files -Durl=http://localhost:8983/solr/update -Dcommit=yes Core 1's update URL is likely something like http://localhost:8983/solr/1/update Erik On Mar 29, 2010, at 9:08 AM, Lee Smith wrote: [quoted message snipped]
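Putting the two pieces together, deleting id 12344 from core 1 would presumably look like this (the core's update URL is an assumption, as Erik notes):

    java -Ddata=args -Durl=http://localhost:8983/solr/1/update -jar post.jar "<delete><id>12344</id></delete>"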
Re: One item, multiple fields, and range queries
On 29.03.2010, at 15:11, David Smiley (@MITRE.org) wrote: [quoted message snipped] I posted another use case the other day as well... then again, I hope the spatial support in 1.5 will make this use case obsolete soon. Basically we have an app where we have offers that can be available in multiple stores. In order to have a speedy, compact index, the idea was to simply store the geo location of the stores along with the offers in a multi-valued field. However, in order to filter on the x-y geo coordinates we would have to filter on the pairs. This is, I guess, similar to your example above with multiple addresses. Here is the link to my post: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201003.mbox/%3cfb3f49c8-31d9-48fc-b416-73a1bbd3f...@pooteeweet.org%3e BTW: I was mailed off-list asking whether I had found an answer to the above question, so it's not some crazy use case. Regards, Lukas Kahwe Smith m...@pooteeweet.org
More like this - setting a minimum number of terms used to build queries
Hey, Is there a way to make the More Like This feature build its queries from a minimum number of interesting terms? It looks like this component fires queries with only one term in them, and I get a lot of results that aren't similar at all to the parsed document fields. My parameters: mlt.fl=question&mlt.mintf=1&mlt.mindf=&mlt.minwl=4 The question field contains between 15 and 50 terms. Xavier S.
Re: RejectedExecutionException when searching with DirectSolrConnection
A followup: I discovered something interesting. If I don't run Jetty in the same JVM as DirectSolrConnection, all is well. Nrr.
solr-trunk in production?
Hi, I need the patch from SOLR-236 (field collapsing, https://issues.apache.org/jira/browse/SOLR-236) in a production system which is currently running on Solr 1.4. Can I switch to the trunk version (and apply the patch) without problems, or is this not recommended? Matthias
Re: ReplicationHandler reports incorrect replication failures
Thanks. I created https://issues.apache.org/jira/browse/SOLR-1853 2010/3/27 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: please create a bug
Drill down a solr result set by facets
Hi, I'm trying to perform a search based on keywords and then reduce the result set based on facets that the user selects. The first query for a search would look like this: http://localhost:8983/solr/select/?q=cancer+stem&version=2.2&wt=php&start=&rows=10&indent=on&qt=dismax&facet=on&facet.mincount=1&facet.field=fDepartmentName&facet.field=fInvestigatorName&facet.field=fSponsor&facet.date=DateAwarded&facet.date.start=2009-01-01T00:00:00Z&facet.date.end=2010-01-01T00:00:00Z&facet.date.gap=%2B1MONTH In the above query (as per dismax in the Solr config file) it searches multiple fields such as GrantTitle, DepartmentName, InvestigatorName, etc. Then if the user selects 'Chemistry' from the facet field 'fDepartmentName' and 'US Cancer/Diabetic Research Institute' from 'fSponsor', I need to reduce the result set above to only records where fDepartmentName is 'Chemistry' and fSponsor is 'US Cancer/Diabetic Research Institute'. The following query is not working: select/?q=cancer+stem+fDepartmentName:Chemistry+fSponsor:US Cancer/Diabetic Research Institute&version=2.2 Fields starting with 'f' are defined in the schema.xml as copy fields: <field name="DepartmentName" type="text" indexed="true" stored="true" multiValued="true"/> <field name="fDepartmentName" type="string" indexed="true" stored="false" multiValued="true"/> <copyField source="DepartmentName" dest="fDepartmentName"/> Any ideas on the correct syntax? Thanks, Dhanushka.
RE: One item, multiple fields, and range queries
David, The standard one-to-many solution is indexing each address (the many) as its own document, and then either copying the other fields from your current schema to these documents, or indexing using a heterogeneous field schema, grouping the different doc type instances with a unique key (the one) to form a composite doc. (These solutions address your discomfort with a single address field.) Also, while you say that you don't have a hierarchy, I think you do; what you have described could be expressed in XML as:

<doc>
  <field1>...</field1>
  ...
  <addresses>
    <address id="1">
      <street>...</street>
      <city>...</city>
      <state>...</state>
      <zip>...</zip>
    </address>
    <address id="2">
      <street>...</street>
      <city>...</city>
      <state>...</state>
      <zip>...</zip>
    </address>
    ...
  </addresses>
</doc>

I believe you could use the scheme I described on the other thread, using a single address field, if you encoded it like so: _ADDRESS_ _STREET_ 12 Main Street _CITY_ Metripilos _STATE_ MZ _ZIP_ 0 _ADDRESS_ _STREET_ 512 23rd Avenue _CITY_ Carmtwon _STATE_ XB _ZIP_ 1 ... Then to find the docs associated with Carmtwon, XB:

<SpanNot>
  <Include>
    <SpanOr>
      <SpanNear slop="2147483647" inOrder="true">
        <SpanTerm>_CITY_</SpanTerm>
        <SpanTerm>Carmtwon</SpanTerm>
        <SpanTerm>_STATE_</SpanTerm>
        <SpanTerm>XB</SpanTerm>
      </SpanNear>
    </SpanOr>
  </Include>
  <Exclude>
    <SpanTerm>_ADDRESS_</SpanTerm>
  </Exclude>
</SpanNot>

Steve On 03/29/2010 at 9:11 AM, David Smiley (@MITRE.org) wrote: [quoted message snipped]
Re: Drill down a solr result set by facets
Try adding quotes to your query: DepartmentName:Chemistry+fSponsor:\"US Cancer/Diabetic Research Institute\" The parser will split on whitespace. Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com On 3/29/10 8:49 AM, Dhanushka Samarakoon wrote: [quoted message snipped]
Re: Drill down a solr result set by facets
Thanks for the reply. I was just giving the above as an example. Something as simple as the following is also not working: /select/?q=france+fDepartmentName:History&version=2.2 So it looks like the query parameter syntax I'm using is wrong. This is the params array I'm getting from the result:

<lst name="params">
  <str name="rows">10</str>
  <str name="start">0</str>
  <str name="indent">on</str>
  <str name="q">kansas fDepartmentName:History</str>
  <str name="qt">dismax</str>
  <str name="version">2.2</str>
</lst>

On Mon, Mar 29, 2010 at 10:59 AM, Tommy Chheng tommy.chh...@gmail.com wrote: [quoted message snipped]
Re: Drill down a solr result set by facets
Hi Dhanushka, Have you tried using the filter query parameter? Check out this article; the "Applying Constraints" section should be helpful to you: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr Solr wiki link to the filter query parameter: http://wiki.apache.org/solr/CommonQueryParameters#fq I am at the moment implementing a similar system where the user needs to drill down into the data. What I am doing now is: if the user selects "Chemistry" from the facet, I send a query with a filter query applied to fDepartmentName, and when the user then selects "US Cancer/Diabetic Research Institute" from the fSponsor facet, I apply filter queries to both fDepartmentName and fSponsor. Hope this helps. Regards, Indika On 29 March 2010 21:19, Dhanushka Samarakoon dhan...@gmail.com wrote: [quoted message snipped]
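For example, the drilled-down request Indika describes could look like this (a sketch reusing the fields and values from this thread; the quotes and spaces would be URL-encoded in a real request):

    /select/?q=cancer+stem&qt=dismax&facet=on&facet.mincount=1
        &facet.field=fDepartmentName&facet.field=fSponsor
        &fq=fDepartmentName:"Chemistry"
        &fq=fSponsor:"US Cancer/Diabetic Research Institute"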
Re: Drill down a solr result set by facets
Thanks Indika, that looks good. I'll look at the article. If anyone else has any good ideas please send them too. On Mon, Mar 29, 2010 at 11:09 AM, Indika Tantrigoda indik...@gmail.com wrote: [quoted message snipped]
Re: Filter query with special character using SolrJ client
: Since the names of the string fields are not predefined I might have to
: find a method to do this automatically.

if the fields are strings, and you are only looking for exact matches (ie: you don't need any special query parser syntax) use the field QParser:

: solrQuery.addFilterQuery("yourStringField:Cameras\\ Photos")

solrQuery.addFilterQuery("{!field f=yourStringField}Cameras Photos")

-Hoss
Absolutely empty resultset regardless of what I am searching for
Hello guys, my analysis.jsp shows me the right results. That means everything seems to be parsed the right way and there are some matches. However, when I try this live, there are never any matched documents. When I look up whether there is anything in my index, I get the expected result: everything is indexed. What am I doing wrong here? An example looks like: select/?indent=on&debugQuery=on&q=introduction&start=0&rows=10 The result looks like:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="debugQuery">on</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">introduction</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">introduction</str>
    <str name="querystring">introduction</str>
    <str name="parsedquery">title:introduction</str>
    <str name="parsedquery_toString">title:introduction</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      [prepare and process timings for all components, all 0.0, omitted]
    </lst>
  </lst>
</response>

Thank you!
Re: Filter query with special character using SolrJ client
It works, thanks. Just implemented the code... :):):) Could you explain what {!field f=yourStringField}Cameras Photos does? Regards, Indika On 29 March 2010 21:55, Chris Hostetter hossman_luc...@fucit.org wrote: [quoted message snipped]
Re: Filter query with special character using SolrJ client
: It works, thanks. Just implemented the code...:):):)
:
: Could you explain what {!field f=yourStringField}Cameras Photos does.

{!field} says that the string should be parsed using the FieldQParser. The FieldQParser takes an 'f' local param telling it what field you want to use, and the rest of the string is the exact value you want passed to the analyzer for that field 'f' ... it's a query parser that supports no markup of any kind, and only produces basic PhraseQueries or TermQueries (there's also the "raw" QParser for when you absolutely know you only want TermQueries) ... http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers http://lucene.apache.org/solr/api/org/apache/solr/search/FieldQParserPlugin.html -Hoss
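To make that concrete with the example from this thread (a sketch; the exact result depends on the field's analyzer): against a string field, {!field f=yourStringField}Cameras Photos becomes a single TermQuery on the exact value "Cameras Photos", whereas against a whitespace-tokenized, lowercased text field the same input would come out of the analyzer as two tokens and produce the PhraseQuery yourStringField:"cameras photos".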
Re: jmap output help
Gentle bounce. On Sun, Mar 28, 2010 at 11:31 AM, Siddhant Goel siddhantg...@gmail.com wrote: Hi everyone, The output of jmap -histo:live 27959 | head -30 is something like the following:

 num   #instances     #bytes  class name
-----------------------------------------
   1:     448441  180299464  [C
   2:       5311  135734480  [I
   3:       3623   68389720  [B
   4:     445669   17826760  java.lang.String
   5:     391739   15669560  org.apache.lucene.index.TermInfo
   6:     417442   13358144  org.apache.lucene.index.Term
   7:      58767    5171496  org.apache.lucene.index.FieldsReader$LazyField
   8:      32902    5049760  constMethodKlass
   9:      32902    3955920  methodKlass
  10:       2843    3512688  constantPoolKlass
  11:       2397    3128048  [Lorg.apache.lucene.index.Term;
  12:         35    3053592  [J
  13:          3    3044288  [Lorg.apache.lucene.index.TermInfo;
  14:      55671    2707536  symbolKlass
  15:      27282    2701352  [Ljava.lang.Object;
  16:       2843    2212384  instanceKlassKlass
  17:       2343    2132224  constantPoolCacheKlass
  18:      26424    1056960  java.util.ArrayList
  19:      16423    1051072  java.util.LinkedHashMap$Entry
  20:       2039    1028944  methodDataKlass
  21:      14336     917504  org.apache.lucene.document.Field
  22:      29587     710088  java.lang.Integer
  23:       3171     583464  java.lang.Class
  24:        813     492880  [Ljava.util.HashMap$Entry;
  25:       8471     474376  org.apache.lucene.search.PhraseQuery
  26:       4184     402848  [[I
  27:       4277     380704  [S

Is it OK to assume that the top 3 entries (character/integer/byte arrays) are referring to the entries inside the Solr cache? Thanks, - Siddhant
Re: Filter query with special character using SolrJ client
Thank you very much for the explanation. Regards, Indika On 29 March 2010 22:28, Chris Hostetter hossman_luc...@fucit.org wrote: : It works, thanks. Just implemented the code...:):):) : : Could you explain what {!field f=yourStringField}Cameras Photos does. {!field} says that the string should be parsed using the FIeldQParser. the FieldQParser takes an 'f' local param telling it what field you want to use, and the rest of the string is the exact value you want to passed to the analyzer for thet field 'f' ... it's a query parser that supports no markup of any kind, and only produces basic PhraseQueries or TermQueries (there's also the raw QParser for when you absolutely know you only want TermQueries) ... http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers http://lucene.apache.org/solr/api/org/apache/solr/search/FieldQParserPlugin.html -Hoss
Re: jmap output help
Take a heap dump and use jhat to find out for sure. Bill On Mon, Mar 29, 2010 at 1:03 PM, Siddhant Goel siddhantg...@gmail.com wrote: [quoted histogram snipped; see above]
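For reference, the two commands involved would look something like this (standard JDK tools; the PID is from Siddhant's example and the heap size flag is illustrative):

    jmap -dump:live,format=b,file=solr-heap.bin 27959
    jhat -J-Xmx2g solr-heap.bin    # then browse the results at http://localhost:7000/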
Re: ReplicationHandler reports incorrect replication failures
Shawn, I was working on something very similar... Let's perhaps also create a JIRA issue for this monitoring? Thanks, Jason On Fri, Mar 26, 2010 at 6:59 AM, Shawn Smith ssmit...@gmail.com wrote: We're using Solr 1.4 Java replication, which seems to be working nicely. While writing production monitors to check that replication is healthy, I think we've run into a bug in the status reporting of the ../solr/replication?command=details command. (I know it's experimental...) Our monitor parses the replication?command=details XML and checks that replication lag is reasonable by diffing the indexVersion of the master and slave indices to make sure it's within a reasonable time range. Our monitor also compares the first elements of the indexReplicatedAtList and replicationFailedAtList lists to see if the last replication attempt failed. This is where we're having a problem with the monitor throwing false errors. It looks like there's a bug that causes successful replications to be considered failures. The bug is triggered immediately after a slave restarts when the slave is already in sync with the master. Each no-op replication attempt after restart is considered a failure until something on the master changes and replication has to actually do work. From the code, it looks like SnapPuller.successfulInstall starts out false on restart. If the slave starts out in sync with the master, then each no-op replication poll leaves successfulInstall set to false, which makes SnapPuller.logReplicationTimeAndConfFiles log the poll as a failure. SnapPuller.successfulInstall stays false until the first time replication actually has to do something, at which point it gets set to true, and then everything is OK. Thanks, Shawn
RE: keyword query tokenizer
Ahh, but that is exactly what I don't want the DisjunctionMaxQuery to do. I do not want the max scoring field per word. Instead, I want it per phrase, which may be a single word or multiple words. -----Original Message----- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Friday, March 26, 2010 10:35 PM To: solr-user@lucene.apache.org Subject: Re: keyword query tokenizer : : I am curious as to why the query parser does any tokenizing? I would think : you would want to control/configure this with your analyzers? : : Does anyone know the answer to this. Is there a performance gain or something? it's not about performance, it's about the query parser syntax. whitespace is markup as far as the query parser is concerned -- just like +, -, etc. whitespace characters are instructions for the query parsers. Essentially: unquoted whitespace is the markup that tells the query parser to create an OR query out of the chunks of input on either side of the space (+ signifies MUST, - signifies PROHIBITED, but there is no markup to signify SHOULD). Also: if the query parser didn't chunk on whitespace, queries like this... aWord aField:anotherWord ...wouldn't work in the standard query parser. You may think "but i'm using dismax, why does dismax need to worry about that?" but the key to remember there is that if dismax didn't split on whitespace prior to analysis, it wouldn't be able to build the DisjunctionMaxQueries that it uses to find the max scoring field per word (which is the whole point of the parser). -Hoss
RE: keyword query tokenizer
: Ahh, but that is exactly what I don't want the DisjunctionMaxQuery to
: do. I do not want the max scoring field per word. Instead, I want it per
: phrase which may be a single word or multiple words.

then you need to quote your entire q param. (or escape all the whitespace and meta characters) [rest of quote snipped] -Hoss
Re: Absolutely empty resultset regardless of what I am searching for
: my analysis.jsp shows me the right results. That means, everything seems to
: be parsed the right way and there are some matches.

analysis.jsp can tell you that *if* a document is indexed with the current config, then what will the tokens look like -- but it doesn't know if there are any documents in your index, or if you changed the config after indexing. what does /select?q=*:* return? how about /admin/luke?fl=title ?

: select/?indent=on&debugQuery=on&q=introduction&start=0&rows=10
...
: <str name="parsedquery">title:introduction</str>
...

i assume title is in fact the field you expect introduction to match on? what does your schema.xml look like?, etc... http://wiki.apache.org/solr/UsingMailingLists -Hoss
RE: keyword query tokenizer
I didn't know the quotes would work. I thought it had to be escaped, and I wasn't too fond of that because you have to unescape in the analysis phase. Using quotes doesn't seem so bad to me. -----Original Message----- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, March 29, 2010 11:16 AM To: solr-user@lucene.apache.org Subject: RE: keyword query tokenizer [quoted message snipped]
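A quick illustration of the difference Hoss describes (hypothetical values, defType=dismax): q=canon power cord is chunked into three words, each getting its own DisjunctionMaxQuery across the qf fields, whereas q="canon power cord" reaches the analyzers as a single phrase, so the best-scoring field is picked for the phrase as a whole, which is the per-phrase behavior the original poster wants.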
Getting /handlers from response and dynamically removing them
This is just something that seems to come up now and then ... * - I'd like to write a last-component which does something specific for a particular declared handler (/handler1 for example), and there is no way to determine which handler it came from at the moment (or can it?) * - It would be nice if there was some way to dynamically update (enable/disable) handlers on the fly, specifically update handlers; I'd imagine something working like the way logging is currently laid out in the admin. Any thoughts on these 2? - Jon
negative boost
Is it possible to give a negative boost in dismax? For instance, field1^3 field2^0 field3^-0.1 Thanks, Jason
Re: Getting /handlers from response and dynamically removing them
You can get the qt parameter, at least, in your search component. What's the use case for controlling a handler's enabled flag on the fly? Erik On Mar 29, 2010, at 3:02 PM, Jon Baer wrote: [quoted message snipped]
Re: field QParserPlugin - Help needed
Manas, The best you'll find is Solr's javadocs and the source code itself. There's a bit on the wiki with pointers: http://wiki.apache.org/solr/SolrPlugins#QParserPlugin Erik On Mar 29, 2010, at 3:25 PM, Nair, Manas wrote: [quoted message snipped]
RE: One item, multiple fields, and range queries
I'm not going to index each address as its own document, because the one side that I have currently has loads of text and there are many addresses. Furthermore, it doesn't really address the general case of my problem statement. I'm not sure what to make of "index using a heterogeneous field schema, grouping the different doc type instances with a unique key (the one) to form a composite doc". I could use the scheme you mention with the spanNear query, but it conflates different fields into one indexed field, which will mess with the scoring and make queries like range queries (if there are dates involved) next to impossible. This solution is really a hack workaround for a limitation in Lucene/Solr. I was hoping to start a conversation toward a truer resolution to this problem rather than these workarounds, which aren't always satisfactory. ~ David Smiley - Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
Re: field QParserPlugin - Help needed
Could anyone please help me by directing me to some link where I can get more details on Solr's field QParserPlugin. Additionally, see Chris Hostetter's explanation: http://search-lucene.com/m/ZKrXi2VX1st
Re: Absolutely empty resultset regardless of what I am searching for
Hoss, thank you for your response. /select?q=*:* returns results as expected. I have found the mistake why introduction didn't match: a wrong copyField. *rolleyes* However, this seems to bring more problems to light: now the first few rows from my database seem to be searchable, but the rest is not searchable. The thing is, I have got two stored (as well as indexed) fields: ID and title. If I search for the ID of a document which I can't find via its title, it produces a match. If I search for the title, it returns nothing. Is there any possibility to see what exactly is indexed? Luke seems to report wrong results... since it says that life is one of the most frequent terms (398 times) in my index, but if I search for life (sounds great, doesn't it?) it returns only ONE match. select/?q=titleProcessed:live&start=0&rows=10&indent=on Here is my schema.xml. Please notice that I have made a modification: titleProcessed means the same as title from my first post. The mistake is NOT that title is now a string type.

<field name="title" type="string" indexed="true" stored="true"/>
<field name="synonymTitle" type="Synonym" indexed="true" stored="false"/>
<field name="titleProcessed" type="text" indexed="true" stored="false"/>

<copyField source="title" dest="titleProcessed"/>
<copyField source="title" dest="titleSynonym"/>
<copyField source="title" dest="titleProcessed"/>

<fieldType name="Synonym" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="Synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

May there be a problem because the fields are already tokenized??? Kind regards - Mitch
Re: negative boost
Jason, don't you want field1^3 field2^1 field3^0.9? As written in Lucene in Action (and probably elsewhere), it's all multiplied, so a negative boost means a boost under 1. paul PS: take the log and you get the negative. On Mar 29, 2010, at 9:08 PM, Jason Chaffee wrote: Is it possible to give a negative boost in dismax? For instance, field1^3 field2^0 field3^-0.1 Thanks, Jason
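A quick worked example of the multiplication Paul describes (illustrative numbers only): if a match on field3 alone would contribute 4.0 to the score, then field3^0.5 contributes 4.0 x 0.5 = 2.0, still positive but demoted relative to fields boosted at 1. That demotion is the practical equivalent of a "negative" boost.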
Re: Absolutely empty resultset regardless of what I am searching for
EDIT: The query shown was not the one I meant... please excuse me, I have been testing a lot and I am a little bit confused :-). The right query is, of course: select/?q=titleProcessed:life&start=0&rows=10&indent=on
RE: One item, multiple fields, and range queries
Hi David, On 03/29/2010 at 3:36 PM, David Smiley (@MITRE.org) wrote: I'm not sure what to make of or index using a heterogeneous field schema, grouping the different doc type instances with a unique key (the one) to form a composite doc Lucene is schema-free - you can mix and match different document types in a single index. You could emulate this in Solr by merging the two document types and leaving blank the parts that are inapplicable to a given instance. E.g.: Address-doc-type: Field: Unique-key Field: Street Field: City ... Everything-else-doc-type: Field: Unique-key Field: Blob-o'-text ... Doc1: Unique-key: 1; Blob-o'-text: blobbedy-blah-blob; ... Doc2: Unique-key: 1; Street: 12 Main St; City: Somewheresville; ... Doc3: Unique-key: 1; Street: 243 13th St; City: Bogdownton; ... I could use the scheme you mention provided with the spanNear query but it conflates different fields into one indexed field which will mess with the scoring and make queries like range queries if there are dates involved next to impossible. I agree, dimensional reduction can be an issue, though I'm sure there are use cases where the attendant scoring distortion would be acceptable, e.g. non-scoring filters. (Stuffing a variable number of addresses into a single document will also mess with the scoring unless you turn off norms, which is of course another form of scoring-messing.) I've seen a couple of different mentions of private SpanRangeQuery implementations on the mailing lists, so range queries likely wouldn't be a problem for long, should it become a general issue. This solution is really a hack workaround to a limitation in Lucene/Solr. I was hoping to start a conversation to a more truer resolution to this problem rather than these workarounds which aren't always satisfactory. Limitation: Solr/Lucene is not a database. Solutions: 1. Hack workaround 2. Rewrite Solr/Lucene to be a database 3. ? (fill in more truer resolution here) Good luck, Steve
Re: Absolutely empty resultset regardless of what I am searching for
Perhaps a silly question, but did you recreate your index after you made your schema changes? Or did you delete a bunch of documents in the meantime? Or do you have a unique key defined in your schema that is replacing documents? The fact that Luke is giving you unexpected results is a red flag that your index isn't in the state you *think* it's in. Best, Erick On Mon, Mar 29, 2010 at 1:13 PM, MitchK mitc...@web.de wrote: [quoted message snipped]
Re: Absolutely empty resultset regardless of what I am searching for
I was using this page: solr/admin/dataimport.jsp?handler=/dataimport to import my data from my database. I have made a few restarts of my Solr server and I have re-imported the data a lot of times. Furthermore, I have tried to delete everything with the help of the post.jar from the tutorial. I have noticed that it deletes only a few thousand documents, instead of emptying the whole index. This was the last thing I've done; now I am reindexing again. I have got a unique id, called ID; it is the primary key of my database table. Perhaps I am misunderstanding your post, but what do you mean by "a unique key that is replacing documents"? Thank you - Mitch
Re: Getting /handlers from response and dynamically removing them
Thanks for the qt tip, I will try that. I'm building a Solr installation as a small standalone and I'd like to disable everything but the /select after an import has been completed. In normal situations just the master would be set up to index and the slaves are read-only, but in this case I need to allow imports on a standalone with a small index and allow updates only when the handler is enabled. Also, it's not possible currently to reload a handler without a restart, correct? - Jon On Mar 29, 2010, at 3:22 PM, Erik Hatcher wrote: [quoted message snipped]
Re: Getting /handlers from response and dynamically removing them
: Also, its not possible currently to reload a handler w/o a restart correct?

There are methods that can be used to dynamically add/remove handlers from SolrCore -- but there are no built-in administrative commands to do so. -Hoss
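For reference, the programmatic route looks roughly like this (a sketch based on the 1.4-era SolrCore API, worth verifying against your version; core and myUpdateHandler stand in for whatever instances your code already holds, and as noted there is no administrative command wrapping this):

    // Registers (or replaces) the handler bound to a name at runtime;
    // the handler previously registered under that name is returned.
    SolrRequestHandler previous = core.registerRequestHandler("/update", myUpdateHandler);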
RE: One item, multiple fields, and range queries
Steven,

The composite doc idea is an interesting avenue to a solution here that I didn't think of. What's missing is code to do the group-by and then do an intersection in order to get boolean AND behavior between the addresses and primary documents, and then filter out the non-primary documents. Perhaps Solr's popular field-collapsing patch would be a starting point. I realize of course that Lucene/Solr isn't a database, but there is plenty of gray area in-between.

Did you read my original message where I suggested perhaps a solution might lie in intersecting different queries based on common multi-value field offsets derived from matching term positions? I have no idea how far off the current codebase is to exposing enough information to make such an approach possible.

~ David Smiley

Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
Re: Absolutely empty resultset regardless of what I am searching for
Luke is responding (now): my top terms of synonyms have a frequency of up to 800,000 and my processed title has a maximum frequency of 7... What the hell??? However, I can't search any of the top synonyms. I am able to search within the first 55 documents of my index. What might be wrong, when analysis.jsp shows the right results, but the real index does not?
dataimporthandler multivalued dynamic fields
Greetings,

I'm trying to use DataImportHandler to load values from a db into multivalued dynamic fields. It appears to work for the first value, but does not add all the values to the field.

Here is the schema definition of the *_custom fields:

  <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <dynamicField name="*_custom" type="text_ws" indexed="true" stored="true" multiValued="true" termVectors="true"/>

Here is my data-config.xml file:

  <entity name="usr" query="select * from user n">
    <field name="id" column="uid"/>
    <field name="givenname" column="givenname"/>
    <field name="lastname" column="lastname"/>
    <field name="nickname" column="nickname"/>
    <field name="zipcode" column="zipcode"/>
    <field name="city" column="city"/>
    <field name="country" column="country"/>
    <field name="site" column="site"/>
    <field name="state" column="state"/>
    <field name="role" column="role"/>
    <field name="companygroup" column="companygroup"/>
    <field name="personalinfo" column="personalinfo"/>
    <field name="email" column="email"/>
    <entity name="customattr" query="select m.dir_attr, m.attr_id from pulse_custom_attribute_metadata m">
      <entity name="customvalue" query="select value from pulse_custom_attribute_values v where v.user_id='${usr.uid}' and v.attr_id=${customattr.attr_id}">
        <field name="${customattr.dir_attr}_custom" column="value"/>
      </entity>
    </entity>
  </entity>

Does anyone know why it's only importing one of the values from the db, as opposed to all of them?

Thanks,
Brad
RE: negative boost
Unfortunately, my results aren't quite what I want unless I use 0 on the second field. Instead, if something matches in all the fields it is elevated to the top. I only want the first field match elevated to the top and I want all first field matches to have the same weight. Next, I want all field2 matches to have the same weight, and finally, I want all field3 matches to have the same weight. But I want field1 matches to be at the top, then field2, and finally field3. I don't care if the term is in all three fields or not. Does this make sense?

-----Original Message-----
From: Paul Libbrecht [mailto:p...@activemath.org] Sent: Monday, March 29, 2010 1:10 PM To: solr-user@lucene.apache.org Subject: Re: negative boost

Jason, don't you want field1^3 * field2^1 * field3^0.9? As written in "Lucene in Action", it's all multiplied. So a negative boost means a boost under 1 in dismax (and probably elsewhere).

paul

PS: take the log and you get this negative.

Le 29-mars-10 à 21:08, Jason Chaffee a écrit : Is it possible to give a negative boost in dismax? For instance, field1^3 field2^0 field3^-0.1 Thanks, Jason
RE: One item, multiple fields, and range queries
Hi David, On 03/29/2010 at 4:54 PM, David Smiley (@MITRE.org) wrote: Did you read my original message where I suggested perhaps a solution might lie in intersecting different queries based on common multi-value field offsets derived from matching term positions? I have no idea how far off the current codebase is to exposing enough information to make such an approach possible. AFAICT, your above-described solution addresses the one-to-many problem by representing multiple records within a single document via parallel arrays, one array per address-part field. The parallel array alignment is effected via alignment of position increments. What's missing from Solr/Lucene is the ability to constrain matches such that the position increment of all matching address-part fields is the same. I suspect that the Flexible Indexing branch would allow a slightly less involved index usage pattern: you could add a new term attribute that explicitly represents the record index. That way you wouldn't have to fiddle around with increment gaps and guess about maximum record size. You still need to perform the equivalent of an SQL table join across the matching address-part fields (in addition to any non-address constraints), using parallel array index equality as the join predicate. I don't know how hard it would be to implement this, but you'd need to: add the ability to express this kind of constraint in the query language; make a new Similarity implementation that could handle it; and, if you go the route of adding a new record index term attribute, add a new postings codec that handles writing/reading it. Steve
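As a concrete illustration of the position-increment trick this thread keeps referring to, here is a sketch against the Lucene 2.9 span API (field name, terms and gap size are made up): give the multivalued address field a large positionIncrementGap, then keep the span slop well under the gap so a match can never straddle two address values.

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.SpanNearQuery;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  // assumes positionIncrementGap=1000 on the multivalued "address" field
  SpanQuery street = new SpanTermQuery(new Term("address", "main"));
  SpanQuery city   = new SpanTermQuery(new Term("address", "somewheresville"));
  // slop of 100 is generous within one address value but far below the
  // 1000-position gap, so both terms must come from the same address
  SpanQuery sameAddress =
      new SpanNearQuery(new SpanQuery[] { street, city }, 100, false);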
Re: Absolutely empty resultset regardless of what I am searching for
I am using TermsComponent now to make sure what is really indexed. Well, one title field has got only a few terms indexed (as I have mentioned earlier: it is only saving up to 55 rows of the RDBMS), while the other fields (which are based on the same filter, but with another special-word.txt) index every term. However, regardless of which field I choose to search on, it makes no difference: every line after the 55th is unsearchable. Any suggestions would be great! If I can't solve the problem, I will try to export the whole data as CSV and try it again, although I don't think that this will help, because the stored fields store the expected values...
how to create this highlighter behaviour
hello *,

I've been using the highlighter and been pretty happy with its results, however there's an edge case I'm not sure how to fix.

For the query "amazing grace", the record matched and highlighted is: <em>amazing</em> rendition of <em>amazing grace</em>

Is there any way to only highlight "amazing grace" without using phrase queries? Can I modify the highlighter components to only use terms once and to favor contiguous sections? I don't want to enforce phrase queries, as sometimes I do want terms highlighted out of order, but I only want each term matched highlighted once. Does this make sense?
RE: negative boost
: Unfortunately, my results aren't quite what I want unless I use 0 on the
: second field. Instead, if something matches in all the fields it is
: elevated to the top. I only want the first field match elevated to the
: top and I want all first field matches to have the same weight. Next, I
: want all field2 matches to have the same weight, and finally, I want all
: field3 matches to have the same weight. But I want field1 matches to be
: at the top, then field 2, and finally field3. I don't care if the term
: is all three fields or not.

try qf=field1^1+field2^100+field3^1&tie=0

: Does this make sense?

it does, but it kind of defeats the point of dismax. What I cited should help -- the key is to make the boosts vastly different scales, and eliminate the tiebreaker value.

-Hoss
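A full request along those lines might look like this (boost values here are illustrative, not from Hoss's message; the point is orders-of-magnitude separation between the fields, plus tie=0 so only the best-scoring field contributes to the score):

  http://localhost:8983/solr/select?defType=dismax&q=term&qf=field1^10000+field2^100+field3^1&tie=0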
RE: negative boost
I understand that it defeats the reason for dismax, at least the original reason for dismax. However, if I can do it this way without having to write my own handler, because I need to search multiple fields and combine the results, then it is still preferable and thus another way to leverage dismax. Thanks for the tip. I will try it.

Jason

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, March 29, 2010 5:06 PM To: solr-user@lucene.apache.org Subject: RE: negative boost

try qf=field1^1+field2^100+field3^1&tie=0 ... the key is to make the boosts vastly different scales, and eliminate the tiebreaker value.

-Hoss
RE: negative boost
I think the key was changing the tie to 0. I had it at 0.1. Getting exactly what I want now. Big thanks for the help.

-----Original Message-----
From: Jason Chaffee [mailto:jchaf...@ebates.com] Sent: Monday, March 29, 2010 5:20 PM To: solr-user@lucene.apache.org Subject: RE: negative boost

I understand that it defeats the reason for dismax, at least the original reason for dismax. ... Thanks for the tip. I will try it.

Jason
Re: Solrj doesn't tell if PDF was actually parsed by Tika
Thanks! You can search for the document after you index it.

On Fri, Mar 26, 2010 at 1:55 AM, Abdelhamid ABID aeh.a...@gmail.com wrote: Well done: https://issues.apache.org/jira/browse/SOLR-1847 Meanwhile, is there any workaround?

On 3/26/10, Lance Norskog goks...@gmail.com wrote: Please file a bug for this on the JIRA. https://issues.apache.org/jira/secure/Dashboard.jspa

On Thu, Mar 25, 2010 at 7:21 AM, Abdelhamid ABID aeh.a...@gmail.com wrote: Hi, When posting pdf files using solrj, the only response we get from Solr is the server response status; we never know whether the pdf was actually parsed or not. Checking the log, I found that Tika wasn't able to succeed with some pdf files because of their content (text in images only) or because they are corrupted:

25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine processOperator INFO: unsupported/disabled operation: EI
25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode GRAVE: Stop reading corrupt stream

The question is: how can I catch these kinds of exceptions through Solrj?

-- Elsadek

-- Lance Norskog goks...@gmail.com

-- Abdelhamid ABID Software Engineer - J2EE / WEB / ESB MULE

-- Lance Norskog goks...@gmail.com
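Short of SOLR-1847 being fixed, Lance's check-after-indexing workaround could look roughly like this in SolrJ 1.4 (a sketch; handler path and field names follow the stock example config, and addFile(File) is the 1.4 signature -- newer versions also take a content type):

  import java.io.File;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

  SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

  // push the PDF through the extracting (Tika) handler
  ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
  req.addFile(new File("some.pdf"));
  req.setParam("literal.id", "doc1");
  server.request(req);
  server.commit();

  // Solr returned 200 either way, so query the doc back: a missing doc,
  // or an empty text field, means Tika got nothing useful out of the file
  long found = server.query(new SolrQuery("id:doc1")).getResults().getNumFound();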
Re: SOLR-1316 How To Implement this autosuggest component ???
SOLR-1316 uses a much faster data structure (a Ternary Search Tree), not a Lucene index. Ngram-based tools like the spellchecker, or your EdgeNGram implementation, are inherently slower. Netflix, for example, uses a dedicated TST server farm (their own implementation of TST) to do auto-complete.

On Fri, Mar 26, 2010 at 3:32 AM, stockii st...@shopgate.com wrote: hey thx. I think the component runs so far, but I don't see what it brings me. My first autocompletion solution was with EdgeNGram... and it's exactly the same result... can anyone please show me the advantages of issue SOLR-1316?!

-- Lance Norskog goks...@gmail.com
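For readers wondering what a ternary search tree even is, here is a toy sketch (illustrative only -- SOLR-1316's actual implementation is more elaborate): each node holds one character plus less-than/equal/greater-than children, so lookups walk one node per input character plus a few comparisons.

  // toy TST: insert only, just enough to show the shape of the structure
  class TSTNode {
      char c;
      boolean wordEnd;        // true if a complete term ends at this node
      TSTNode lo, eq, hi;     // less-than, equal, greater-than children
  }

  class TST {
      private TSTNode root;

      void insert(String s) {
          if (s.length() > 0) root = insert(root, s, 0);
      }

      private TSTNode insert(TSTNode n, String s, int i) {
          char c = s.charAt(i);
          if (n == null) { n = new TSTNode(); n.c = c; }
          if (c < n.c)                 n.lo = insert(n.lo, s, i);
          else if (c > n.c)            n.hi = insert(n.hi, s, i);
          else if (i < s.length() - 1) n.eq = insert(n.eq, s, i + 1);
          else                         n.wordEnd = true;
          return n;
      }
  }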
Re: solr highlighting
No problem: wrapping and unwrapping escaped text can be very confusing.

On Fri, Mar 26, 2010 at 6:31 AM, Niraj Aswani n.asw...@dcs.shef.ac.uk wrote: Hi Lance, apologies... please ignore my previous mail. I'll have a look at the PatternReplaceFilter. Thanks, Niraj

Niraj Aswani wrote: Hi Lance, Yes, that is one solution, but wouldn't it stop people searching for something like "choice" in the first place? I mean, if I encode such characters at index time, one would have to write a query like "&lt;choice". Am I right? Thanks, Niraj

Lance Norskog wrote: To display HTML markup in an HTML page, it has to be in entity-encoded form. So, encode the markup as entities in your input application, and have it indexed and stored in this format. Then, the <b><u> wrappers are inserted as normal. This gives you the HTML text displayable in an HTML page, with all words highlightable. And add gt/lt etc. as stopwords. At this point you have the element names, attribute names and values, and text parts searchable and highlightable. If you only want the HTML syntax parts shown, the PatternReplaceFilter is your friend: with regex patterns you can pull out those values and ignore the text parts. The analysis.jsp page will make it much much easier to debug this. Good luck!

On Thu, Mar 25, 2010 at 8:21 AM, Niraj Aswani n.asw...@dcs.shef.ac.uk wrote: Hi, I am using the following two parameters to highlight the hits:

  hl.simple.pre = URLEncoder.encode("<b><u>")
  hl.simple.post = URLEncoder.encode("</u></b>")

This seems to work. However, there is a bit of trouble when the text itself contains HTML markup. For example, I have indexed a document with the following text in it:

===
something here... <choice minOccurs="1" maxOccurs="unbounded">xyz</choice> something here...
===

When I search for the keyword "choice", it inserts <b><u> just before the word choice and </u></b> immediately after it. The result is something like:

  <<b><u>choice</u></b> minOccurs="1" maxOccurs="unbounded">xyz</<b><u>choice</u></b>>

I would like it to be something like:

  &lt;<b><u>choice</u></b> minOccurs="1" maxOccurs="unbounded"&gt;xyz&lt;/<b><u>choice</u></b>&gt;

Is there any way to do it such that the highlighted content is encoded as HTML but the prefix and suffix are not?

Thanks,
Niraj

-- Lance Norskog goks...@gmail.com
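On the "encode at index time" suggestion, a sketch of the feeding side, assuming Apache commons-lang 2.x is on the classpath (escapeHtml is its 2.x API; the field name is made up):

  import org.apache.commons.lang.StringEscapeUtils;
  import org.apache.solr.common.SolrInputDocument;

  // entity-encode markup before it reaches Solr, so "<choice ...>" is
  // indexed and stored as "&lt;choice ...&gt;" and renders literally
  String raw = "something here... <choice minOccurs=\"1\">xyz</choice>";
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("body", StringEscapeUtils.escapeHtml(raw));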
Re: Complex relational values
If 'item' is the unique document level, then this can be done with:

unique id: your own design
searchable text fields:
  foo_x:
  foo_y:
  bar_x:
  bar_y:

The query becomes: foo_x:[100 TO *] AND foo_y:[500 TO *]

Note that to search the other fields with dismax, and foo* with the standard query parser, you'll need to combine the two with the crazy multi-parser syntax.

On Fri, Mar 26, 2010 at 10:49 AM, Kumaravel Kandasami kumaravel.kandas...@gmail.com wrote: I would represent each item element as a document, and each attribute as a field of the document. If the field names are not known upfront, you could create 'dynamic fields'. Kumar _/|\_ www.saisk.com ku...@saisk.com making a profound difference with knowledge and creativity...

On Fri, Mar 26, 2010 at 12:37 PM, Phil Messenger p...@miniweb.tv wrote: Hi, I need to store structured information in an index entry for use when filtering. As XML, this could be expressed as:

  <item>
    <some_fields_that_are_searched_using_dismax/>
    <data>
      <item type="foo" x="100" y="200"/>
      <item type="bar" x="300" y="1000"/>
    </data>
  </item>

I want to be able to *filter* search results according to the data in the item tags - eg. show all index entries which match the expression: type=foo, x > 100, y > 500. Having a multivalued field for type, x and y doesn't seem to work here, as I need to maintain the relationship between a type/x/y. I'm not sure how to approach this problem. Is writing a custom field type the preferred approach?

thanks,
Phil.

-- Lance Norskog goks...@gmail.com
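A sketch of the schema side of Lance's suggestion (the type name "tint" is the sortable trie-int type from the Solr 1.4 example schema; adjust to taste). Dynamic fields mean a new item type adds no schema work:

  <dynamicField name="*_x" type="tint" indexed="true" stored="true"/>
  <dynamicField name="*_y" type="tint" indexed="true" stored="true"/>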
Apache Lucene EuroCon Call For Participation: Prague, Czech Republic, May 20 & 21, 2010
Apache Lucene EuroCon Call For Participation - Prague, Czech Republic, May 20 & 21, 2010

All submissions must be received by Tuesday, April 13, 2010, 12 Midnight CET / 6 PM US EDT.

The first European conference dedicated to Lucene and Solr is coming to Prague from May 18-21, 2010. Apache Lucene EuroCon is run on a not-for-profit basis, with net proceeds donated back to the Apache Software Foundation. The conference is sponsored by Lucid Imagination with additional support from community and other commercial co-sponsors.

Key Dates:
24 March 2010: Call For Participation Opens
13 April 2010: Call For Participation Closes
16 April 2010: Speaker Acceptance/Rejection Notification
18-19 May 2010: Lucene and Solr Pre-conference Training Sessions
20-21 May 2010: Apache Lucene EuroCon

This conference creates a new opportunity for the Apache Lucene/Solr community and marketplace, providing the chance to gather, learn and collaborate on the latest in Apache Lucene and Solr search technologies and what's happening in the community and ecosystem. There will be two days of Lucene and Solr training offered May 18 & 19, followed by two days packed with leading-edge Lucene and Solr open source search content and talks by search and open source thought leaders.

We are soliciting 45-minute presentations for the conference, 20-21 May 2010 in Prague. The conference and all presentations will be in English. Topics of interest include:

- Lucene and Solr in the Enterprise (case studies, implementation, return on investment, etc.)
- "How We Did It" Development Case Studies
- Spatial/Geo search
- Lucene and Solr in the Cloud
- Scalability and Performance Tuning
- Large Scale Search
- Real Time Search
- Data Integration/Data Management
- Tika, Nutch and Mahout
- Lucene Connectors Framework
- Faceting and Categorization
- Relevance in Practice
- Lucene & Solr for Mobile Applications
- Multi-language Support
- Indexing and Analysis Techniques
- Advanced Topics in Lucene & Solr Development
Re: Including Tika-extracted docs in a document?
Look at the 'rootEntity' attribute in the DataImportHandler, both the description and the examples: http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config It is active for all entities. It means that you can run several operations in the outer entities, then have all of their fields come together in an inner entity. You have to say 'rootEntity=false' inwards until the last entity before your main document. (No, that is not a clear explanation.) This would let you create multi-valued fields, one value from each input document. Otherwise, this is a hard one. On Fri, Mar 26, 2010 at 10:37 PM, Don Werve d...@madwombat.com wrote: Is it possible to perform Tika extraction on multiple files that are indexed as part of a single document? -- Lance Norskog goks...@gmail.com
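A sketch of the pattern Lance describes (entity, table and column names are invented for illustration): with rootEntity="false" on the outer entity, DIH creates one document per row of the inner entity instead.

  <document>
    <!-- not a document source itself, just feeds ${parent.id} inward -->
    <entity name="parent" rootEntity="false" query="select id from parent_table">
      <!-- one Solr document per row returned here -->
      <entity name="child" query="select * from child_table where parent_id='${parent.id}'">
        <field column="id" name="id"/>
        <field column="body" name="body"/>
      </entity>
    </entity>
  </document>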
Re: Solr not returning all documents?
Yes, this should work. It will be very slow. There is a special hack by which you can say sort=_docid_+asc (or +desc). _docid_ is a magic field name that avoids sorting the results. Pulling documents at row # 1 million should be only a little slower than pulling documents at row #0. On Mon, Mar 29, 2010 at 12:37 AM, Adrian Pemsel apem...@gmail.com wrote: Hi, As part of our application I have written a reindex task that runs through all documents in a core one by one (using *:*, a start offset and a row limit of 1) and adds them to a new core (potentially with a new schema). However, while working well for small sets this approach somehow does not seem to work for larger data sets. The Reindex task counts its offset into the old core, this count stops at about 118000 and no more documents are returned. However, numDocs says there are around 582000 documents in the old core. Am I making a wrong assumption in believing I should get all documents like this? Thanks, Adrian -- Lance Norskog goks...@gmail.com
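Concretely, the fetch loop's query could look like this (stock example host/port assumed; the rows value is illustrative -- batching more than one row per request will also help):

  http://localhost:8983/solr/select?q=*:*&sort=_docid_+asc&start=118000&rows=1000

Since _docid_ walks the index in document order, no sort is performed and deep start offsets stay nearly as cheap as start=0.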
Re: Experiences with SOLR-1797 ?
There was only one report of the problem. I just read the patch and original source and it looks right; in concurrent programming these are famous last words :)

2010/3/29 Daniel Nowak daniel.no...@rocket-internet.de: Hello, has anyone some experiences with this patch of SOLR-1797 (http://issues.apache.org/jira/browse/SOLR-1797)? Best Regards, Daniel Nowak

-- Lance Norskog goks...@gmail.com
Re: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic, May 20 & 21, 2010
Grant, Were you going to send out the open-for-registration email as well?

-Mike

----- Original Message -----
From: Grant Ingersoll gsing...@apache.org Sent: Mon, March 29, 2010 6:11:58 PM Subject: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic, May 20 & 21, 2010
Re: SOLR-1316 How To Implement this autosuggest component ???
Reading through this thread and SOLR-1316, there seem to be a lot of different ways to implement auto-complete in Solr. I've seen mentions of:

- EdgeNGrams
- TermsComponent
- Faceting
- TST
- Patricia Tries
- RadixTree
- DAWG

Which algorithm does SOLR-1316 implement? TST is one. There are others mentioned in the comments on SOLR-1316, such as Patricia Tries, RadixTree, and DAWG. Are those implemented too? Among all those methods, is there a recommended one? What are the pros & cons? Thanks.

--- On Mon, 3/29/10, Lance Norskog goks...@gmail.com wrote:

From: Lance Norskog goks...@gmail.com Subject: Re: SOLR-1316 How To Implement this autosuggest component ??? To: solr-user@lucene.apache.org Date: Monday, March 29, 2010, 8:57 PM

SOLR-1316 uses a much faster data structure (a Ternary Search Tree), not a Lucene index. Ngram-based tools like the spellchecker are inherently slower. Netflix, for example, uses a dedicated TST server farm (their own implementation of TST) to do auto-complete.

-- Lance Norskog goks...@gmail.com
Optimize after delta-import (DIH)
According to the wiki (http://wiki.apache.org/solr/DataImportHandler#Commands), the delta-import command "will accept the same clean, commit and optimize parameters that the full-import command takes", but my index keeps saying it's not optimized:

[java] INFO: [items] webapp=/solr path=/dataimport params={optimize=true&clean=true&commit=true&command=delta-import} status=0 QTime=1

Also, can someone explain to me exactly what the clean command does? The wiki states: "Tells whether to clean up the index before the indexing is started", but that's kind of vague. What does it actually do?

Thanks