Re: Guide to using SolrQuery object
You'll find the available parameters in various interfaces in the package org.apache.solr.common.params.* For instance: import org.apache.solr.common.params.FacetParams; import org.apache.solr.common.params.ShardParams; import org.apache.solr.common.params.TermVectorParams; As a side note to what Shalin said, SolrQuery extends ModifiableSolrParams (just so that you are aware of that). Hope that helps a bit. Cheers, Aleks On Tue, 14 Jul 2009 16:27:50 +0200, Reuben Firmin wrote: Also, are there enums or constants around the various param names that can be passed in, or do people tend to define those themselves? Thanks! Reuben -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.com http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
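Since SolrQuery extends ModifiableSolrParams, setting any of these parameters just means setting a named string parameter; the constant classes exist so you don't scatter raw strings around your code. The sketch below inlines the constant values ("facet", "facet.field") so it stays self-contained — in real code you would reference FacetParams.FACET etc. directly:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamConstantsDemo {
    // These string values mirror FacetParams.FACET and FacetParams.FACET_FIELD.
    static final String FACET = "facet";
    static final String FACET_FIELD = "facet.field";

    // Roughly what SolrQuery/ModifiableSolrParams do when the request is sent:
    // every parameter ends up as a name=value pair on the query string.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "solr");
        params.put(FACET, "true");
        params.put(FACET_FIELD, "category");
        System.out.println(toQueryString(params));
        // q=solr&facet=true&facet.field=category
    }
}
```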
Re: Query on date fields
Hello, for this you can simply use the nifty date functions supplied by Solr (given that you have indexed your fields with the Solr date field type). If I understand you correctly, you can achieve what you want with the following query, which intersects two open-ended range queries: displayStartDate:[* TO NOW] AND displayEndDate:[NOW TO *] Cheers, Aleksander On Mon, 08 Jun 2009 09:17:26 +0200, prerna07 wrote: Hi, I have two date attributes in my indexes: DisplayStartDate_dt DisplayEndDate_dt I need to fetch results where today's date lies between displayStartDate and displayEndDate. However, I cannot send a hardcoded displayStartDate and displayEndDate in the query, as there are 1000 different dates in the indexes. Please suggest the query. Thanks, Prerna -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
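The pattern generalizes to any "today falls inside the window" check: one open-ended range per field, joined with AND. A small sketch that just builds the query string (field names taken from the thread):

```java
public class ActiveWindowQuery {
    // Matches documents whose start date is in the past and whose end date
    // is in the future, i.e. NOW falls inside the display window.
    static String activeNow(String startField, String endField) {
        return startField + ":[* TO NOW] AND " + endField + ":[NOW TO *]";
    }

    public static void main(String[] args) {
        System.out.println(activeNow("DisplayStartDate_dt", "DisplayEndDate_dt"));
        // DisplayStartDate_dt:[* TO NOW] AND DisplayEndDate_dt:[NOW TO *]
    }
}
```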
Re: Configure Collection Distribution in Solr 1.3
As some people have mentioned here on this mailing list, the Solr 1.3 distribution scripts (snappuller / snapshooter etc.) do not work on Windows. Some have indicated that it might be possible to use Cygwin, but I have my doubts. So unfortunately, Windows users suffer with regard to replication (although I would recommend everyone to use Unix for running servers ;) ). That being said, you can use Solr 1.4 (one of the nightly builds), where you get built-in replication that is easily configured through the Solr server configuration, and this works on Windows as well! So, if you don't have any real reason not to upgrade, I suggest that you try out Solr 1.4 (which also gives you lots of new features and major improvements!) Cheers, Aleksander On Tue, 09 Jun 2009 21:00:27 +0200, MaheshR wrote: Hi Aleksander, I went through the below links and successfully configured rsync using Cygwin on Windows XP. In the Solr documentation they mention many script files like rsync-enable, snapshooter, etc. These are all Unix-based scripts. Where do I get these script files for Windows? Any help on this would be greatly appreciated. Thanks MaheshR. Aleksander M. Stensby wrote: You'll find everything you need in the Wiki. http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline http://wiki.apache.org/solr/SolrCollectionDistributionScripts If things are still unclear, I've written a guide from when we used the Solr distribution scripts on our Lucene index earlier. You can read that guide here: http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53 Cheers, Aleksander On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR wrote: Hi, we configured a multi-core Solr 1.3 server in a Tomcat 6.0.18 servlet container. It's working great. Now I need to configure collection distribution to replicate index data between a master and 2 slaves.
Please provide step-by-step instructions to configure collection distribution between the master and slaves. Thanks in advance. Thanks Mahesh. -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
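As a sketch of what the built-in 1.4 replication configuration looks like (following the wiki example; the URL, poll interval and conf file names are placeholders you would adapt):

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Because replication is just an HTTP request handler here, no shell scripts are involved, which is why it works on Windows.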
Re: Search Phrase Wildcard?
Well, yes :) Solr does in fact support the entire Lucene query parser syntax (and so shares its limitations) :) - Aleks On Thu, 11 Jun 2009 13:57:23 +0200, Avlesh Singh wrote: In fact, Lucene does not support that. Lucene supports single and multiple character wildcard searches within single terms (*not within phrase queries*). Taken from http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches Cheers Avlesh On Thu, Jun 11, 2009 at 4:32 PM, Aleksander M. Stensby < aleksander.sten...@integrasco.no> wrote: Solr does not support wildcards in phrase queries, yet. Cheers, Aleks On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun wrote: Hi all, I have my document like this: Solr web service Is there any way that I can search like startswith: "So* We*" : found "Sol*": found "We*": not found Cheers, Samnang -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
Re: Search Phrase Wildcard?
Solr does not support wildcards in phrase queries, yet. Cheers, Aleks On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun wrote: Hi all, I have my document like this: Solr web service Is there any ways that I can search like startswith: "So* We*" : found "Sol*": found "We*": not found Cheers, Samnang -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
Re: Sharding strategy
Hi Otis, thanks for your reply! You could say I'm lucky (and I totally agree, since I made the choice of ordering the data that way :p). What you describe is what I've thought about doing, and I'm happy to read that you approve. It is always nice to know that you are not doing things completely off - that's what I love about this mailing list! I've implemented a sharded "yellow pages" that builds up the shard parameter, and it will obviously be easy to search two shards to overcome the beginning-of-the-year situation; I just thought it might be a bit stupid to search for 1% of the data in the "latest" shard and the rest in shard n-1. How much of a performance decrease do you reckon I will get from searching two shards instead of one? Anyway, thanks for confirming things, Otis! Cheers, Aleksander On Wed, 10 Jun 2009 07:51:16 +0200, Otis Gospodnetic wrote: Aleksander, in a sense you are lucky you have time-ordered data. That makes it very easy to shard and cheaper to search - you know exactly which shards you need to query. The beginning of the year situation should also be easy. Start with the latest shard for the current year, and go to the next shard only if you have to (e.g. if you don't get enough results from the first shard). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Aleksander M. Stensby To: "solr-user@lucene.apache.org" Sent: Tuesday, June 9, 2009 7:07:47 AM Subject: Sharding strategy Hi all, I'm trying to figure out how to shard our index, as it is growing rapidly and we want to make our solution scalable. So, we have documents that are most commonly sorted by their date. My initial thought is to shard the index by date, but I wonder if you have any input on this and how best to solve it... I know that the most frequent queries will be executed against the "latest" shard, but then, let's say we shard by year - how do we best solve the situation that will occur in the beginning of a new year?
(Some of the data will be in the last shard, but most of it will be in the second-to-last shard.) Would it be stupid to have a "latest" shard with duplicate data (always consisting of the last 6 months or something like that) and maintain that index in addition to the regular yearly shards? Anyone else facing a similar situation with a good solution? Any input would be greatly appreciated :) Cheers, Aleksander -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
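The "search two shards at the boundary" idea boils down to building the shards parameter from the years a query touches. A sketch of just that step (the host naming scheme is made up):

```java
import java.util.ArrayList;
import java.util.List;

public class YearShards {
    // Given yearly shards named by year, return the shards parameter needed
    // to cover a date window. Early in a new year this yields the current
    // and the previous year's shard -- the two-shard search from the thread.
    static String shardsParam(String hostPattern, int fromYear, int toYear) {
        List<String> shards = new ArrayList<>();
        for (int y = fromYear; y <= toYear; y++) {
            shards.add(String.format(hostPattern, y));
        }
        return String.join(",", shards);
    }

    public static void main(String[] args) {
        // Hypothetical host names, one Solr instance per year.
        System.out.println(shardsParam("solr%d.example.com:8983/solr", 2008, 2009));
        // solr2008.example.com:8983/solr,solr2009.example.com:8983/solr
    }
}
```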
Sharding strategy
Hi all, I'm trying to figure out how to shard our index, as it is growing rapidly and we want to make our solution scalable. So, we have documents that are most commonly sorted by their date. My initial thought is to shard the index by date, but I wonder if you have any input on this and how best to solve it... I know that the most frequent queries will be executed against the "latest" shard, but then, let's say we shard by year - how do we best solve the situation that will occur in the beginning of a new year? (Some of the data will be in the last shard, but most of it will be in the second-to-last shard.) Would it be stupid to have a "latest" shard with duplicate data (always consisting of the last 6 months or something like that) and maintain that index in addition to the regular yearly shards? Anyone else facing a similar situation with a good solution? Any input would be greatly appreciated :) Cheers, Aleksander -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
Re: Multiple queries in one, something similar to a SQL "union"
I don't know if I follow you correctly, but you are saying that you want X results per type? Then you could do something like rows=X with query type:Y for each type, and merge the results. - Aleks On Tue, 09 Jun 2009 12:33:21 +0200, Avlesh Singh wrote: I have an index with two fields - name and type. I need to perform a search on the name field so that *an equal number of results is fetched for each type*. Currently, I am achieving this by firing multiple queries, each with a different type, and then merging the results. In my database-driven version, I used to do a "union" of multiple queries (and not separate SQL queries) to achieve this. Can Solr do something similar? If not, can this be a possible enhancement? Cheers Avlesh -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
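Until Solr grows a built-in union, the approach above is done client-side: fire one query per type with rows=N and concatenate. A sketch of just the merge step, with stub result lists standing in for the per-type responses:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerTypeMerge {
    // Take at most perType hits from each type's result list and concatenate,
    // which is what firing one rows=N query per type and merging amounts to.
    static List<String> merge(Map<String, List<String>> hitsByType, int perType) {
        List<String> merged = new ArrayList<>();
        for (List<String> hits : hitsByType.values()) {
            merged.addAll(hits.subList(0, Math.min(perType, hits.size())));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byType = new LinkedHashMap<>();
        byType.put("book", Arrays.asList("b1", "b2", "b3"));
        byType.put("dvd", Arrays.asList("d1", "d2"));
        System.out.println(merge(byType, 2)); // [b1, b2, d1, d2]
    }
}
```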
Re: Solr Multiple Queries?
Hi there Samnang! Please see inline for comments: On Tue, 09 Jun 2009 08:40:02 +0200, Samnang Chhun wrote: Hi all, I just got started looking at using Solr as my search web service. But I don't know whether Solr has features for these kinds of queries: - Startswith: This is what we call prefix queries and wildcard queries. For instance, if you want something that starts with "man", you can search for man* - Exact Match: Exact matching is done with double quotes: "Solr rocks" - Contain: Hmm, what do you mean by contain? Inside a given word? That might be a bit more tricky. We have an issue open at the moment for supporting leading wildcards, and that might allow you to search for *cogn* and match recognition etc. If that was what you meant, you can look at the ongoing issue http://issues.apache.org/jira/browse/SOLR-218 - Doesn't Contain: NOT or - are keywords to exclude something (Solr supports all the boolean operators that Lucene supports). - In the range: Range queries in Solr are done by using brackets. For instance, price:[500 TO 1000] will return all results with prices ranging from 500 to 1000. There is a lot of information on the Wiki that you should check out: http://wiki.apache.org/solr/ Could anyone guide me how to implement those features in Solr? Cheers, Samnang Cheers, Aleks -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
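The syntax examples above can be collected into a few small helpers; these only build Lucene query strings, nothing Solr-specific:

```java
public class LuceneQuerySyntax {
    // "Starts with": a prefix/wildcard query on a single term.
    static String startsWith(String field, String prefix) {
        return field + ":" + prefix + "*";
    }
    // Exact match: the phrase goes in double quotes.
    static String exact(String field, String phrase) {
        return field + ":\"" + phrase + "\"";
    }
    // "Doesn't contain": exclude a term with NOT.
    static String without(String query, String field, String term) {
        return query + " AND NOT " + field + ":" + term;
    }
    // Range query: brackets are inclusive.
    static String range(String field, String from, String to) {
        return field + ":[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        System.out.println(startsWith("name", "man"));   // name:man*
        System.out.println(exact("name", "Solr rocks")); // name:"Solr rocks"
        System.out.println(without("name:solr", "type", "book"));
        System.out.println(range("price", "500", "1000")); // price:[500 TO 1000]
    }
}
```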
Re: Configure Collection Distribution in Solr 1.3
You'll find everything you need in the Wiki. http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline http://wiki.apache.org/solr/SolrCollectionDistributionScripts If things are still unclear, I've written a guide from when we used the Solr distribution scripts on our Lucene index earlier. You can read that guide here: http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53 Cheers, Aleksander On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR wrote: Hi, we configured a multi-core Solr 1.3 server in a Tomcat 6.0.18 servlet container. It's working great. Now I need to configure collection distribution to replicate index data between a master and 2 slaves. Please provide step-by-step instructions to configure collection distribution between the master and slaves. Thanks in advance. Thanks Mahesh. -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
Re: Terms Component
You can try out a nightly build of Solr (the Solr 1.4 dev version) containing all the nice and shiny new features of Solr 1.4 :) To use the TermsComponent you simply need to configure the handler as explained in the documentation / wiki. Cheers, Aleksander On Mon, 08 Jun 2009 14:22:15 +0200, Anshuman Manur wrote: While on the subject, can anybody tell me when Solr 1.4 might come out? Thanks Anshuman Manur On Mon, Jun 8, 2009 at 5:37 PM, Anshuman Manur wrote: I'm using Solr 1.3 apparently, and Solr 1.4 is not out yet. Sorry, my mistake! On Mon, Jun 8, 2009 at 5:18 PM, Anshuman Manur < anshuman_ma...@stragure.com> wrote: Hello, I want to use the terms component in Solr 1.4: http://localhost:8983/solr/terms?terms.fl=name But I get the following error with the above query: java.lang.NullPointerException at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37) at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104) at org.apache.solr.search.QParser.getQuery(QParser.java:88) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:84) at javax.servlet.http.HttpServlet.service(HttpServlet.java:690) at javax.servlet.http.HttpServlet.service(HttpServlet.java:803) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568) at org.ofbiz.catalina.container.CrossSubdomainSessionValve.invoke(CrossSubdomainSessionValve.java:44) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Any help would be great. Thanks Anshuman Manur -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
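For reference, registering the TermsComponent on a 1.4 nightly looks roughly like this in solrconfig.xml (following the wiki example); without such a handler, a request to /terms falls through to a handler that expects a q parameter, which would explain the NullPointerException above:

```xml
<searchComponent name="terms" class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```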
StreamingUpdateSolrServer recommendations?
Hi all, I guess this question is mainly aimed at you, Ryan. I've been trying out your StreamingUpdateSolrServer implementation for indexing, and clearly see the improvement in indexing times compared to the CommonsHttpSolrServer :) Great work! My question is: do you have any recommendations as to what values I should use / have you found a "sweet spot"? What are the trade-offs? Thread count is obviously tied to the number of CPUs available, but what about the queue size? Any thoughts? I tried 20 / 3 as you posted in the issue thread, and get averages of about 80 documents / sec (and I have not optimized the document processing etc., which takes the larger part of the time). Anyway, I was just curious about what others are using (and what times you are getting). Keep up the good work! Aleks -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
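For anyone landing on this thread later: the knobs being discussed are the two extra constructor arguments. A sketch (requires solr-solrj and its dependencies on the classpath; the URL and values are illustrative, 20 / 3 being the values from the JIRA thread):

```java
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

// queueSize = 20, threadCount = 3. Worker threads drain the queue and
// stream documents to Solr, so tune threadCount against your CPUs and
// queueSize against your document size and memory budget.
StreamingUpdateSolrServer server =
    new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 3);

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
server.add(doc);   // returns quickly; the queue absorbs bursts
server.commit();
```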
Re: Initialising of CommonsHttpSolrServer in Spring framwork
Out of the box, the simplest way to configure CommonsHttpSolrServer through a Spring application context is to define a bean of class org.apache.solr.client.solrj.impl.CommonsHttpSolrServer with the server URL (e.g. http://localhost:8080/solr/core0) as a constructor argument, and inject it into whatever class you have that will use it, like Avlesh shared below. You can also set the connection parameters, like Avlesh did with an HttpClient bean in the context, or directly in the init method of your implementation, and inject it with a property. It is a bit more tricky with the embedded Solr server, since you also need to register cores etc. We solved that by creating a core configuration loader class. - Aleks On Sat, 09 May 2009 03:08:25 +0200, Avlesh Singh wrote: I am giving you a detailed sample of my Spring usage: an org.apache.commons.httpclient.HttpClient bean, plus one CommonsHttpSolrServer bean per core (http://localhost/solr/core1 and http://localhost/solr/core2). Hope this helps. Cheers Avlesh On Sat, May 9, 2009 at 12:39 AM, sachin78 wrote: Ranjeeth, did you figure out how to do this? If yes, can you share how you did it? An example bean definition in XML would be helpful. --Sachin Funtick wrote: > Use the constructor and pass the URL parameter. Nothing Spring-related... > Create a Spring bean with attributes 'MySolr', 'MySolrUrl', and an 'init' method... 'init' will create an instance of CommonsHttpSolrServer. Configure Spring... >> I am using Solr 1.3 and SolrJ as a Java client. I am integrating SolrJ into the Spring framework and I am facing a problem: Spring is not initializing the CommonsHttpSolrServer class. How can I define this class to get an instance of SolrServer to invoke further methods on? -- View this message in context: http://www.nabble.com/Initialising-of-CommonsHttpSolrServer-in-Spring-framwork-tp18808743p23451795.html Sent from the Solr - User mailing list archive at Nabble.com. -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
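A Spring context along the lines described above might look roughly like this (bean ids and the consuming service class are assumptions; CommonsHttpSolrServer has constructors taking just the URL, or the URL plus an HttpClient):

```xml
<bean id="httpClient" class="org.apache.commons.httpclient.HttpClient"/>

<bean id="solrServer" class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
  <constructor-arg value="http://localhost:8080/solr/core0"/>
  <constructor-arg ref="httpClient"/>
</bean>

<!-- Inject the server into the class that uses it. -->
<bean id="searchService" class="com.example.SearchService">
  <property name="solrServer" ref="solrServer"/>
</bean>
```

With one such server bean per core URL, the multi-core setup Avlesh describes falls out naturally.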
Re: How do I accomplish this (semi-)complicated setup?
/removed, what's the best way to keep that in sync? 2. In the event that a repository that is private is made public, how easy would it be to run an "UPDATE", so to speak? Jesper On Mar 25, 2009, at 12:52 PM, Jesper Nøhr wrote: Hi list, I've finally settled on Solr, seeing as it has almost everything I could want out of the box. My setup is a complicated one. It will serve as the search backend on Bitbucket.org, a Mercurial hosting site. We have literally thousands of code repositories, as well as users and other data. All this needs to be indexed. The complication comes in when we have private repositories. Only select users have access to these, but we still need to index them. How would I go about accomplishing this? I can't think of a clean way to do it. Any pointers much appreciated. Jesper - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no Please consider the environment before printing all or any of this e-mail
Re: Solrj: Getting response attributes from QueryResponse
Hello there Mark! With SolrJ, you can simply do the following: server.query(q) returns a QueryResponse; the QueryResponse has the method getResults(), which returns a SolrDocumentList. This is an extended list containing SolrDocuments, but it also exposes methods such as getNumFound(), which is exactly what you are looking for! So you could do something like this: long hits = solrServer.query(q).getResults().getNumFound(); (note that getNumFound() returns a long), and you have similar methods for the other attributes, like results.getMaxScore() and results.getStart(). Hope that helps. Cheers, and merry Christmas! Aleks On Fri, 19 Dec 2008 21:22:48 +0100, Mark Ferguson wrote: Hello, I am trying to get the numFound attribute from a returned QueryResponse object, but for the life of me I can't find where it is stored. When I view a response in XML format, it is stored as an attribute on the response node, e.g. <result numFound="1228" start="0" maxScore="3.633028">. However, I can't find a way to retrieve these attributes (numFound, start and maxScore). When I look at the QueryResponse itself, I can see that the attributes are being stored somewhere, because the toString method returns them. For example, queryResponse.toString() returns: {responseHeader={status=0,QTime=139,params={wt=javabin,hl=true,rows=15,version=2.2,fl=urlmd5,start=0,q=java}},response={*numFound=1228*,start=0,maxScore=3.633028,docs=[SolrDocument[{urlmd5=... The problem is that when I call queryResponse.get("response"), all I get is the list of SolrDocuments; I don't have any other attributes. Am I missing something, or are these attributes just not publicly available? If they're not, shouldn't they be? Thanks a lot, Mark Ferguson -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no Please consider the environment before printing all or any of this e-mail
TermVectorComponent and SolrJ
Hello everyone, I've started to look at TermVectorComponent and I'm experimenting with the use of the component in a sort of "top terms" setting for a given query... I was also looking at MLT and the interestingTerms, but I would like to do a query, get say 10k results, and from those results return a list of the "top 10 terms" or something similar... I haven't really thought too much about it yet, but I was wondering if anyone has done any work on making the term vector response available in a simple manner with SolrJ yet? Or if this is planned? (In the same sense as it is today with facets (response.getFacetFields() etc.).) Not that I can't manage to write it myself, but I would reckon that more people than me would be interested in this. I'd be more than happy to contribute if it is wanted; I just wanted to check whether anyone has started on this already. Cheers, Aleks -- Aleksander M. Stensby Senior software developer Integrasco A/S Please consider the environment before printing all or any of this e-mail
Re: What are the scenarios when a new Searcher is created ?
When adding documents to Solr, the searcher will not be replaced, but once you do a commit, (depending on settings) a new searcher will be opened and warmed up while the old searcher is still open and used for searching. Once the new searcher has finished its warmup procedure, the old searcher will be replaced with the new warmed searcher, which will now allow you to search the newest documents added to the index. - Aleks On Mon, 01 Dec 2008 01:32:05 +0100, souravm <[EMAIL PROTECTED]> wrote: Hi All, Say I have started a new Solr server instance using start.jar. Now, for this Solr server instance, when will a new Searcher be created? I am aware of the following scenarios - 1. When the instance is started, a new Searcher is created for autowarming. But I'm not sure whether this searcher will continue to be alive or will die after the autowarming is over. 2. When I do the first search in this server instance through select, a new searcher is created, and from then on the same searcher is used for all selects to this instance. Even if I run multiple search requests concurrently, I see that the same Searcher is used to service those requests. 3. When I add to the index in this instance through an update statement, a new searcher is created. Please let me know if there are any other situations when a new Searcher is created. Regards, Sourav -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
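The warming behaviour described above is driven from solrconfig.xml; besides the caches' autowarmCount settings, you can fire static warming queries against every new searcher before it goes live. A typical listener (the query values are placeholders):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">your most common query</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```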
Re: Keyword extraction
Hi again Patrick. Glad to hear that we can contribute to helping you guys. That's what this mailing list is for :) First of all, I think you are using the wrong parameter to get your terms. Take a look at http://lucene.apache.org/solr/api/org/apache/solr/common/params/MoreLikeThisParams.html to see the supported params. In your string you use mlt.displayTerms=list, which I believe should be mlt.interestingTerms=list. If that doesn't work: one thing you should know is that, from what I can tell, you are using the StandardRequestHandler in your querying. The StandardRequestHandler supports a simplified handling of more-like-this queries, namely: "This method returns similar documents for each document in the response set." It supports the common mlt parameters, needs mlt=true (as you have done), and supports an mlt.count parameter to specify the number of similar documents returned for each matching doc from your query. If you want to get the "top keywords" etc. (and, in essence, for your mlt.interestingTerms=list parameter to have any effect at all, if I'm not completely wrong), you will need to configure a MoreLikeThisHandler in your solrconfig.xml and then map your query to it. From the sample configuration file: "Incoming queries will be dispatched to the correct handler based on the path or the qt (query type) param. Names starting with a '/' are accessed with a path equal to the registered name. Names without a leading '/' are accessed with: http://host/app/select?qt=name If no qt is defined, the requestHandler that declares default="true" will be used." You can read about the MoreLikeThisHandler here: http://wiki.apache.org/solr/MoreLikeThisHandler Once you have it configured properly, your query would be something like: http://localhost:8983/solr/mlt?q=amsterdam&mlt.fl=text&mlt.interestingTerms=list&mlt=true (don't think you need the mlt=true here, though...) 
or http://localhost:8983/solr/select?qt=mlt&q=amsterdam&mlt.fl=text&mlt.interestingTerms=list&mlt=true (in the last example I use qt=mlt) Hope this helps. Regards, Aleksander On Thu, 27 Nov 2008 11:49:30 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: Hi Aleksander, With all the help of you and the other comments, we're now at a point where a MoreLikeThis list is returned, and shows 10 related records. However on the query executed there are no keywords whatsoever being returned. Is the querystring still wrong or is something else required? The querystring we're currently executing is: http://suempnr3:8080/solr/select/?q=amsterdam&mlt.fl=text&mlt.displayTerms=list&mlt=true Best, Patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 15:07 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction Ah, yes, That is important. In lucene, the MLT will see if the term vector is stored, and if it is not it will still be able to perform the querying, but in a much much much less efficient way.. Lucene will analyze the document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED will be used to limit the number of tokens that will be parsed). (don't want to go into details on this since I haven't really dug through the code:p) But when the field isn't stored either, it is rather difficult to re-analyze the document;) On a general note, if you want to "really" understand how the MLT works, take a look at the wiki or read this thorough blog post: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ Regards, Aleksander On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: Hi Aleksander, This was a typo on my end, the original query included a semicolon instead of an equal sign. But I think it has to do with my field not being stored and not being identified as termVectors="true". I'm recreating the index now, and see if this fixes the problem. 
Best, patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 14:37 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction Hi there! Well, first of all i think you have an error in your query, if I'm not mistaken. You say http://localhost:8080/solr/select/?q=id=18477975... but since you are referring to the field called "id", you must say: http://localhost:8080/solr/select/?q=id:18477975... (use colon instead of the equals sign). I think that will do the trick. If not, try adding the &debugQuery=on at the end of your request url, to see debug output on how the query is parsed and if/how any documents are matched against your query. Hope this helps. Cheers, Aleksander On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote:
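For reference, the MoreLikeThisHandler registration discussed in this thread would look roughly like this in solrconfig.xml (the defaults shown are illustrative):

```xml
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">text</str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
  </lst>
</requestHandler>
```

After reloading, a request like http://localhost:8983/solr/mlt?q=amsterdam&mlt.interestingTerms=list should return the interesting terms alongside the similar documents.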
Re: facet.sort and distributed search
This is a known issue, but take a look at the following JIRA issue and the patch supplied there: https://issues.apache.org/jira/browse/SOLR-764 I haven't tried it myself, but I believe it should do the trick for you. Hope that helps. Cheers, Aleksander On Wed, 26 Nov 2008 22:53:21 +0100, Grégoire Neuville <[EMAIL PROTECTED]> wrote: Hi, I'm working on a web application, one functionality of which consists of presenting to the user a list of terms to enter in a form field, sorted alphabetically. As long as one single index was concerned, I used Solr facets to produce the list and it worked fine. But I must now deal with several indices, and thus use the distributed search capability of Solr, which forbids the use of "facet.sort=false". I would like to know if someone plans to, or is even working on, the implementation of natural facet sorting in the case of a distributed search. Thanks a lot, -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Can a lucene document be used in solr?
Hello there, do you mean a lucene Document or do you mean if it is possible to use an existing lucene index with solr? In the latter case, the answer is yes, since solr is built on top of lucene. But it requires you to configure your schema.xml to correlate to the index-structure of your existing lucene index. On the question of document, Solr will take what is called a SolrInputDocument as input if you are using solrj, or xml if you are using http. Don't know if that answered your question or not.. Regards, Aleksander On Thu, 27 Nov 2008 05:55:06 +0100, Sajith Vimukthi <[EMAIL PROTECTED]> wrote: Hi all, Can someone of you all tell me whether I can use a lucene document in solr? Regards, Sajith Vimukthi Weerakoon Associate Software Engineer | ZONE24X7 | Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 | http://www.zone24x7.com -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
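To make the "xml if you are using http" option concrete, here is a minimal sketch of a helper that renders field/value pairs as the `<add><doc>` payload Solr's update handler accepts. The field names ("id", "title") are illustrative examples, not taken from the original mail, and a real indexing setup would of course match your schema.xml.

```java
// Sketch: build the XML payload Solr's /update handler accepts over HTTP.
// Field names ("id", "title") are example values only.
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrAddXml {
    // Minimal XML escaping for field values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render one document as an <add><doc>...</doc></add> payload.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(e.getKey()).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("title", "Lucene & Solr");
        System.out.println(toAddXml(doc));
        // → <add><doc><field name="id">42</field><field name="title">Lucene &amp; Solr</field></doc></add>
    }
}
```

With SolrJ you would instead populate a SolrInputDocument with the same field names and let the client handle the serialization.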
Re: Keyword extraction
Ah, yes, That is important. In lucene, the MLT will see if the term vector is stored, and if it is not it will still be able to perform the querying, but in a much much much less efficient way.. Lucene will analyze the document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED will be used to limit the number of tokens that will be parsed). (don't want to go into details on this since I haven't really dug through the code:p) But when the field isn't stored either, it is rather difficult to re-analyze the document;) On a general note, if you want to "really" understand how the MLT works, take a look at the wiki or read this thorough blog post: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ Regards, Aleksander On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: Hi Aleksander, This was a typo on my end, the original query included a semicolon instead of an equal sign. But I think it has to do with my field not being stored and not being identified as termVectors="true". I'm recreating the index now, and see if this fixes the problem. Best, patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 14:37 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction Hi there! Well, first of all i think you have an error in your query, if I'm not mistaken. You say http://localhost:8080/solr/select/?q=id=18477975... but since you are referring to the field called "id", you must say: http://localhost:8080/solr/select/?q=id:18477975... (use colon instead of the equals sign). I think that will do the trick. If not, try adding the &debugQuery=on at the end of your request url, to see debug output on how the query is parsed and if/how any documents are matched against your query. Hope this helps. Cheers, Aleksander On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: Hi Aleksander, Thanx for clearing this up. 
I am confident that this is a way to explore for me as I'm just starting to grasp the matter. Do you know why I'm not getting any results with the query posted earlier then? It gives me the following only: Instead of delivering details of the interestingTerms. Thanks in advance Patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 13:03 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone. Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tend to be useless for querying, but taking the document frequency into account drastically increases the importance of the term! In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]> wrote: Dear Patrick, I had the same problem with the MoreLikeThis function. After briefly reading and analyzing the source code of the moreLikeThis function in solr, I concluded: MoreLikeThis uses term vectors to rank all the terms from a document by their frequency.
According to this ranking, it will start to generate queries, artificially, and search for documents. So, moreLikeThis will retrieve related documents by artificially generating queries based on the most frequent terms. There's a big problem with "most frequent terms" from documents. Most frequent words are usually meaningless, or so-called function words, or, as people from Information Retrieval like to call them, stopwords. However, ignoring technical problems of the implementation of the moreLikeThis function, this approach is very dangerous, since queries are generated artificially based on a given document. Writing queries for retrieving a document is a human task, and it assumes some knowledge (the user knows what document he wants). I advise using other approaches, depending on your expectations. For example, you can extract similar documents just by searching for documents with similar title (m
Re: Keyword extraction
I'm sure that for certain problems and cases you will need to do quite a bit of tweaking to make it work (to suit your needs), but I responded to your statement because you made it sound like the MoreLikeThis component does not work at all for its purpose, while it actually does work as intended and can be of great aid in constructing queries to retrieve same-topic-documents etc. - Aleksander On Wed, 26 Nov 2008 14:10:57 +0100, Scurtu Vitalie <[EMAIL PROTECTED]> wrote: Yes, I totally understand, and agree. MoreLikeThis uses TF-IDF to rank terms, then it generates queries based on top-ranked terms. In any case, I wasn't able to make it work after many attempts. Finally, I've used a different method for query generation, and it works better, or at least gives some results, while with moreLikeThis results were poor or no result at all. To mention that my index was composed of short documents, therefore the intersection between top-ranked terms by TF-IDF was the empty set. MoreLikeThis works better when you have long documents. Yes, I've changed the thresholds for min TFIDF and max TFIDF, and other parameters. I've also used the "mlt.maxqt" parameter to increase the number of terms used in query generation, but it still didn't work well, since the method of query generation based on terms with the highest TF-IDF score doesn't generate a representative query for the document. I wasn't able to tune it. For a low value such as mlt.maxqt=3,4, results were poor, while for mlt.maxqt=5,6 it gave too many and irrelevant results. Thank you, Best Wishes, Vitalie Scurtu --- On Wed, 11/26/08, Aleksander M. Stensby <[EMAIL PROTECTED]> wrote: From: Aleksander M. Stensby Subject: Re: Keyword extraction To: solr-user@lucene.apache.org Date: Wednesday, November 26, 2008, 1:03 PM I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone.
Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tend to be useless for querying, but taking the document frequency into account drastically increases the importance of the term! In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]> wrote: Dear Patrick, I had the same problem with the MoreLikeThis function. After briefly reading and analyzing the source code of the moreLikeThis function in solr, I concluded: MoreLikeThis uses term vectors to rank all the terms from a document by their frequency. According to this ranking, it will start to generate queries, artificially, and search for documents. So, moreLikeThis will retrieve related documents by artificially generating queries based on the most frequent terms. There's a big problem with "most frequent terms" from documents. Most frequent words are usually meaningless, or so-called function words, or, as people from Information Retrieval like to call them, stopwords. However, ignoring technical problems of the implementation of the moreLikeThis function, this approach is very dangerous, since queries are generated artificially based on a given document.
Writing queries for retrieving a document is a human task, and it assumes some knowledge (the user knows what document he wants). I advise using other approaches, depending on your expectations. For example, you can extract similar documents just by searching for documents with similar title (more like this doesn't work in this case). I hope it helps, Best Regards, Vitalie Scurtu --- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: From: Plaatje, Patrick <[EMAIL PROTECTED]> Subject: RE: Keyword extraction To: solr-user@lucene.apache.org Date: Wednesday, November 26, 2008, 10:52 AM Hi All, as an addition to my previous post, no interestingTerms are returned when I execute the following url: http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true I get a moreLikeThis list though, any thoughts? Best, Patrick -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Keyword extraction
Hi there! Well, first of all I think you have an error in your query, if I'm not mistaken. You say http://localhost:8080/solr/select/?q=id=18477975... but since you are referring to the field called "id", you must say: http://localhost:8080/solr/select/?q=id:18477975... (use colon instead of the equals sign). I think that will do the trick. If not, try adding &debugQuery=on at the end of your request url, to see debug output on how the query is parsed and if/how any documents are matched against your query. Hope this helps. Cheers, Aleksander On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: Hi Aleksander, Thanx for clearing this up. I am confident that this is a way to explore for me as I'm just starting to grasp the matter. Do you know why I'm not getting any results with the query posted earlier then? It gives me the following only: Instead of delivering details of the interestingTerms. Thanks in advance Patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 13:03 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone. Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tend to be useless for querying, but taking the document frequency into account drastically increases the importance of the term!
In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]> wrote: Dear Patrick, I had the same problem with the MoreLikeThis function. After briefly reading and analyzing the source code of the moreLikeThis function in solr, I concluded: MoreLikeThis uses term vectors to rank all the terms from a document by their frequency. According to this ranking, it will start to generate queries, artificially, and search for documents. So, moreLikeThis will retrieve related documents by artificially generating queries based on the most frequent terms. There's a big problem with "most frequent terms" from documents. Most frequent words are usually meaningless, or so-called function words, or, as people from Information Retrieval like to call them, stopwords. However, ignoring technical problems of the implementation of the moreLikeThis function, this approach is very dangerous, since queries are generated artificially based on a given document. Writing queries for retrieving a document is a human task, and it assumes some knowledge (the user knows what document he wants). I advise using other approaches, depending on your expectations. For example, you can extract similar documents just by searching for documents with similar title (more like this doesn't work in this case).
I hope it helps, Best Regards, Vitalie Scurtu --- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: From: Plaatje, Patrick <[EMAIL PROTECTED]> Subject: RE: Keyword extraction To: solr-user@lucene.apache.org Date: Wednesday, November 26, 2008, 10:52 AM Hi All, as an addition to my previous post, no interestingTerms are returned when I execute the following url: http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true I get a moreLikeThis list though, any thoughts? Best, Patrick -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Keyword extraction
I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone. Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tend to be useless for querying, but taking the document frequency into account drastically increases the importance of the term! In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]> wrote: Dear Patrick, I had the same problem with the MoreLikeThis function. After briefly reading and analyzing the source code of the moreLikeThis function in solr, I concluded: MoreLikeThis uses term vectors to rank all the terms from a document by their frequency. According to this ranking, it will start to generate queries, artificially, and search for documents. So, moreLikeThis will retrieve related documents by artificially generating queries based on the most frequent terms. There's a big problem with "most frequent terms" from documents. Most frequent words are usually meaningless, or so-called function words, or, as people from Information Retrieval like to call them, stopwords.
However, ignoring technical problems of the implementation of the moreLikeThis function, this approach is very dangerous, since queries are generated artificially based on a given document. Writing queries for retrieving a document is a human task, and it assumes some knowledge (the user knows what document he wants). I advise using other approaches, depending on your expectations. For example, you can extract similar documents just by searching for documents with similar title (more like this doesn't work in this case). I hope it helps, Best Regards, Vitalie Scurtu --- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: From: Plaatje, Patrick <[EMAIL PROTECTED]> Subject: RE: Keyword extraction To: solr-user@lucene.apache.org Date: Wednesday, November 26, 2008, 10:52 AM Hi All, as an addition to my previous post, no interestingTerms are returned when I execute the following url: http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true I get a moreLikeThis list though, any thoughts? Best, Patrick -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
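To make the mlt.mintf / mlt.mindf discussion above concrete, here is a small sketch that assembles an MLT request URL with those cut-offs. The host, field name, and threshold values are made-up examples; the parameter names themselves are the real ones from the Solr MoreLikeThis wiki page.

```java
// Sketch: assemble a MoreLikeThis request URL with tf/df cut-offs.
// Host, field name, and threshold values are illustrative only.
import java.util.LinkedHashMap;
import java.util.Map;

public class MltUrl {
    // Join base URL and parameters into a query string (no encoding needed
    // for these simple values; real code would URL-encode).
    static String build(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "id:18477975");
        p.put("mlt", "true");
        p.put("mlt.fl", "text");
        p.put("mlt.mintf", "2");  // ignore terms occurring fewer than 2 times in the source doc
        p.put("mlt.mindf", "5");  // ignore terms found in fewer than 5 documents
        p.put("mlt.interestingTerms", "list");
        System.out.println(build("http://localhost:8080/solr/select/", p));
    }
}
```

With SolrJ you would set the same parameters on a SolrQuery instead of building the URL by hand.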
Re: Query for Distributed search -
If you use SolrJ and the HttpSolrServer, you could for instance add logic to your querying, making your searches more efficient! That is partially the idea of sharding, right? :) So if the user wants to search for a log file in June, your application knows that June logs are stored on the second box, and hence will redirect the search to that box. Alternatively, if he wants to search for logs spanning two boxes, you merely add the shards parameter to your query and just include the path to those two shards in question. I'm not really sure about how solr handles the merging of results etc and whether or not the requests are done in parallel or sequentially, but I do know that you could easily manage this on your own through java if you want to. (Simply setting up one HttpSolrServer in your code for each shard, and searching them in parallel in separate threads. => then reducing the results afterwards). Have a look at http://wiki.apache.org/solr/DistributedSearch for more info. You could also take a look at Hadoop. (http://hadoop.apache.org/) regards, Aleks On Mon, 24 Nov 2008 06:24:51 +0100, souravm <[EMAIL PROTECTED]> wrote: Hi, Looking for some insight on distributed search. Say I have an index distributed in 3 boxes and the index contains time and text data (typical log file). Each box has index for a different timeline - say Box 1 for all Jan to April, Box 2 for May to August and Box 3 for Sep to Dec. Now if I try to search for a text string, will the search happen in parallel in all 3 boxes or sequentially? Regards, Sourav CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses.
Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS*** -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
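The "one searcher per shard, query in parallel, reduce afterwards" idea from the reply above can be sketched as follows. Real HttpSolrServer instances are stood in for by plain functions here, since this is only meant to show the thread/merge structure, not actual Solr calls.

```java
// Sketch of parallel per-shard searching with a reduce step afterwards.
// Each "shard" is modeled as a function from query string to doc ids;
// in real code each would wrap an HttpSolrServer query.
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

public class ShardSearch {
    static List<String> searchAll(List<Function<String, List<String>>> shards, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Function<String, List<String>> shard : shards) {
                futures.add(pool.submit(() -> shard.apply(query)));  // one query per thread
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get());  // reduce: concatenate shard results in shard order
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two fake shards: Jan-Apr logs on box 1, May-Aug logs on box 2.
        Function<String, List<String>> box1 = q -> Arrays.asList("log-jan-1", "log-apr-9");
        Function<String, List<String>> box2 = q -> Arrays.asList("log-jun-3");
        System.out.println(searchAll(Arrays.asList(box1, box2), "error"));
        // → [log-jan-1, log-apr-9, log-jun-3]
    }
}
```

A real reduce step would also re-sort the merged hits by score, which is what Solr's own distributed search does for you when you use the shards parameter.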
Re: Unique id
I still don't understand why you want two different indexes if you want to return the linked information each time anyway... I would say the easiest way is just to index all data (all columns from your views) into the index like this: taskid - taskname - start - end - personid - deptid - ismanager then you can just search like I already explained earlier. This way, you have already joined by queue-id when you insert it into the index and thus you get both results from one single search. (If you also want the ability to search on the queueID, just add a column for that.) In general, your question doesn't really have anything to do with solr, but with architecture, db-design and what you want to search on. - A. 1. Task(id* (int), name (string), start (timestamp), end (timestamp)) 2. Team(person_id (int), deptId (int), isManager (int)) * is primary key In schema.xml I have On Fri, 21 Nov 2008 11:59:56 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Can you also let me know how I join two search indices in one query? That means, in this case I have two diff search indices and I need to join by queueId and get all the tasks in one SolrQuery. I am creating queries in Solrj. -Original Message- From: Raghunandan Rao [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 3:45 PM To: solr-user@lucene.apache.org Subject: RE: Unique id Ok. I got your point. So I don't need the ID field in the second view. I will hence remove required="true" in schema.xml. What I thought was that a unique ID makes indexing easier or is used to maintain docs. Thanks a lot. -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 3:36 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Well, In that case, what do you want to search for? If I were you, I would make my index consist of tasks (and I assume that is what you are trying to do).
So why don't you just use your schema.xml as you have right now, and do the following: Pick a person (let's say he has person_id=42 and deptId=3), get his queue of tasks, then for each task in the queue do: insert into index: (id from the task), (name of the task), (id of the person), (id of the department) an example: 3, "this is a very important task", 42, 3 4, "this one is also important", 42, 3 5, "this one is low priority", 42, 3 And then for the next person you do the same, (person_id=58 and deptId=5) insert: 6, "this is about solr", 58, 5 7, "this is about lucene", 58, 5 etc. Now you can search for all tasks in department 5 by doing "deptId:5". If you want to search for all the tasks assigned to a specific person you just enter the query "personId:42". And you could also search for all tasks containing certain keywords by doing the query "name:solr" OR "name:lucene". Do you understand now, or is it still unclear? - Aleks On Fri, 21 Nov 2008 10:56:38 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Ok. There is a common column in the two views called queueId. I query the second view first and get all the queueIds for a person. And having the queueIds I get all the ids from the first view. Sorry for missing that column earlier. I think it should make sense now. -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 3:18 PM To: solr-user@lucene.apache.org Subject: Re: Unique id And in case that wasn't clear, the reason for it failing then would obviously be because you define the id field with required="true", and you try inserting a document where this field is missing... - Aleks On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby <[EMAIL PROTECTED]> wrote: Ok, this brings me to the question; how are the two views connected to each other (since you are indexing partly view 1 and partly view 2 into a single index structure)?
If they are not at all connected, I believe you have made a fundamental mistake / misunderstood the use of your index... I assume that a Task can be assigned to a person, and your Team view displays that person, right? Maybe you are doing something like this: View 1 1, somename, sometimestamp, someothertimestamp 2, someothername, somethirdtimestamp, timetamp4 ... View 2 1, 58, 0 2, 58, 1 3, 52, 0 ... I'm really confused about your database structure... To me, it would be logical to add a team_id field to the team table, and add a third table to link tasks to a team (or to individual persons). Once you have that information (because I do assume there MUST be some link there) you would do: insert into your index: (id from the task), (name of the task), (id of the person assigned to this task), (id of the department that this person works in). I guess that you _might_ be thinking a bit wrong and trying to do s
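The denormalization step described in these replies, one index document per task carrying the assignee's personId and deptId, can be sketched like this. The field names follow the examples in the thread; the document is modeled as a plain map standing in for a SolrJ SolrInputDocument.

```java
// Sketch: flatten a task row plus its assignee into one index document,
// as described in the thread. A Map stands in for a SolrInputDocument.
import java.util.LinkedHashMap;
import java.util.Map;

public class TaskFlattener {
    static Map<String, Object> toDoc(int taskId, String name, int personId, int deptId) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", taskId);         // the task's own primary key stays the uniqueKey
        doc.put("name", name);
        doc.put("personId", personId); // joined in at index time, so no join at query time
        doc.put("deptId", deptId);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDoc(3, "this is a very important task", 42, 3));
        // → {id=3, name=this is a very important task, personId=42, deptId=3}
    }
}
```

Queries like deptId:5 or personId:42 then work directly against the flattened documents, which is the point of the advice above.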
Re: Unique id
Well, In that case, what do you want to search for? If I were you, I would make my index consist of tasks (and I assume that is what you are trying to do). So why don't you just use your schema.xml as you have right now, and do the following: Pick a person (let's say he has person_id=42 and deptId=3), get his queue of tasks, then for each task in the queue do: insert into index: (id from the task), (name of the task), (id of the person), (id of the department) an example: 3, "this is a very important task", 42, 3 4, "this one is also important", 42, 3 5, "this one is low priority", 42, 3 And then for the next person you do the same, (person_id=58 and deptId=5) insert: 6, "this is about solr", 58, 5 7, "this is about lucene", 58, 5 etc. Now you can search for all tasks in department 5 by doing "deptId:5". If you want to search for all the tasks assigned to a specific person you just enter the query "personId:42". And you could also search for all tasks containing certain keywords by doing the query "name:solr" OR "name:lucene". Do you understand now, or is it still unclear? - Aleks On Fri, 21 Nov 2008 10:56:38 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Ok. There is a common column in the two views called queueId. I query the second view first and get all the queueIds for a person. And having the queueIds I get all the ids from the first view. Sorry for missing that column earlier. I think it should make sense now. -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 3:18 PM To: solr-user@lucene.apache.org Subject: Re: Unique id And in case that wasn't clear, the reason for it failing then would obviously be because you define the id field with required="true", and you try inserting a document where this field is missing... - Aleks On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M.
Stensby <[EMAIL PROTECTED]> wrote: Ok, this brings me to the question; how are the two views connected to each other (since you are indexing partly view 1 and partly view 2 into a single index structure)? If they are not at all connected, I believe you have made a fundamental mistake / misunderstood the use of your index... I assume that a Task can be assigned to a person, and your Team view displays that person, right? Maybe you are doing something like this: View 1 1, somename, sometimestamp, someothertimestamp 2, someothername, somethirdtimestamp, timetamp4 ... View 2 1, 58, 0 2, 58, 1 3, 52, 0 ... I'm really confused about your database structure... To me, it would be logical to add a team_id field to the team table, and add a third table to link tasks to a team (or to individual persons). Once you have that information (because I do assume there MUST be some link there) you would do: insert into your index: (id from the task), (name of the task), (id of the person assigned to this task), (id of the department that this person works in). I guess that you _might_ be thinking a bit wrong and trying to do something like this: Treat each view as independent views, and inserting values from each table as separate documents in the index, so you would do: insert into your index: (id from the task), (name of the task), (no value), (no value) which will be ok to do (no value), (no value), (id of the person), (id of the department) --- which makes no sense to me... So, can you clarify the relationship between the two views, and how you are thinking of inserting entries into your index? - Aleks On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: View structure is: 1. Task(id* (int), name (string), start (timestamp), end (timestamp)) 2. Team(person_id (int), deptId (int), isManager (int)) * is primary key In schema.xml I have id -Original Message- From: Aleksander M.
Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 2:56 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Hello again. I'm getting a bit confused by your questions, and I believe it would be easier for us to help you if you could post the field definitions from your schema.xml and the structure of your two database views. ie. table 1: (id (int), subject (string) -.--) table 2: (category (string), other fields ..) So please post this and we can try to help you. - Aleks On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Thanks Erik. If I convert that to a string then id field defined in schema.xml would fail as I have that as integer. If I change that to string then first view would fail as it is Integer there. What to do in such scenarios? Do I need to define multiple schema.xml or multiple unique key definitions in same schema. How does this work? Pls explain. -Origin
Re: Unique id
And in case that wasn't clear, the reason for it failing then would obviously be because you define the id field with required="true", and you try inserting a document where this field is missing... - Aleks On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby <[EMAIL PROTECTED]> wrote: Ok, this brings me to the question; how are the two views connected to each other (since you are indexing partly view 1 and partly view 2 into a single index structure)? If they are not at all connected, I believe you have made a fundamental mistake / misunderstood the use of your index... I assume that a Task can be assigned to a person, and your Team view displays that person, right? Maybe you are doing something like this: View 1 1, somename, sometimestamp, someothertimestamp 2, someothername, somethirdtimestamp, timetamp4 ... View 2 1, 58, 0 2, 58, 1 3, 52, 0 ... I'm really confused about your database structure... To me, it would be logical to add a team_id field to the team table, and add a third table to link tasks to a team (or to individual persons). Once you have that information (because I do assume there MUST be some link there) you would do: insert into your index: (id from the task), (name of the task), (id of the person assigned to this task), (id of the department that this person works in). I guess that you _might_ be thinking a bit wrong and trying to do something like this: Treat each view as independent views, and inserting values from each table as separate documents in the index, so you would do: insert into your index: (id from the task), (name of the task), (no value), (no value) which will be ok to do (no value), (no value), (id of the person), (id of the department) --- which makes no sense to me... So, can you clarify the relationship between the two views, and how you are thinking of inserting entries into your index? - Aleks On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: View structure is: 1.
Task(id* (int), name (string), start (timestamp), end (timestamp)) 2. Team(person_id (int), deptId (int), isManager (int)) * is primary key In schema.xml I have id -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 2:56 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Hello again. I'm getting a bit confused by your questions, and I believe it would be easier for us to help you if you could post the field definitions from your schema.xml and the structure of your two database views. ie. table 1: (id (int), subject (string) -.--) table 2: (category (string), other fields ..) So please post this and we can try to help you. - Aleks On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Thanks Erik. If I convert that to a string then id field defined in schema.xml would fail as I have that as integer. If I change that to string then first view would fail as it is Integer there. What to do in such scenarios? Do I need to define multiple schema.xml or multiple unique key definitions in same schema. How does this work? Pls explain. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2008 6:40 PM To: solr-user@lucene.apache.org Subject: Re: Unique id I'd suggest aggregating those three columns into a string that can serve as the Solr uniqueKey field value. Erik On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote: Basically, I am working on two views. First one has an ID column. The second view has no unique ID column. What to do in such situations? There are 3 other columns where I can make a composite key out of those. I have to index these two views now. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 5:24 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Technically, no, a uniqueKey field is NOT required. I've yet to run into a situation where it made sense not to use one though. 
As for indexing database tables - if one of your tables doesn't have a primary key, does it have an aggregate unique "key" of some sort? Do you plan on updating the rows in that table and reindexing them? Seems like some kind of unique key would make sense for updating documents. But yeah, a more detailed description of your table structure and searching needs would be helpful. Erik On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote: Yes it is. You need a unique id because the add method works as an "add or update" method. When adding a document whose ID is already found in the index, the old document will be deleted and the new will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID?
Re: Unique id
Ok, this brings me to the question; how are the two views connected to each other (since you are indexing partly view 1 and partly view 2 into a single index structure)? If they are not at all connected I believe you have made a fundamental mistake / misunderstood the use of your index... I assume that a Task can be assigned to a person, and your Team view displays that person, right? Maybe you are doing something like this: View 1 1, somename, sometimestamp, someothertimestamp 2, someothername, somethirdtimestamp, timestamp4 ... View 2 1, 58, 0 2, 58, 1 3, 52, 0 ... I'm really confused about your database structure... To me, it would be logical to add a team_id field to the team table, and add a third table to link tasks to a team (or to individual persons). Once you have that information (because I do assume there MUST be some link there) you would do: insert into your index: (id from the task), (name of the task), (id of the person assigned to this task), (id of the department that this person works in). I guess that you _might_ be thinking a bit wrong and trying to do something like this: Treat each view as independent views, and insert values from each table as separate documents in the index, so you would do: insert into your index: (id from the task), (name of the task), (no value), (no value) which will be ok to do, and then (no value), (no value), (id of the person), (id of the department) --- which makes no sense to me... So, can you clarify the relationship between the two views, and how you are thinking of inserting entries into your index? - Aleks On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: View structure is: 1. Task(id* (int), name (string), start (timestamp), end (timestamp)) 2. Team(person_id (int), deptId (int), isManager (int)) * is primary key In schema.xml I have id -Original Message- From: Aleksander M. 
Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 2:56 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Hello again. I'm getting a bit confused by your questions, and I believe it would be easier for us to help you if you could post the field definitions from your schema.xml and the structure of your two database views. I.e. table 1: (id (int), subject (string), ...) table 2: (category (string), other fields ...) So please post this and we can try to help you. - Aleks On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Thanks Erik. If I convert that to a string then the id field defined in schema.xml would fail, as I have that as integer. If I change that to string then the first view would fail as it is an integer there. What to do in such scenarios? Do I need to define multiple schema.xml files or multiple unique key definitions in the same schema? How does this work? Pls explain. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2008 6:40 PM To: solr-user@lucene.apache.org Subject: Re: Unique id I'd suggest aggregating those three columns into a string that can serve as the Solr uniqueKey field value. Erik On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote: Basically, I am working on two views. The first one has an ID column. The second view has no unique ID column. What to do in such situations? There are 3 other columns where I can make a composite key out of those. I have to index these two views now. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 5:24 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Technically, no, a uniqueKey field is NOT required. I've yet to run into a situation where it made sense not to use one, though. As for indexing database tables - if one of your tables doesn't have a primary key, does it have an aggregate unique "key" of some sort? Do you plan on updating the rows in that table and reindexing them? 
Seems like some kind of unique key would make sense for updating documents. But yeah, a more detailed description of your table structure and searching needs would be helpful. Erik On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote: Yes it is. You need a unique id because the add method works as an "add or update" method. When adding a document whose ID is already found in the index, the old document will be deleted and the new will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID? - Aleksander On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Hi, Is the uniqueKey in schema.xml really required? Reason is, I am indexing two tables and I have id as unique key in schema.xml but the id field is not there in one of the tables and indexing fails.
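Erik's suggestion of aggregating several columns into a string uniqueKey can be sketched in plain Java. The helper name, the ':' delimiter, and the table-name prefix are illustrative choices, not part of any Solr API; the prefix also keeps ids from the two views from colliding in a shared index.

```java
// Sketch: build one uniqueKey string out of several column values.
// The ':' delimiter should be a character that cannot occur in the data.
public class CompositeKey {

    // Join a table/view prefix and the column values into a single id string.
    static String compositeId(Object... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(':');
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A Task row with primary key 1, and a Team row (person_id=58, deptId=1, isManager=0)
        System.out.println(compositeId("task", 1));        // task:1
        System.out.println(compositeId("team", 58, 1, 0)); // team:58:1:0
    }
}
```

With ids built this way, re-adding the same row produces the same key, so Solr's add-or-update semantics still replace the old document instead of duplicating it.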
Re: solrQueryParser does not take effect - nightly build
That sounds a bit strange. Did you make the changes in the schema.xml before starting the server? Because if you change it while it is running, it will by default delete and replace the file (discarding any changes you make). In other words, make sure the server is not running, make your changes and then start up the server. Apart from that, I can't really see any reason for this to not work... - Aleks On Thu, 20 Nov 2008 22:03:30 +0100, ashokc <[EMAIL PROTECTED]> wrote: Hi, I have set the default operator to AND in schema.xml, but it is not taking effect. It continues to treat it as OR. I am working with the latest nightly build (11/20/2008). For a query like term1 term2, debug shows content:term1 content:term2 (i.e. the terms are still OR'ed). Bug? Thanks - ashok
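For reference (the archive stripped the markup from the original mail), the setting in question is a single element in schema.xml, which must be in place before the server starts:

```xml
<!-- schema.xml (Solr 1.x): use AND instead of OR as the default boolean operator -->
<solrQueryParser defaultOperator="AND"/>
```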
Re: Unique id
Hello again. I'm getting a bit confused by your questions, and I believe it would be easier for us to help you if you could post the field definitions from your schema.xml and the structure of your two database views. I.e. table 1: (id (int), subject (string), ...) table 2: (category (string), other fields ...) So please post this and we can try to help you. - Aleks On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Thanks Erik. If I convert that to a string then the id field defined in schema.xml would fail, as I have that as integer. If I change that to string then the first view would fail as it is an integer there. What to do in such scenarios? Do I need to define multiple schema.xml files or multiple unique key definitions in the same schema? How does this work? Pls explain. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2008 6:40 PM To: solr-user@lucene.apache.org Subject: Re: Unique id I'd suggest aggregating those three columns into a string that can serve as the Solr uniqueKey field value. Erik On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote: Basically, I am working on two views. The first one has an ID column. The second view has no unique ID column. What to do in such situations? There are 3 other columns where I can make a composite key out of those. I have to index these two views now. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 5:24 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Technically, no, a uniqueKey field is NOT required. I've yet to run into a situation where it made sense not to use one, though. As for indexing database tables - if one of your tables doesn't have a primary key, does it have an aggregate unique "key" of some sort? Do you plan on updating the rows in that table and reindexing them? Seems like some kind of unique key would make sense for updating documents. 
But yeah, a more detailed description of your table structure and searching needs would be helpful. Erik On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote: Yes it is. You need a unique id because the add method works as an "add or update" method. When adding a document whose ID is already found in the index, the old document will be deleted and the new will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID? - Aleksander On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Hi, Is the uniqueKey in schema.xml really required? Reason is, I am indexing two tables and I have id as unique key in schema.xml but the id field is not there in one of the tables and indexing fails. Do I really require this unique field for Solr to index it better or can I do away with this? Thanks, Rahgu -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Unique id
Ok, but how do you map your table structure to the index? As far as I can understand, the two tables have different structures, so why/how do you map two different data structures onto a single index? Are the two tables connected in some way? If so, you could make your index structure reflect the union of both tables and just make one insertion into the index per entry of the two tables. Maybe you could post the table structure so that I can get a better understanding of your use-case... - Aleks On Wed, 19 Nov 2008 11:25:56 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Ok got it. I am indexing two tables differently. I am using Solrj to index with @Field annotation. I make two queries initially and fetch the data from two tables and index them separately. But what if the ids in two tables are the same? That means documents with the same id will be deleted when doing an update. How does this work? Please explain. Thanks. -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 3:49 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Yes it is. You need a unique id because the add method works as an "add or update" method. When adding a document whose ID is already found in the index, the old document will be deleted and the new will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID? - Aleksander On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Hi, Is the uniqueKey in schema.xml really required? Reason is, I am indexing two tables and I have id as unique key in schema.xml but the id field is not there in one of the tables and indexing fails. Do I really require this unique field for Solr to index it better or can I do away with this? Thanks, Rahgu
Re: Unique id
Yes it is. You need a unique id because the add method works as an "add or update" method. When adding a document whose ID is already found in the index, the old document will be deleted and the new will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID? - Aleksander On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote: Hi, Is the uniqueKey in schema.xml really required? Reason is, I am indexing two tables and I have id as unique key in schema.xml but the id field is not there in one of the tables and indexing fails. Do I really require this unique field for Solr to index it better or can I do away with this? Thanks, Rahgu -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Use SOLR like the "MySQL LIKE"
Ah, okay! Well, then I suggest you index the field in two different ways if you want both possible ways of searching. One, where you treat the entire name as one token (in lowercase) (then you can search for avera* and match on, for instance, "average joe" etc.) And then another field where you tokenize on whitespace, for instance, if you want/need that possibility as well. Look at the solr copy fields and try it out, it works like a charm :) Cheers, Aleksander On Tue, 18 Nov 2008 10:40:24 +0100, Carsten L <[EMAIL PROTECTED]> wrote: Thanks for the quick reply! It is supposed to work a little like the Google Suggest or field autocompletion. I know I mentioned email and userid, but the problem lies with the name field, because of the whitespaces in combination with the wildcard. I looked at the solr.WordDelimiterFilterFactory, but it does not mention anything about whitespaces - or wildcards. A quick brushup: I would like to mimic the LIKE functionality from MySQL using wildcards at the end of the search query. In MySQL whitespaces are treated as characters, not "splitters". Aleksander M. Stensby wrote: Hi there, You should use LowerCaseTokenizerFactory as you point out yourself. As far as I know, the StandardTokenizer "recognizes email addresses and internet hostnames as one token". In your case, I guess you want an email, say "[EMAIL PROTECTED]" to be split into four tokens: average joe apache org, or something like that, which would indeed allow you to search for "joe" or "average j*" and match. To do so, you could use the WordDelimiterFilterFactory and split on intra-word delimiters (I think the defaults here are non-alphanumeric chars). Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for more info on tokenizers and filters. cheers, Aleks On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L <[EMAIL PROTECTED]> wrote: Hello. The data: I have a dataset containing ~500.000 documents. In each document there is an email, a name and a user ID. 
The problem: I would like to be able to search in it, but it should be like the "MySQL LIKE". So when a user enters the search term: "carsten", then the query looks like: "name:(carsten) OR name:(carsten*) OR email:(carsten) OR email:(carsten*) OR userid:(carsten) OR userid:(carsten*)" Then it should match: carsten l carsten larsen Carsten Larsen Carsten CARSTEN etc. And when the user enters the term: "carsten l" the query looks like: "name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)" Then it should match: carsten l carsten larsen Carsten Larsen Or written in MySQL syntax: "... WHERE `name` LIKE 'carsten%' OR `email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'..." I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name and email field, to ensure case insensitive behavior. The problem seems to be the wildcards and the whitespaces. -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
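A minimal sketch of the two-field setup suggested above, assuming Solr 1.x schema.xml syntax; the field and type names (name, name_exact, string_lc, text_ws) are my own. The copied field keeps the whole name as a single lowercased token, so a prefix query like name_exact:carsten\ l* matches "Carsten Larsen", while the original field still supports per-word search.

```xml
<!-- Type that keeps the entire field value as one lowercased token -->
<fieldType name="string_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Whitespace-tokenized field for ordinary word search,
     plus an untokenized lowercase copy for LIKE-style prefix matching -->
<field name="name" type="text_ws" indexed="true" stored="true"/>
<field name="name_exact" type="string_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```

Note that wildcard queries are not analyzed, so the query prefix must be lowercased by the client before it is sent.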
Re: Use SOLR like the "MySQL LIKE"
Hi there, You should use LowerCaseTokenizerFactory as you point out yourself. As far as I know, the StandardTokenizer "recognizes email addresses and internet hostnames as one token". In your case, I guess you want an email, say "[EMAIL PROTECTED]" to be split into four tokens: average joe apache org, or something like that, which would indeed allow you to search for "joe" or "average j*" and match. To do so, you could use the WordDelimiterFilterFactory and split on intra-word delimiters (I think the defaults here are non-alphanumeric chars). Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for more info on tokenizers and filters. cheers, Aleks On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L <[EMAIL PROTECTED]> wrote: Hello. The data: I have a dataset containing ~500.000 documents. In each document there is an email, a name and a user ID. The problem: I would like to be able to search in it, but it should be like the "MySQL LIKE". So when a user enters the search term: "carsten", then the query looks like: "name:(carsten) OR name:(carsten*) OR email:(carsten) OR email:(carsten*) OR userid:(carsten) OR userid:(carsten*)" Then it should match: carsten l carsten larsen Carsten Larsen Carsten CARSTEN etc. And when the user enters the term: "carsten l" the query looks like: "name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)" Then it should match: carsten l carsten larsen Carsten Larsen Or written in MySQL syntax: "... WHERE `name` LIKE 'carsten%' OR `email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'..." I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name and email field, to ensure case insensitive behavior. The problem seems to be the wildcards and the whitespaces. -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
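The WordDelimiterFilterFactory suggestion above would sit in the field's analyzer chain, roughly like this (a sketch, assuming Solr 1.x schema syntax; the example address is made up):

```xml
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Splits e.g. "average.joe@apache.org" on non-alphanumeric delimiters
       into the tokens: average / joe / apache / org -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```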
Re: Calculating peaks - solrj support for facet.date?
As Erik said, you can just set the parameters yourself SolrQuery query = new SolrQuery(...); query.set(FacetParams.FACET_DATE, ...); etc. You'll find all facet-related parameters in the FacetParams interface, located in the org.apache.solr.common.params package. - Aleks On Fri, 07 Nov 2008 14:26:56 +0100, Erik Hatcher <[EMAIL PROTECTED]> wrote: On Nov 7, 2008, at 7:23 AM, [EMAIL PROTECTED] wrote: Sorry, but I have one more question. Does the java client solrj support facet.date? Yeah, but it doesn't have explicit setters for it. A SolrQuery is also a ModifiableSolrParams - so you can call the add/set methods on it using the same keys used with HTTP requests. Erik -- Aleksander M. Stensby Senior software developer Integrasco A/S
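Since a SolrQuery is just a bag of the same key/value pairs that go over HTTP, the date-faceting parameters can be illustrated by assembling the raw query string with the standard library alone. This is a sketch: the field name "timestamp" and the ranges are made up, and in real code you would set the same keys on a SolrQuery (via FacetParams) instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the HTTP parameter string that SolrQuery/FacetParams would carry.
public class FacetDateParams {

    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("facet", "true");
        p.put("facet.date", "timestamp");           // key held by FacetParams.FACET_DATE
        p.put("facet.date.start", "NOW/DAY-30DAYS"); // hypothetical 30-day window
        p.put("facet.date.end", "NOW/DAY");
        p.put("facet.date.gap", "+1DAY");            // one bucket per day
        System.out.println(toQueryString(p));
    }
}
```

With SolrJ you would write the equivalent as query.set("facet.date", "timestamp") and so on, since SolrQuery extends ModifiableSolrParams.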
Re: EmbeddedSolrServer and the MultiCore functionality
Okay, sounds fair. Well, why I would have multiple shards was based on the presumption that it would be more effective to be able to search in single shards when needed (if each shard contains, let's say, 30 million entries) and then when the time comes, migrate one of the shards to a different node. But I guess the gain in performance is not significant and that I should rather have just one shard per node. Or? Best regards and thanks for your answer, Aleksander On Tue, 23 Sep 2008 16:57:08 +0200, Ryan McKinley <[EMAIL PROTECTED]> wrote: If I have solr up and running and do something like this: query.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1"); I will get the results from both cores, obviously... But is there a way to do this without using shards and accessing the cores through http? I presume it would/should be possible to do the same thing directly against the cores, but my question is really if this has been implemented already / is it possible? not implemented... Check line 384 of SearchHandler.java SolrServer server = new CommonsHttpSolrServer(url, client); it defaults to CommonsHttpSolrServer. This could easily change to EmbeddedSolrServer, but I'm not sure it is a very common usecase... why would you have multiple shards on the same machine? ryan -- Aleksander M. Stensby Senior Software Developer Integrasco A/S +47 41 22 82 72 [EMAIL PROTECTED]
EmbeddedSolrServer and the MultiCore functionality
Hello everyone, I'm new to Solr (have been using Lucene for a few years now). We are looking into Solr and have heard many good things about the project :) I have a few questions regarding the EmbeddedSolrServer in Solrj and the MultiCore features... I've tried to find answers to this in the archives but have not succeeded. The thing is, I want to be able to use the Embedded server to access multiple cores on one machine, and I would like to at least have the possibility to access the lucene indexes without http. In particular I'm wondering if it is possible to do the "shards" (distributed search) approach using the embedded server, without using http requests. Let's say I register 2 cores to a container and init my embedded server like this: CoreContainer container = new CoreContainer(); container.register("core1", core1, false); container.register("core2", core2, false); server = new EmbeddedSolrServer(container, "core1"); then queries performed on my server will return results from core1... and if I do ..=new EmbeddedSolrServer(container, "core2") the results will come from core2. If I have solr up and running and do something like this: query.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1"); I will get the results from both cores, obviously... But is there a way to do this without using shards and accessing the cores through http? I presume it would/should be possible to do the same thing directly against the cores, but my question is really if this has been implemented already / is it possible? Thanks in advance for any replies! Best regards, Aleksander -- Aleksander M. Stensby Senior Software Developer Integrasco A/S +47 41 22 82 72 [EMAIL PROTECTED]