Re: error opening index solr 4.0 with lukeall-4.0.0-ALPHA.jar
I just downloaded, compiled and opened an optimized Solr 4.0 index in read-only mode without problems. Could browse through the docs, search with different analyzers, ... Looks good. On 19.11.2012 08:49, Toke Eskildsen wrote: On Mon, 2012-11-19 at 08:10 +0100, Bernd Fehling wrote: I think there is already a BETA available: http://luke.googlecode.com/svn/trunk/ You might try that one. That doesn't work either for Lucene 4.0.0 indexes; the same goes for the source trunk. I did have some luck with downloading the source and changing the dependencies to Lucene 4.0.0 final (4 or 5 JARs, AFAIR). It threw a non-fatal exception upon index open, something about subReaders not being accessible through the method it used (sorry for being vague, it was on my home machine and some days ago), so I'm guessing that not all functionality works. It was possible to inspect some documents, and that was what I needed at the time.
RE: Reduce QueryComponent prepare time
I'd also like to know which parts of the entire query constitute the prepare time, and whether it would matter significantly if we extended the edismax plugin and hardcoded the parameters we pass into (reusable) objects. Thanks, Markus -Original message- From: Markus Jelsma markus.jel...@openindex.io Sent: Fri 16-Nov-2012 15:57 To: solr-user@lucene.apache.org Subject: Reduce QueryComponent prepare time Hi, We're seeing high prepare times for the QueryComponent, obviously due to the vast number of fields and queries. It's common to have a prepare time of 70-80ms, while the process times drop significantly due to warmed searchers, OS cache, etc. The prepare time is a recurring issue, and I'd appreciate it if people here could share some thoughts or hints. We're using a recent checkout on a 10-node test cluster with SSDs (although this is no IO issue) and edismax on about a hundred different fields; this includes phrase searches over most of those fields and SpanFirst queries on about 25 fields. We'd like to see how we can avoid doing the same prepare procedure over and over again ;) Thanks, Markus
configuring data source in apache tomcat
Hi, I have configured Apache Solr with Tomcat; for that I have deployed the .war file in Tomcat. I have created the Solr home directory at C:\solr. After starting Tomcat, solr.war gets extracted and a folder is created in webapps. There, in WEB-INF/web.xml, I added:

   <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-value>C:\solr</env-entry-value>
      <env-entry-type>java.lang.String</env-entry-type>
   </env-entry>

After this the Solr admin is working. Now I want to configure an XML data source. How can I configure an XML data source? Thanks Regards, Leena Jawale
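For anyone searching the archives: the usual way to index an XML source in Solr is the DataImportHandler with XPathEntityProcessor. Below is a minimal, illustrative sketch of a data-config.xml; the file path, entity name, and field names are made up for the example, and the handler still has to be registered in solrconfig.xml as a requestHandler of class org.apache.solr.handler.dataimport.DataImportHandler.

```xml
<!-- data-config.xml: illustrative sketch for importing a local XML file.
     Path, forEach expression and field names are examples only. -->
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="doc"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/records/record"
            url="C:\solr\data\records.xml">
      <field column="id"   xpath="/records/record/id"/>
      <field column="name" xpath="/records/record/name"/>
    </entity>
  </document>
</dataConfig>
```

With that in place, the import is kicked off with a request like /dataimport?command=full-import.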
Re: Reduce QueryComponent prepare time
Markus, It's hard to suggest anything until you provide a profiler snapshot showing where the prepare time is spent. As far as I know, in prepare it parses queries - e.g. we have some really heavy query parsers - but I don't think that's really common. On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: I'd also like to know which parts of the entire query constitute the prepare time, and whether it would matter significantly if we extended the edismax plugin and hardcoded the parameters we pass into (reusable) objects. Thanks, Markus -Original message- From: Markus Jelsma markus.jel...@openindex.io Sent: Fri 16-Nov-2012 15:57 To: solr-user@lucene.apache.org Subject: Reduce QueryComponent prepare time Hi, We're seeing high prepare times for the QueryComponent, obviously due to the vast number of fields and queries. It's common to have a prepare time of 70-80ms, while the process times drop significantly due to warmed searchers, OS cache, etc. The prepare time is a recurring issue, and I'd appreciate it if people here could share some thoughts or hints. We're using a recent checkout on a 10-node test cluster with SSDs (although this is no IO issue) and edismax on about a hundred different fields; this includes phrase searches over most of those fields and SpanFirst queries on about 25 fields. We'd like to see how we can avoid doing the same prepare procedure over and over again ;) Thanks, Markus -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
CloudSolrServer or load-balancer for indexing
Hi, As far as I know, CloudSolrServer is the recommended way to index into SolrCloud. I wonder what the advantages of this approach are over an external load-balancer. Let's say I have a 4-node SolrCloud (2 shards + replicas) + 1 server running ZooKeeper. I can use CloudSolrServer for indexing, or use a load-balancer and send updates to any existing node. In the former case it seems that ZooKeeper is a single point of failure - indexing is not possible if it is down. In the latter case I can still index data even if some nodes are down (no data outage). What is better for reliable indexing - CloudSolrServer, a load-balancer, or do you know some different methods worth considering? Regards.
Re: SolrCloud Error after leader restarts
You're using a RAM dir? Sent from my iPhone On Nov 19, 2012, at 1:21 AM, deniz denizdurmu...@gmail.com wrote: Hello, for test purposes I am running two ZooKeepers, on ports 2181 and 2182, and I have two Solr instances running on different machines... For the one which is running on my local machine and acts as leader: java -Dbootstrap_conf=true -DzkHost=localhost:2181 -jar start.jar and for the one which acts as follower, on a remote machine: java -Djetty.port=7574 -DzkHost=address-of-mylocal:2182 -jar start.jar Until this point everything is smooth and I can see the configs on both ZooKeeper hosts when I connect with zkCli.sh. Just to see what happens and check the recovery stuff, I killed the Solr which is running on my local machine and tried to index some files by using the follower, which failed... this is normal as writes are routed to the leader... The point that I don't understand is here: when I restart the leader with the same command on the terminal, after the normal logs it starts showing this:

Nov 19, 2012 2:15:18 PM org.apache.solr.common.SolrException log
SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch failed :
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:151)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:405)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@1e75e89 lockFactory=org.apache.lucene.store.NativeFSLockFactory@128e909: files: []
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:639)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:75)
        at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:62)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:191)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:77)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:354)
        ... 4 more
Nov 19, 2012 2:15:18 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:154)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:405)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

It fails to recover after shutdown... why does this happen? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CloudSolrServer or load-balancer for indexing
Nodes stop accepting updates if they cannot talk to ZooKeeper, so an external load balancer is no advantage there. CloudSolrServer will be smart about knowing who the leaders are, will eventually do hashing, will auto add/remove nodes from rotation based on the cluster state in ZooKeeper, and is probably more intelligent out of the box about retrying on some responses (for example, responses that are returned on shutdown or startup). - Mark On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, As far as I know, CloudSolrServer is the recommended way to index into SolrCloud. I wonder what the advantages of this approach are over an external load-balancer. Let's say I have a 4-node SolrCloud (2 shards + replicas) + 1 server running ZooKeeper. I can use CloudSolrServer for indexing, or use a load-balancer and send updates to any existing node. In the former case it seems that ZooKeeper is a single point of failure - indexing is not possible if it is down. In the latter case I can still index data even if some nodes are down (no data outage). What is better for reliable indexing - CloudSolrServer, a load-balancer, or do you know some different methods worth considering? Regards.
SolrCloud and external file fields
Hi all, I'm planning to move a quite big Solr index to SolrCloud. However, in this index an external file field is used for popularity ranking. Does SolrCloud support external file fields? How does it cope with sharding and replication? Where should the external file be placed now that the index folder is not local but in the cloud? Otherwise, are there other best practices to deal with the use cases external file fields were used for, like popularity/ranking, in SolrCloud? Custom ValueSources going to something external? Thanks in advance, Simone
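For readers who haven't used the feature: this is roughly how an external file field is declared in a single-node setup, which is what makes the SolrCloud question interesting (where does the external file go when there is no single local data directory?). Field and type names below are illustrative, not from Simone's schema.

```xml
<!-- schema.xml: illustrative ExternalFileField used for popularity ranking -->
<fieldType name="extPopularity" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="popularity" type="extPopularity" indexed="false" stored="false"/>

<!-- The values live outside the index, in a file named
     external_popularity in the core's data directory,
     one keyField=value pair per line, e.g.:
       doc1=42.5
       doc2=7.0 -->
```

Because the file sits next to each core's index, a sharded/replicated deployment would need a copy of it in every replica's data directory, which is exactly the operational question raised above.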
Re: Custom Solr indexer/searcher
FWIW I helped someone a few days ago with a similar problem and similarly advised modifying SpatialPrefixTree: http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tt4020445.html IMO GeoHashField should be deprecated because it adds no value. ~ David On Nov 16, 2012, at 1:49 PM, Scott Smith wrote: Thanks for the suggestions. I'll take a look at these things. -Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, November 15, 2012 11:54 PM To: solr-user@lucene.apache.org Subject: Re: Custom Solr indexer/searcher Scott, It sounds like you need to look into a few examples of similar things in Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms similar to the given one in the FST for query expansion. Generic query expansion is done via MultiTermQuery. Index-time term expansion is shown in TrieField and, btw, NumericRangeQuery (it should match your goal a lot). All these are single-dimension examples, but AFAIK a KD-tree is multidimensional; look into GeoHashField, which puts two-dimensional points into single terms with the ability to build ranges on them - see GeoHashField.createSpatialQuery(). Happy hacking! On Fri, Nov 16, 2012 at 10:34 AM, John Whelan whelanl...@gmail.com wrote: Scott, I probably have no idea as to what I'm saying, but if you're looking for finding results in an N-dimensional space, you might look at creating a field of type 'point'. Point-type fields have a dimension attribute; I believe that it can be set to a large integer value. Barring that, there is also a 'dist()' function that can be used to work with multiple numeric fields in order to sort results based on closeness to a desired coordinate. The 'dist' function takes a parameter to specify the means of calculating the distance. (For example, 2 = 'Euclidean distance'. I don't know the other options.) In the worst case, my response is worthless, but it pops your question back up in the e-mails...
Regards, John -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
solr cloud shards and servers issue
Hi, I have the following scenario: I have 1 collection across 10 servers. Number of shards: 10. Each server has 2 Solr instances running. Replication is 2. I want to move one of the instances to another server, meaning: kill the Solr process on server X and start a new Solr process on server Y instead. When I kill the Solr process on server X, I can still see that instance in the SolrCloud graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell Solr/ZooKeeper: forget about that instance? 2. When running a new Solr instance - any way to tell Solr/ZooKeeper: add this instance to shard X? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cloud-shards-and-servers-issue-tp4021101.html Sent from the Solr - User mailing list archive at Nabble.com.
How do I best detect when my DIH load is done?
A little while back, I needed a way to tell if my DIH load was done, so I made up a little Ruby program to query /dih?command=status . The program is here: http://petdance.com/2012/07/a-little-ruby-program-to-monitor-solr-dih-imports/ Is this the best way to do it? Is there some other tool or interface that I should be using instead? Thanks, xoa -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: solr cloud shards and servers issue
On Nov 19, 2012, at 11:24 AM, joe.cohe...@gmail.com wrote: Hi, I have the following scenario: I have 1 collection across 10 servers. Number of shards: 10. Each server has 2 Solr instances running. Replication is 2. I want to move one of the instances to another server, meaning: kill the Solr process on server X and start a new Solr process on server Y instead. When I kill the Solr process on server X, I can still see that instance in the SolrCloud graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell Solr/ZooKeeper: forget about that instance? Unload the SolrCores involved. 2. When running a new Solr instance - any way to tell Solr/ZooKeeper: add this instance to shard X? Specify a shardId when creating the core or configuring it in solr.xml, and make it match the shard you want to add to. - Mark
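Mark's two suggestions map to CoreAdmin requests. The sketch below just builds the two URLs so the parameters are visible; the host, core, collection and shard names are hypothetical, and the exact parameter name for pinning a shard (shard vs. shardId) has varied across Solr versions, so check the docs for your release.

```python
# Build the CoreAdmin URLs for the two steps Mark describes.
# Host, core, collection and shard names are hypothetical examples.
from urllib.parse import urlencode

base = "http://serverY:8983/solr/admin/cores"

# 1. "Forget about that instance": unload the SolrCore the dead node hosted.
unload = base + "?" + urlencode({
    "action": "UNLOAD",
    "core": "collection1_shard3_replica2",
})

# 2. Bring up a replacement core pinned to a specific shard.
create = base + "?" + urlencode({
    "action": "CREATE",
    "name": "collection1_shard3_replica2",
    "collection": "collection1",
    "shard": "shard3",
})

print(unload)
print(create)
```

Issuing these with curl (or SolrJ's CoreAdminRequest) against the surviving node should drop the ghost entry from the cloud graph and slot the new instance into the intended shard.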
Order by hl.snippets count
Hello, I'm using Solr 1.3 with the http://wiki.apache.org/solr/HighlightingParameters options. The client just asked us to change the sort order from the default score to the number of hl.snippets per document. Is this possible from the Solr configuration (without implementing a custom scoring algorithm)? Thanks, -- *Gabriel-Cristian CROITORU* Senior Software Engineer www.zitec.com Tel. +40 (0)31 71 00 114 We are hiring! www.zitec.com/join-zitec
Re: solr cloud shards and servers issue
How can I unload a SolrCore after I killed the running process? Mark Miller-3 wrote: On Nov 19, 2012, at 11:24 AM, joe.cohen.m@ wrote: Hi, I have the following scenario: I have 1 collection across 10 servers. Number of shards: 10. Each server has 2 Solr instances running. Replication is 2. I want to move one of the instances to another server, meaning: kill the Solr process on server X and start a new Solr process on server Y instead. When I kill the Solr process on server X, I can still see that instance in the SolrCloud graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell Solr/ZooKeeper: forget about that instance? Unload the SolrCores involved. 2. When running a new Solr instance - any way to tell Solr/ZooKeeper: add this instance to shard X? Specify a shardId when creating the core or configuring it in solr.xml, and make it match the shard you want to add to. - Mark -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cloud-shards-and-servers-issue-tp4021101p402.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: inconsistent number of results returned in solr cloud
Answers inline below -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, November 17, 2012 6:40 AM To: solr-user@lucene.apache.org Subject: Re: inconsistent number of results returned in solr cloud Hmmm, first an aside. If by commit after every batch of documents you mean after every call to server.add(doclist), there's no real need to do that unless you're striving for really low latency. The usual recommendation is to use commitWithin when adding and commit only at the very end of the run. This shouldn't actually be germane to your issue, just an FYI. DB Good point. The code for committing docs to Solr is fairly old. I will update it since I don't have a latency requirement. So you're saying that the inconsistency is permanent? By that I mean it keeps coming back inconsistently for minutes/hours/days? DB Yes, it is permanent. I have collections that have been up for weeks and are still returning inconsistent results, and I haven't been adding any additional documents. DB Related to this, I seem to have a discrepancy between the number of documents I think I am sending to Solr and the number of documents it is reporting. I have tried reducing the number of shards for one of my small collections, so I deleted all references to this collection and reloaded it. I think I have 260 documents submitted (counted from a hadoop job). Solr returns a count of ~430 (it varies), and the first returned document is not consistent. I guess if I were trying to test this I'd need to know how you added subsequent collections. In particular what you did re: zookeeper as you added each collection. DB These are my steps DB 1. Create the collection via the HTTP API: http://host:port/solr/admin/collections?action=CREATE&name=collection&numShards=6&collection.configName=collection DB 2.
Relaunch one of my JVM processes, bootstrapping the collection: DB java -Xmx16g -Dcollection.configName=collection -Djetty.port=port -DzkHost=zkhost -Dsolr.solr.home=solr home -DnumShards=6 -Dbootstrap_confdir=conf -jar start.jar DB load data DB Let me know if something is unclear. I can run through the process again and document it more carefully. DB DB Thanks for looking at it, DB Dave Best Erick On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David buttl...@llnl.gov wrote: My typical way of adding documents is through SolrJ, where I commit after every batch of documents (where the batch size is configurable) I have now tried committing several times, from the command line (curl) with and without openSearcher=true. It does not affect anything. Dave -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, November 16, 2012 11:04 AM To: solr-user@lucene.apache.org Subject: Re: inconsistent number of results returned in solr cloud How did you do the final commit? Can you try a lone commit (with openSearcher=true) and see if that affects things? Trying to determine if this is a known issue or not. - Mark On Nov 16, 2012, at 1:34 PM, Buttler, David buttl...@llnl.gov wrote: Hi all, I buried an issue in my last post, so let me pop it up. I have a cluster with 10 collections on it. The first collection I loaded works perfectly. But every subsequent collection returns an inconsistent number of results for each query. The queries can be simply *:*, or more complex facet queries. If I go to individual cores and issue the query, with distrib=false, I get a consistent number of results. I am wondering if there is some delay in returning results from my shards, and the queried node just times out and displays the number of results that it has received so far. If there is such a timeout, it must be very small, as my QTime is around 11 ms. Dave
RE: Architecture Question
If you just want to store the data, you can dump it into HDFS sequence files. While HBase is really nice if you want to process and serve data in real time, it adds overhead to use it as pure storage. Dave -Original Message- From: Cool Techi [mailto:cooltec...@outlook.com] Sent: Friday, November 16, 2012 8:26 PM To: solr-user@lucene.apache.org Subject: RE: Architecture Question Hi Otis, Thanks for your reply. I just wanted to check which NoSQL structure would be best suited to store data and use the least amount of memory, since for most of my work Solr would be sufficient, and I want to store data just in case we want to reindex and as a backup. Regards, Ayush Date: Fri, 16 Nov 2012 15:47:40 -0500 Subject: Re: Architecture Question From: otis.gospodne...@gmail.com To: solr-user@lucene.apache.org Hello, I am not sure if this is the right forum for this question, but it would be great if I could be pointed in the right direction. We have been using a combination of MySQL and Solr for all our company full-text and query needs. But as our customers have grown, so has the amount of data, and MySQL is just not proving to be the right option for storing/querying. I have been looking at SolrCloud and it looks really impressive, but I'm not sure if we should give up our storage system. So, I have been exploring DataStax, but a commercial option is out of the question. So we were thinking of using HBase to store the data and at the same time index the data into SolrCloud, but for many reasons this design doesn't seem convincing (also seen the basics of Lilly). 1) Would it be recommended to just use SolrCloud with multiple replicas, or does hbase-solr seem like a good option? If you trust SolrCloud with replication and keep all your fields stored then you could live without an external DB. At this point I personally would still want an external DB. Whether HBase is the right DB for the job I can't tell because I don't know anything about your data, volume, access patterns, etc.
I can tell you that HBase does scale well - we have tables with many billions of rows stored in it, for instance. 2) How much strain would it be to keep both a Solr shard and an HBase node on the same machine? HBase loves memory. So does Solr. They both dislike disk IO (who doesn't!). Solr can use a lot of CPU for indexing/searching, depending on the volume. HBase RegionServers can use a lot of CPU if you run MapReduce on data in HBase. 3) Is there a calculation for what kind of machine configuration I would need to store 500-1000 million records, and how many shards? Most of these will be social data (Twitter/Facebook/blogs etc.). No recipe here, unfortunately. You'd have to experiment and test, do load and performance testing, etc. If you need help with Solr + HBase, we happen to have a lot of experience with both and have even used them together for some of our clients. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html
RE: How do I best detect when my DIH load is done?
Andy, I use an approach similar to yours. There may be something better, however. You might be able to write an onImportEnd listener to tell you when the import ends. See http://wiki.apache.org/solr/DataImportHandler#EventListeners for a little documentation. See also https://issues.apache.org/jira/browse/SOLR-938 and https://issues.apache.org/jira/browse/SOLR-1081 for the background on this feature. If you do end up using this, let us know how it works and if there is anything you could see to improve it. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Andy Lester [mailto:a...@petdance.com] Sent: Monday, November 19, 2012 10:29 AM To: solr-user@lucene.apache.org Subject: How do I best detect when my DIH load is done? A little while back, I needed a way to tell if my DIH load was done, so I made up a little Ruby program to query /dih?command=status. The program is here: http://petdance.com/2012/07/a-little-ruby-program-to-monitor-solr-dih-imports/ Is this the best way to do it? Is there some other tool or interface that I should be using instead? Thanks, xoa -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
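To complement the polling approach discussed above, here is a small self-contained sketch of parsing a /dataimport?command=status response to decide whether the import is still running. The sample XML below is abbreviated and illustrative; a real DIH status response carries more fields (timings, document counts, messages).

```python
# Decide busy vs. idle from a DataImportHandler status response.
# The sample below is an abbreviated, illustrative response body;
# real responses contain additional <str>/<lst> elements.
import xml.etree.ElementTree as ET

sample = """<response>
  <str name="status">busy</str>
  <str name="importResponse">A command is still running...</str>
</response>"""

def dih_status(xml_text):
    """Return the value of the 'status' field, e.g. 'busy' or 'idle'."""
    root = ET.fromstring(xml_text)
    for el in root.iter("str"):
        if el.get("name") == "status":
            return el.text
    return None

print(dih_status(sample))  # busy
```

In a real poller you would fetch the XML from the status URL in a loop and stop once the status flips back to idle, then inspect the outcome fields for success or failure.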
Search using the result returned from the spell checking component
Hi, I've successfully configured the spell check component and it works well. I couldn't find an answer to my question, so any help would be much appreciated: Can I send a single request to Solr and make it so that if any part of the query was misspelled, the search would be performed using the first spell suggestion that returns? I want to make only one request, i.e. submit a query only once, if that is possible. For example: if a user searched for jaca, the search would be performed only once - for java. Thanks in advance for any answer or a link to a relevant resource (I couldn't find any). -- View this message in context: http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Search using the result returned from the spell checking component
What you want isn't supported. You will always need to issue that second request. This would be a nice feature to add, though. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Roni [mailto:r...@socialarray.com] Sent: Monday, November 19, 2012 12:54 PM To: solr-user@lucene.apache.org Subject: Search using the result returned from the spell checking component Hi, I've successfully configured the spell check component and it works well. I couldn't find an answer to my question, so any help would be much appreciated: Can I send a single request to Solr and make it so that if any part of the query was misspelled, the search would be performed using the first spell suggestion that returns? I want to make only one request, i.e. submit a query only once, if that is possible. For example: if a user searched for jaca, the search would be performed only once - for java. Thanks in advance for any answer or a link to a relevant resource (I couldn't find any). -- View this message in context: http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Search using the result returned from the spell checking component
Thank you. I was wondering - what if I make a first request and ask it to return only 1 result: will it still return the spell suggestions while avoiding the overhead of returning all relevant results? Then I could make a second request to get all the results I need. Would that work? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135p4021140.html Sent from the Solr - User mailing list archive at Nabble.com.
Cacti monitoring of Solr and Tomcat
Is anyone using Cacti to track trends over time in Solr and Tomcat metrics? We have Nagios set up for alerts, but want to track trends over time. I've found a couple of examples online, but none have worked completely for me. I'm looking at this one next: http://forums.cacti.net/viewtopic.php?f=12&t=19744&start=15 It looks promising although it doesn't monitor Solr itself. Suggestions? Thanks, Andy -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: Search using the result returned from the spell checking component
You can even request zero rows. That will still return the number of matches. --wunder On Nov 19, 2012, at 11:12 AM, Roni wrote: Thank you. I was wondering - what if I make a first request and ask it to return only 1 result: will it still return the spell suggestions while avoiding the overhead of returning all relevant results? Then I could make a second request to get all the results I need. Would that work?
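The cheap first request Walter describes can be sketched as a query string; the handler path and exact parameter set below are illustrative (spellcheck.collate is an optional SpellCheckComponent parameter that asks Solr to return the whole query rewritten with the suggestions, which is convenient for the follow-up request).

```python
# Sketch of the "cheap first request": zero rows, spellcheck on.
# numFound is still reported even when no documents are returned,
# so this gets the match count and suggestions without paying to
# fetch any documents. Handler path and query term are examples.
from urllib.parse import urlencode

params = {
    "q": "jaca",
    "rows": 0,                     # no docs returned; numFound still present
    "spellcheck": "true",
    "spellcheck.collate": "true",  # ask for a rewritten whole query
}
query = "/solr/select?" + urlencode(params)
print(query)
```

If numFound comes back as zero (or very low) and a collation is present, the client issues the second, full request with the collated query.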
Re: Search using the result returned from the spell checking component
And performance-wise: is asking for 0 rows the same as asking for 100 rows? On Mon, Nov 19, 2012 at 9:22 PM, Walter Underwood [via Lucene] ml-node+s472066n4021143...@n3.nabble.com wrote: You can even request zero rows. That will still return the number of matches. --wunder On Nov 19, 2012, at 11:12 AM, Roni wrote: Thank you. I was wondering - what if I make a first request and ask it to return only 1 result: will it still return the spell suggestions while avoiding the overhead of returning all relevant results? Then I could make a second request to get all the results I need. Would that work? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135p4021144.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I best detect when my DIH load is done?
On 11/19/2012 11:52 AM, Dyer, James wrote: Andy, I use an approach similar to yours. There may be something better, however. You might be able to write an onImportEnd listener to tell you when it ends. See http://wiki.apache.org/solr/DataImportHandler#EventListeners for a little documentation. See also https://issues.apache.org/jira/browse/SOLR-938 and https://issues.apache.org/jira/browse/SOLR-1081 for the background on this feature. If you do end up using this, let us know how it works and if there is anything you could see to improve it.

I think it would be a good idea to provide a SolrJ API out of the box (similar to CoreAdminRequest) for querying the status URL on Solr and obtaining the following information:

1) Determining import status:
   a) never started (idle)
   b) finished successfully (idle)
   c) finished with error, canceled, etc. (idle)
   d) in progress (busy)
2) Determining how many documents have been added.
3) Determining how long the import took or has taken so far.
4) Any other commonly gathered information.

There may be some reluctance to do this simply because DIH is a contrib module. Perhaps there could be a contrib module for SolrJ? Thanks, Shawn
Can Solr v1.4 and v4.0 co-exist in Tomcat?
I have an existing v1.4 implementation of Solr that supports 2 lines of business. For a third line of business the need to do Geo searching requires using Solr 4.0. I'd like to minimize the impact to the existing lines of business (let them upgrade at their own pace), however I want to share hardware if possible. Can I have Solr 4.0 and Solr 1.4 co-exist in the same Tomcat instance? If so, are there any potential side-effects to the existing Solr implementation I should be aware of? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Can-Solr-v1-4-and-v4-0-co-exist-in-Tomcat-tp4021146.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I best detect when my DIH load is done?
Hello Andy, i had a similar question on this some time ago. http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-td3987110.html#a3987123 http://lucene.472066.n3.nabble.com/need-input-lessons-learned-or-best-practices-for-data-imports-td3801327.html#a3803658 i ended up writing my own shell based polling application that runs from our *nx batch server that handles all of our Control-M work. +1 on the idea of making this a more formal part of the API. let me know if you want concrete example code. -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-best-detect-when-my-DIH-load-is-done-tp4021121p4021148.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cacti monitoring of Solr and Tomcat
Hi Andy, My favourite topic ;) See my sig below for SPM for Solr. At my last company we used Cacti but it felt very 1990s almost. Some ppl use zabbix, some graphite, some newrelic, some SPM, some nothing! Otis -- Solr Performance Monitoring - http://sematext.com/spm On Nov 19, 2012 2:18 PM, Andy Lester a...@petdance.com wrote: Is anyone using Cacti to track trends over time in Solr and Tomcat metrics? We have Nagios set up for alerts, but want to track trends over time. I've found a couple of examples online, but none have worked completely for me. I'm looking at this one next: http://forums.cacti.net/viewtopic.php?f=12t=19744start=15 It looks promising although it doesn't monitor Solr itself. Suggestions? Thanks, Andy -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
RE: How do I best detect when my DIH load is done?
James, was it you (cannot remember) that replied to one of my queries on this subject and mentioned that there was consideration being given to cleaning up the response codes to remove ambiguity? -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-best-detect-when-my-DIH-load-is-done-tp4021121p4021150.html Sent from the Solr - User mailing list archive at Nabble.com.
Inserting many documents and update relations
Hi there, I have a principal question. We have around 5 million Lucene documents. At the beginning we have around 4000 XML files which we transform to SolrInputDocuments by using SolrJ and adding them to the index. A document is also related to other documents, so while adding a document we have to do some queries (at least one) to identify whether there are related documents already in the index, in order to create the association to the related document. The related document also has a backlink, so we have to update the related document as well (meaning load, update, delete and re-add). We are using Solr 3.6.1. The performance is quite slow because of these queries and modifications of already existing documents in the index. Are there any configuration options we can use, or anything else we can do? Thanks a lot in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Inserting-many-documents-and-update-relations-tp4021151.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: How do I best detect when my DIH load is done?
I'm not sure. But there are at least a few jira issues open with differing ideas on how to improve this. For instance, SOLR-1554 SOLR-2728 SOLR-2729 James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Monday, November 19, 2012 1:52 PM To: solr-user@lucene.apache.org Subject: RE: How do I best detect when my DIH load is done? James, was it you (cannot remember) that replied to one of my queries on this subject and mentioned that there was consideration being given to cleaning up the response codes to remove ambiguity? -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-best-detect-when-my-DIH-load-is-done-tp4021121p4021150.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can Solr v1.4 and v4.0 co-exist in Tomcat?
Hi Ken- We've been running 1.3 and 4.0 as separate web apps within the same Tomcat instance for the last 3 weeks with no issues. The only challenge for us was refactoring our app client code to use SolrJ 4.0 to access both the 1.3 and 4.0 backends. The calls to the 1.3 backend use the XML response format while the 4.0 backend uses the Java binary format. -James On Nov 19, 2012, at 11:40 AM, kfdroid kfdr...@gmail.com wrote: I have an existing v1.4 implementation of Solr that supports 2 lines of business. For a third line of business the need to do Geo searching requires using Solr 4.0. I'd like to minimize the impact to the existing lines of business (let them upgrade at their own pace), however I want to share hardware if possible. Can I have Solr 4.0 and Solr 1.4 co-exist in the same Tomcat instance? If so, are there any potential side-effects to the existing Solr implementation I should be aware of? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Can-Solr-v1-4-and-v4-0-co-exist-in-Tomcat-tp4021146.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Per user document exclusions
Hi Christian, Since customization is not a problem in your case, how about writing out the userId and excluded document ids to the database when it is excluded, and then for each query from the user (possibly identified by a userid parameter), lookup the database by userid, construct a NOT filter out of the excluded docIds, then send to Solr as the fq? We are using a variant of this approach to allow database style wildcard search on document titles. -sujit On Nov 18, 2012, at 9:05 PM, Christian Jensen wrote: Hi, We have a need to allow each user to 'exclude' individual documents in the results. We can easily do this now within the RDBMS using a FTS index and a query with 'OUTER LEFT JOIN WHERE NULL' type of thing. Can Solr do this somehow? Heavy customization is not a problem - I would bet this has already been done. I would like to avoid multiple trips back and forth from either the DB or SOLR if possible. Thanks! Christian -- *Christian Jensen* 724 Ioco Rd Port Moody, BC V3H 2W8 +1 (778) 996-4283 christ...@jensenbox.com
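Sujit's approach can be sketched as a tiny fq builder: look up the user's exclusions in the database, then send a negative filter query. This assumes id is the uniqueKey field and omits value escaping, so adjust for your schema:

```python
def exclusion_fq(excluded_ids):
    # Pure negative filter query: keep everything except these documents.
    if not excluded_ids:
        return None  # no fq needed for users with no exclusions
    return "-id:(" + " OR ".join(str(i) for i in excluded_ids) + ")"

print(exclusion_fq([101, 102, 107]))  # -id:(101 OR 102 OR 107)
```

The fq is per-user, so filter-cache hit rates depend on how often the same user queries with an unchanged exclusion set.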
Re: Solr4.0 / SolrCloud queries
Hi all, I have managed to successfully index around 6 million documents, but while indexing (and even now after the indexing has stopped), I am running into a bunch of errors. The most common error I see is / null:org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://ABC:8983/solr/xyzabc/ I have made sure that the servers are able to communicate with each other using the same names. Another error I keep getting is that the leader stops recovering and goes red / recovery failed. /Error while trying to recover. core=ABC123:org.apache.solr.common.SolrException: We are not the leader/ The servers intermittently go offline taking down one of the shards and in turn stopping all search queries. The configuration I have: Shard1: Server1 - Memory 22 GB, JVM 8 GB; Server2 - Memory 22 GB, JVM 10 GB (this one is on recovery failed status, but still acting as a leader). Shard2: Server1 - Memory 22 GB, JVM 8 GB (this one is on recovery failed status, but still acting as a leader); Server2 - Memory 22 GB, JVM 8 GB. Shard3: Server1 - Memory 22 GB, JVM 10 GB; Server2 - Memory 22 GB, JVM 8 GB. While typing this post I did a Reload from the Core Admin page, and both servers (Shard1-Server2 and Shard2-Server1) came back up again. Has anyone else encountered these issues? Any steps to prevent these? Thanks. --Shreejay -- View this message in context: http://lucene.472066.n3.nabble.com/Solr4-0-SolrCloud-queries-tp4016825p4021154.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Per user document exclusions
Hi Christian, Since you didn't explicitly mention it, I'm not sure if you are aware of it - ManifoldCF has ACL support built in. This may be what you are after. Otis -- Solr Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 12:05 AM, Christian Jensen christ...@jensenbox.comwrote: Hi, We have a need to allow each user to 'exclude' individual documents in the results. We can easily do this now within the RDBMS using a FTS index and a query with 'OUTER LEFT JOIN WHERE NULL' type of thing. Can Solr do this somehow? Heavy customization is not a problem - I would bet this has already been done. I would like to avoid multiple trips back and forth from either the DB or SOLR if possible. Thanks! Christian -- *Christian Jensen* 724 Ioco Rd Port Moody, BC V3H 2W8 +1 (778) 996-4283 christ...@jensenbox.com
Re: Best way to retrieve 20 specific documents
If you are in Solr 4 you could use realtime get and list the ids that you need. For example: http://host:port/solr/mycore/get?ids=my_id_1,my_id_2... See http://lucidworks.lucidimagination.com/display/solr/RealTime+Get Tomás On Mon, Nov 19, 2012 at 5:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, How about id1 OR id2 OR id3? :) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 2:40 PM, Dotan Cohen dotanco...@gmail.com wrote: Suppose that an application needs to retrieve about 20-30 solr documents by id. The application could simply run 20 queries to retrieve them, but is there a better way? The id field is stored and indexed, of course. It is of type solr.StrField, and is configured as the uniqueKey. Thank you for any insight. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
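A sketch of building the realtime get URL Tomás describes (core name is a placeholder; urlencode percent-encodes the comma, which Solr decodes normally):

```python
from urllib.parse import urlencode

def realtime_get_url(core_url, ids):
    # /get with a comma-separated ids parameter (Solr 4 realtime get).
    return core_url + "/get?" + urlencode({"ids": ",".join(ids)})

url = realtime_get_url("http://localhost:8983/solr/mycore", ["my_id_1", "my_id_2"])
print(url)  # http://localhost:8983/solr/mycore/get?ids=my_id_1%2Cmy_id_2
```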
Re: Cacti monitoring of Solr and Tomcat
On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: My favourite topic ;) See my sig below for SPM for Solr. At my last company we used Cacti but it felt very 1990s almost. Some ppl use zabbix, some graphite, some newrelic, some SPM, some nothing! SPM looks mighty tasty, but we must have it in-house on our own servers, for monitoring internal dev systems, and we'd like it to be open source. We already have Cacti up and running, but it's possible we could use something else. -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: Solr Delta Import Handler not working
| dataSource=null I think this should not be here. The datasource should default to the dataSource listing. And 'rootEntity=true' should be in the XPathEntityProcessor block, because you are adding each file as one document. - Original Message - | From: Spadez james_will...@hotmail.com | To: solr-user@lucene.apache.org | Sent: Sunday, November 18, 2012 7:34:34 AM | Subject: Re: Solr Delta Import Handler not working | | Update! Thank you to Lance for the help. Based on your suggestion I have fixed up a few things. | | *My dataconfig now has the filename pattern fixed and rootEntity=true:* <dataConfig> <dataSource type="FileDataSource"/> <document> <entity name="document" processor="FileListEntityProcessor" baseDir="/var/lib/employ" fileName="^.*\.xml$" recursive="false" rootEntity="true" dataSource="null"> <entity processor="XPathEntityProcessor" url="${document.fileAbsolutePath}" useSolrAddSchema="true" stream="true"/> </entity> </document> </dataConfig> | | *My data.xml has a corrected date format with T:* <add> <doc> <field name="id">123</field> <field name="title">Delta Import 2</field> <field name="description">This is my long description</field> <field name="truncated_description">This is</field> <field name="company">Google</field> <field name="location_name">England</field> <field name="date">2007-12-31T22:29:59</field> <field name="source">Google</field> <field name="url">www.google.com</field> <field name="latlng">45.17614,45.17614</field> </doc> </add> | | -- | View this message in context: | http://lucene.472066.n3.nabble.com/Solr-Delta-Import-Handler-not-working-tp4020897p4020925.html | Sent from the Solr - User mailing list archive at Nabble.com. |
Re: Cacti monitoring of Solr and Tomcat
We (Chegg) are using New Relic, even for the dev systems. It is pretty good, but only reports averages, when we need median and 90th percentile. Our next step is putting something together with the Metrics server from Coda Hale (http://metrics.codahale.com/) and Graphite (http://graphite.wikidot.com/). This looks far more capable than New Relic, but more work. wunder On Nov 19, 2012, at 12:36 PM, Andy Lester wrote: On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: My favourite topic ;) See my sig below for SPM for Solr. At my last company we used Cacti but it felt very 1990s almost. Some ppl use zabbix, some graphite, some newrelic, some SPM, some nothing! SPM looks mighty tasty, but we must have it in-house on our own servers, for monitoring internal dev systems, and we'd like it to be open source. We already have Cacti up and running, but it's possible we could use something else. -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Odd behaviour for case insensitive searches
Hello Everyone, I've been having issues with odd SOLR behavior when searching for case insensitive data. Let's take a vanilla SOLR config (from the example). Then I uploaded the default solr.xml document with a slight modification to the field with name 'name'. I added Thomas NOSQL. <add> <doc> <field name="id">SOLR1000</field> <field name="name">Solr, the Enterprise Search Server Thomas NOSQL</field> </doc> </add> Then when I search for nosql~ I got the record returned in the search. However, when I search for NOSQL~ no records are returned. You can see my solr admin interface here: http://skatingboutique.com [PORT 8080] /solr/#/tracks Why is this? -- View this message in context: http://lucene.472066.n3.nabble.com/Odd-behaviour-for-case-insensitive-searches-tp4021171.html Sent from the Solr - User mailing list archive at Nabble.com.
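A likely explanation, hedged since the schema isn't shown: fuzzy queries such as NOSQL~ are multiterm queries, and in many Solr versions these bypass the field's analysis chain, so the term is not lowercased at query time even though the index side lowercases it. nosql~ then matches the lowercased indexed term and NOSQL~ does not. One client-side workaround is to lowercase fuzzy terms before sending the query; a sketch:

```python
import re

def normalize_fuzzy_terms(q):
    # Lowercase any term carrying a fuzzy suffix (~ with optional similarity)
    # so it matches the lowercased form stored in the index.
    return re.sub(r"(\w+)(~[\d.]*)",
                  lambda m: m.group(1).lower() + m.group(2), q)

print(normalize_fuzzy_terms("NOSQL~"))          # nosql~
print(normalize_fuzzy_terms("name:NOSQL~0.7"))  # name:nosql~0.7
```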
Re: solr cloud shards and servers issue
Joe, Can you remove it from the config and have it gone when you restart Solr? Or restart Solr and unload as described on http://wiki.apache.org/solr/CoreAdmin ? Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 11:57 AM, joe.cohe...@gmail.com joe.cohe...@gmail.com wrote: How can I unload a solrCore after I killed the running process? Mark Miller-3 wrote On Nov 19, 2012, at 11:24 AM, joe.cohen.m@ wrote: Hi I have the following scenario: I have 1 collection across 10 servers. Num of shards: 10. Each server has 2 solr instances running. replication is 2. I want to move one of the instances to another server. meaning, kill the solr process in server X and start a new solr process in server Y instead. When I kill the solr process in server X, I can still see that instance in the solr-cloud-graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell solr/zookeeper - Forget about that instance? Unload the SolrCores involved. 2. when running a new solr instance - any way to tell solr/zookeeper - add this instance to shard X? Specify a shardId when creating the core or configuring it in solr.xml and make it match the shard you want to add to. - Mark -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cloud-shards-and-servers-issue-tp4021101p402.html Sent from the Solr - User mailing list archive at Nabble.com.
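The unload Otis mentions is a plain CoreAdmin HTTP call; a sketch of building it (host and core name are placeholders):

```python
from urllib.parse import urlencode

def unload_core_url(solr_base, core_name):
    # CoreAdmin UNLOAD request, per the CoreAdmin wiki page linked above.
    return solr_base + "/admin/cores?" + urlencode(
        {"action": "UNLOAD", "core": core_name})

url = unload_core_url("http://localhost:8983/solr", "collection1_shard3")
print(url)  # http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collection1_shard3
```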
Re: CloudSolrServer or load-balancer for indexing
OK, got it. Thanks. On 19 November 2012 15:00, Mark Miller markrmil...@gmail.com wrote: Nodes stop accepting updates if they cannot talk to Zookeeper, so the external load balancer is no advantage there. CloudSolrServer will be smart about knowing who the leaders are, eventually will do hashing, will auto add/remove nodes from rotation based on the cluster state in Zookeeper, and is probably out of the box more intelligent about retrying on some responses (for example responses that are returned on shutdown or startup). - Mark On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, As far as I know CloudSolrServer is recommended to be used for indexing to SolrCloud. I wonder what are advantages of this approach over external load-balancer ? Let's say I have 4 nodes SolrCloud (2 shards + replicas) + 1 server running ZooKeeper. I can use CloudSolrServer for indexing or use load-balancer and send updates to any existing node. In former case it seems that ZooKeeper is a single point of failure - indexing is not possible if it is down. In latter case I can still indexing data even if some nodes are down (no data outage). What is better for reliable indexing - CloudSolrServer, load-balancer or you know some different methods worth to consider ? Regards.
Re: CloudSolrServer or load-balancer for indexing
A single zookeeper node could be a single point of failure. It is recommended that you have at least three ZooKeeper nodes running as an ensemble. Zookeeper has a simple rule - over half of your nodes must be available to achieve quorum and thus be functioning. This is to avoid 'split-brain'. Thus, with three servers, you could handle the loss of one zookeeper node. Five would allow the loss of two nodes. More to the point, you're pushing the static configuration from being a list of solr nodes, to being a list of Zookeeper nodes. The expectation is clearly that you'll need to scale your Zookeeper nodes far less often than you'd need to do it with Solr. Upayavira On Mon, Nov 19, 2012, at 09:39 PM, Marcin Rzewucki wrote: OK, got it. Thanks. On 19 November 2012 15:00, Mark Miller markrmil...@gmail.com wrote: Nodes stop accepting updates if they cannot talk to Zookeeper, so the external load balancer is no advantage there. CloudSolrServer will be smart about knowing who the leaders are, eventually will do hashing, will auto add/remove nodes from rotation based on the cluster state in Zookeeper, and is probably out of the box more intelligent about retrying on some responses (for example responses that are returned on shutdown or startup). - Mark On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, As far as I know CloudSolrServer is recommended to be used for indexing to SolrCloud. I wonder what are advantages of this approach over external load-balancer ? Let's say I have 4 nodes SolrCloud (2 shards + replicas) + 1 server running ZooKeeper. I can use CloudSolrServer for indexing or use load-balancer and send updates to any existing node. In former case it seems that ZooKeeper is a single point of failure - indexing is not possible if it is down. In latter case I can still indexing data even if some nodes are down (no data outage). 
What is better for reliable indexing - CloudSolrServer, load-balancer or you know some different methods worth to consider ? Regards.
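Upayavira's majority rule reduces to a one-liner: an ensemble of n ZooKeeper nodes tolerates floor((n - 1) / 2) failures, which is why three nodes survive one loss and five survive two:

```python
def zk_fault_tolerance(n):
    # Quorum needs a strict majority, so floor((n - 1) / 2) nodes may fail.
    return (n - 1) // 2

for n in (1, 3, 5):
    print(n, "node ensemble tolerates", zk_fault_tolerance(n), "failure(s)")
```

Note that even ensemble sizes buy nothing: 4 nodes tolerate the same single failure as 3, which is why odd sizes are the usual recommendation.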
Re: Cacti monitoring of Solr and Tomcat
: Is anyone using Cacti to track trends over time in Solr and Tomcat : metrics? We have Nagios set up for alerts, but want to track trends : over time. A key thing to remember is that all of the stats you can get from solr via HTTP are also available via JMX... http://wiki.apache.org/solr/SolrJmx ...so anytime you have a favorite monitoring tool WizWat and you're wondering if anyone has tips on using WizWat to monitor Solr, start by checking if WizWat has any docs on monitoring apps using JMX. -Hoss
Re: solr cloud shards and servers issue
Maybe it would be better if Solr checked the live nodes and not all the existing nodes in zk. If a server dies and you need to start a new one, it would go straight to the correct shard without one needing to specify it manually. Of course, the problem could be if a server goes down for a minute and then comes back up, maybe a new node was added to the shard in the interim, but I still think it would be better this way. Tomás On Mon, Nov 19, 2012 at 1:51 PM, Mark Miller markrmil...@gmail.com wrote: On Nov 19, 2012, at 11:24 AM, joe.cohe...@gmail.com wrote: Hi I have the following scenario: I have 1 collection across 10 servers. Num of shards: 10. Each server has 2 solr instances running. replication is 2. I want to move one of the instances to another server. meaning, kill the solr process in server X and start a new solr process in server Y instead. When I kill the solr process in server X, I can still see that instance in the solr-cloud-graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell solr/zookeeper - Forget about that instance? Unload the SolrCores involved. 2. when running a new solr instance - any way to tell solr/zookeeper - add this instance to shard X? Specify a shardId when creating the core or configuring it in solr.xml and make it match the shard you want to add to. - Mark
Re: Order by hl.snippets count
(12/11/20 1:50), Gabriel Croitoru wrote: Hello, I'm using Solr 1.3 with http://wiki.apache.org/solr/HighlightingParameters options. The client just asked us to change the order from the default score to the number of hl.snippets per document. Is this possible from Solr configuration? (without implementing a custom scoring algorithm)? I don't think it is possible. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Best way to retrieve 20 specific documents
On 11/19/2012 1:49 PM, Dotan Cohen wrote: On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, How about id1 OR id2 OR id3? :) Thanks, Otis. This was my first inclination (id:123 OR 456), but it didn't work when I tried. At your instigation I then tried id:123 OR id:456. This does work. Thanks. You can also use this query format: id:(123 OR 456 OR 789) This does get expanded internally by the query parser to the format that has the field name on every clause, but it is sometimes easier to write code that produces the above form. Thanks, Shawn
Re: Execute an independent query from the main query
Hi Otis, Yes, that seems like one solution, however I have multiple opening and closing hours, within the same day. Therefore it might become somewhat complicated to manage the index. For now I shifted the business logic to the client and a second query is made to get the additional data. Thanks for the suggestion. Indika On 20 November 2012 02:50, Otis Gospodnetic otis.gospodne...@gmail.comwrote: Hi Indika, So my suggestion was to maybe consider changing the index structure and pull open/close times into 1 or more fields in the main record, so you don't have this problem all together. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Sun, Nov 18, 2012 at 10:39 PM, Indika Tantrigoda indik...@gmail.com wrote: Hi Otis, Actually I maintain a separate document for each open/close time along with the date (i.e. Sunday =1, Monday =2). I was thinking if it would be possible to query Solr asking, give the next day's (can be current_day +1) minimum opening time as a response field. Thanks, Indika On 19 November 2012 04:50, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Maybe your index needs to have a separate field for each day open/close time. No join or extra query needed then. Otis -- Performance Monitoring - http://sematext.com/spm On Nov 18, 2012 5:35 PM, Indika Tantrigoda indik...@gmail.com wrote: Thanks for the response. Erick, My use case is related to restaurant opening hours, In the same request to Solr I'd like to get the time when the restaurant opens the next day, preferably part of the fields returned, and this needs to be independent of the main queries search params. Yes, the Join wouldn't be suitable in this use case. Luis, I had thought of having the logic in the client side, but before that I wanted to see if I could get the result from Solr itself. I am currently using SolrJ along with Spring. 
Thanks, Indika On 18 November 2012 21:49, Luis Cappa Banda luisca...@gmail.com wrote: Hello! When queries become more and more complex and you need to apply one second query with the resultant docs from the first one, or re-sort results, or maybe add some promotional or special docs to the response, I recommend to develop a Web App module that implements that complex business logic and dispatches queries from your Client App to your Solr back-end. That module, let's call Search Engine, lets you play with all those special use cases. If you are familiar with Java I suggest you to have a look at the combination between SolrJ and Spring framework or Jersey. Regards, - Luis Cappa. El 18/11/2012 15:15, Indika Tantrigoda indik...@gmail.com escribió: Hi All, I would like to get results of an query that is different from the main query as a new field. This query needs to be independent from any filter queries applied to the main query. I was trying to achieve this by fl=_external_query_result:query($myQuery), however that result seems to be governed by any filter queries applied to the main query ? Is it possible to have a completely separate query in the fl list and return its result along with the results (per results), or would I need to create a separate query on the client side to get the results of the independent query (based on the results from the first query) ? Thanks in advance, Indika
Re: Preventing accepting queries while custom QueryComponent starts up?
: I have several custom QueryComponents that have high one-time startup costs : (hashing things in the index, caching things from a RDBMS, etc...) you need to provide more details about how your custom components work -- in particular: where in the lifecycle of your components is this high-startup cost happening? : Is there a way to prevent solr from accepting connections before all : QueryComponents are ready? Define ready ? ... things that happen in the init() and inform(SolrCore) methods will completely prevent the SolrCore from being available for queries. Likewise: if you are using firstSearcher warming queries, then the useColdSearcher option in solrconfig.xml can be used to control whether external requests will block until the searcher is available -- however this doesn't prevent the servlet container from accepting the HTTP connection. but as mentioned, this is where things like the PingRequestHandler and the enable/disable commands can be used to take servers in and out of rotation with your load balancer -- assuming that your load balancer can be configured to monitor the ping URL. Alternatively you can just use native features of your load balancer to control this independent of solr (but the ping handler is a nice way of letting one set of dev/ops folks own the solr servers and control their availability even if they don't have the ability to control the load balancer itself) -Hoss
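For the load-balancer integration Hoss describes, the monitor only needs to fetch the ping URL and check the status field of the response. A sketch over a parsed JSON ping response; the response shape here is an assumption to verify against your Solr version:

```python
def is_in_rotation(ping_response):
    # The ping handler reports "OK" in the status field when the core is
    # enabled and able to serve queries.
    return ping_response.get("status") == "OK"

print(is_in_rotation({"responseHeader": {"status": 0}, "status": "OK"}))  # True
print(is_in_rotation({}))                                                 # False
```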
Re: Best way to retrieve 20 specific documents
In fact, you shouldn't need OR: id:(123 456 789) will default to OR. Upayavira On Mon, Nov 19, 2012, at 10:45 PM, Shawn Heisey wrote: On 11/19/2012 1:49 PM, Dotan Cohen wrote: On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, How about id1 OR id2 OR id3? :) Thank, Otis. This was my first inclination (id:123 OR 456), but it didn't work when I tried. At your instigation I tried then id:123 OR id:456. This does work. Thanks. You can also use this query format: id:(123 OR 456 OR 789) This does get expanded internally by the query parser to the format that has the field name on every clause, but it is sometimes easier to write code that produces the above form. Thanks, Shawn
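Putting the thread's suggestions together, the query can be produced by a one-line builder (assuming id is the uniqueKey field):

```python
def ids_query(field, ids):
    # Terms inside parentheses are OR'd by default, so no explicit OR needed.
    return field + ":(" + " ".join(str(i) for i in ids) + ")"

print(ids_query("id", [123, 456, 789]))  # id:(123 456 789)
```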
All-wildcard query performance
Hi, Our application sometimes generates queries with one of the constraints: field:[* TO *] I expected this query performance to be the same as if we omitted the field constraint completely. However, I see the performance of the two queries to differ drastically (3ms without all-wildcard constraint, 200ms with it). Could someone explain the source of the difference, please? I am fixing the application not to generate such queries, obviously, but still would like to understand the logic here. We use Solr 3.6.1. Thanks. -- Aleksey
Re: SolrCloud Error after leader restarts
It's generally not a good choice to use ram directory. 4x solrcloud does not work with it no - 5x does, but in any case, ram dir is not persistent. So when you restart Solr you will lose the data. MMap is generally the right dir to use. - Mark On Nov 19, 2012, at 6:52 PM, deniz denizdurmu...@gmail.com wrote: yea, i am using ram. solrcloud is not working with ram directory? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021194.html Sent from the Solr - User mailing list archive at Nabble.com.
solr4 MULTIPOLYGON search syntax
Does anybody have any info on how to properly construct a multipolygon search? I'm very interested in Polygon (search all documents within a shape), Multipolygon (search all documents within 2+ shapes), and Multipolygon (search all documents within 2+ shapes but not within an area inside a shape - imagine a donut where you don't search within the hole in the center). I'm trying to search 2 shapes but get errors at the moment. Polygon searches work just fine so I have everything installed correctly, but 2 shapes in the one search as per below is not working. I can't find anything on the net to try and debug multipolygons. My multipolygon query looks like this. fq=geo:Intersects(MULTIPOLYGON ((149.4023 -34.6072, 149.4023 -34.8690, 149.9022 -34.8690, 149.9022 -34.6072, 149.4023 -34.6072)), ((151.506958 -33.458943, 150.551147 -33.60547, 151.00708 -34.257216, 151.627808 -33.861293, 151.506958 -33.458943))) And I get this error. ERROR 500 error reading WKT But a polygon search works fine. fq=geo:Intersects(POLYGON((149.4023 -34.6072, 149.4023 -34.8690, 149.9022 -34.8690, 149.9022 -34.6072, 149.4023 -34.6072))) -- View this message in context: http://lucene.472066.n3.nabble.com/solr4-MULTIPOLYGON-search-syntax-tp4021199.html Sent from the Solr - User mailing list archive at Nabble.com.
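One hedged guess at the WKT error: in well-formed WKT, each polygon inside a MULTIPOLYGON is wrapped in its own double parentheses and the whole list sits inside one outer pair, i.e. MULTIPOLYGON (((...)), ((...))); the failing query above appears to be missing a nesting level. A small builder that emits the expected shape:

```python
def multipolygon_wkt(polygons):
    # Each polygon here is a single outer ring: a list of (x, y) points
    # whose first and last points coincide. Note the double parentheses
    # around every polygon and the single outer pair around the list.
    def ring(points):
        return "(" + ", ".join(f"{x} {y}" for x, y in points) + ")"
    return "MULTIPOLYGON (" + ", ".join("(" + ring(p) + ")" for p in polygons) + ")"

box = [(149.4023, -34.6072), (149.4023, -34.8690), (149.9022, -34.8690),
       (149.9022, -34.6072), (149.4023, -34.6072)]
tri = [(151.506958, -33.458943), (150.551147, -33.60547),
       (151.00708, -34.257216), (151.627808, -33.861293),
       (151.506958, -33.458943)]
print(multipolygon_wkt([box, tri]))
```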
Re: All-wildcard query performance
Hi, Our application sometimes generates queries with one of the constraints: field:[* TO *] I expected this query performance to be the same as if we omitted the field constraint completely. However, I see the performance of the two queries to differ drastically (3ms without all-wildcard constraint, 200ms with it). Could someone explain the source of the difference, please? I am fixing the application not to generate such queries, obviously, but still would like to understand the logic here. We use Solr 3.6.1. Thanks. That query does not mean all docs. It means something slightly different - all documents for which field is present. If this field happens to exist in every document, then it amounts to the same thing, but Solr still must check every document. Thanks, Shawn
Re: More Like this without a document?
: If I want to use MoreLikeThis algorithm I need to add this documents in the : index? The MoreLikeThis will work with soft commits? Is there a solution to : do a MoreLikeThis without adding the document in the index? you can feed the MoreLikeThisHandler a ContentStream (ie: POST data, or file upload, or stream.body request param) of text instead of sending it a query and it will use that raw text to find more like this http://wiki.apache.org/solr/MoreLikeThisHandler -Hoss
Re: SolrCloud Error after leader restarts
i know facts about ramdirectory actually.. just running some perf tests on our dev env right now.. so in case i use ramdir with 5x cloud, it will still not do the recovery? i mean it will not get the data from the leader and fill its ramdir again? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021203.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is it possible to save the search query?
Hi, Thanks for your guidance. I am unable to figure out what a doc ID is and how I can collect all the doc IDs. Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/09/2012 12:33 AM Subject:Re: is it possible to save the search query? Hi, Aha, I think I understand. Yes, you could collect all doc IDs from each query and find the differences. There is nothing in Solr that can find those differences or that would store doc IDs of returned hits in the first place, so you would have to implement this yourself. Sematext's Search Analytics service may be of help here in the sense that all the data you need (queries, doc IDs, etc.) is collected, so it would be a matter of providing an API to get the data for off-line analysis. But this data collection+diffing is also something you could implement yourself. One thing to think about - what do you do when a query returns a large number of hits. Do you really want/need to get IDs for all of them, or only a page at a time. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi, The following is the example; 1st query: http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data^2 id&start=0&rows=11&fl=data,id Next query: http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data id^2&start=0&rows=11&fl=data,id In the 1st query the field 'data' is boosted by 2. However, maybe the user was not satisfied with the response, so in the next query he boosted the field 'id' by 2. I want to record both the queries and compare the two, that is, what changes were made in the 2nd query that are not present in the previous one. 
Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/08/2012 01:35 PM Subject:Re: is it possible to save the search query? Hi, Compare in what sense? An example will help. Otis -- Performance Monitoring - http://sematext.com/spm On Nov 7, 2012 8:45 PM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi All, Is it possible to record a search query in solr and then compare it with the previous search query? Thanks and regards, Romita Saha
Re: SolrCloud Error after leader restarts
On Nov 19, 2012, at 9:11 PM, deniz denizdurmu...@gmail.com wrote: so in case i use ramdir with 5x cloud, it will still not do the recovery? i mean it will not get the data from the leader and fill its ramdir again? Yes, in 5x RAM directory should be able to recover. - Mark
Ranking by sorting score and rankingField better or by product(score, rankingField)?
Hi there, I have a field (an externalFileField, called rankingField) whose value (type=float) is calculated by a client app. In Solr's original scoring model, changing the boost value results in a different ranking, so I think product(score,rankingField) may be equivalent to the Solr scoring model. What I'm curious about is which will be better in practice, and the different meanings of these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
Re: SolrCloud Error after leader restarts
Mark Miller-3 wrote On Nov 19, 2012, at 9:11 PM, deniz <denizdurmus87@...> wrote: so in case i use ramdir with 5x cloud, it will still not do the recovery? i mean it will not get the data from the leader and fill its ramdir again? Yes, in 5x RAM directory should be able to recover. - Mark thank you so much for your patience with me :) - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021209.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?
Hi, 3. yes, you can sort by function - http://search-lucene.com/?q=solr+sort+by+function 2. this will sort by score only when there is a tie in ranking (two docs have the same rank value) 1. the reverse of 2. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, I have a field(which is externalFileField, called rankingField) and that value(type=float) is calculated by client app. For the solr original scoring model, affect boost value will result different ranking. So I think product(score,rankingField) may equivalent to solr scoring model. What I curious is which will be better in practice and the different meanings on these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
Re: Custom ranking solutions?
Hi Floyd, Use debugQuery=true and let's see it. :) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Before ExternalFileField was introduced, I changed document boost values to achieve custom ranking. My client app updates each document's boost value daily and that seemed to work fine; the actual ranking could be predicted from the boost value (calculated from clicks, recency, and rating). I'm now trying to use ExternalFileField to do some ranking, but after some tests I did not get what I expected. I'm doing a sort like this sort=product(score,abs(rankingField))+desc But the query result ranking won't change anyway. The external file is as follows doc1=3 doc2=5 doc3=9 The original scores from the Solr result are as follows doc1=41.042 doc2=10.1256 doc3=8.2135 Expected ranking doc1 doc3 doc2 What's wrong in my test? Please kindly help on this. Floyd
Re: is it possible to save the search query?
Hi, Document ID would be a field in your document. A unique field that you specify when indexing. You can collect it by telling Solr to return it in the search results by including it in the fl= parameter. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:31 PM, Romita Saha romita.s...@sg.panasonic.comwrote: Hi, Thanks for your guidance. I am unable to figure out what is a doc ID and how can i collect all the doc IDs. Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/09/2012 12:33 AM Subject:Re: is it possible to save the search query? Hi, Aha, I think I understand. Yes, you could collect all doc IDs from each query and find the differences. There is nothing in Solr that can find those differences or that would store doc IDs of returned hits in the first place, so you would have to implement this yourself. Sematext's Search Analytics service my be of help here in the sense that all data you need (queries, doc IDs, etc.) are collected, so it would be a matter of providing an API to get the data for off-line analysis. But this data collection+diffing is also something you could implement yourself. One thing to think about - what do you do when a query returns a lrge number of hits. Do you really want/need to get IDs for all of them, or only a page at a time. 
Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha romita.s...@sg.panasonic.comwrote: Hi, The following is the example; 1st query: http://localhost:8983/solr/db/select/?defType=dismaxdebugQuery=onq=cashier2qf=data ^2 idstart=0rows=11fl=data,id Next query: http://localhost:8983/solr/db/select/?defType=dismaxdebugQuery=onq=cashier2qf=data id^2start=0rows=11fl=data,id In the 1st query the the field 'data' is boosted by 2. However may be the user was not satisfied with the response. Thus in the next query he boosted the field 'id' by 2. I want to record both the queries and compare between the two, meaning, what are the changes implemented on the 2nd query which are not present in the previous one. Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/08/2012 01:35 PM Subject:Re: is it possible to save the search query? Hi, Compare in what sense? An example will help. Otis -- Performance Monitoring - http://sematext.com/spm On Nov 7, 2012 8:45 PM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi All, Is it possible to record a search query in solr and then compare it with the previous search query? Thanks and regards, Romita Saha
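The collect-and-diff approach Otis describes above can be sketched in a few lines: collect the unique-key field (via fl=id) from each query's results, then compare the two sets. The document IDs here are made up for illustration:

```python
# Sketch of the collect-and-diff idea: two ordered lists of doc IDs
# taken from the fl=id field of two query responses (IDs are made up).
hits_query1 = ["doc7", "doc3", "doc9", "doc1"]   # e.g. qf=data^2 id
hits_query2 = ["doc3", "doc7", "doc2", "doc9"]   # e.g. qf=data id^2

# Docs that appear in one result set but not the other.
only_in_1 = set(hits_query1) - set(hits_query2)
only_in_2 = set(hits_query2) - set(hits_query1)

# For docs in both, how their rank position moved between the queries.
rank_moves = {d: (hits_query1.index(d), hits_query2.index(d))
              for d in set(hits_query1) & set(hits_query2)}
```

As Otis notes, for queries with many hits you would have to decide whether to page through all IDs or diff only the first page.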
Re: Best way to retrieve 20 specific documents
I wanted to be explicit for the OP. But wouldn't that depend on mm if you are using (e)dismax? Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 6:37 PM, Upayavira u...@odoko.co.uk wrote: In fact, you shouldn't need OR: id:(123 456 789) will default to OR. Upayavira On Mon, Nov 19, 2012, at 10:45 PM, Shawn Heisey wrote: On 11/19/2012 1:49 PM, Dotan Cohen wrote: On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, How about id1 OR id2 OR id3? :) Thanks, Otis. This was my first inclination (id:123 OR 456), but it didn't work when I tried. At your instigation I then tried id:123 OR id:456. This does work. Thanks. You can also use this query format: id:(123 OR 456 OR 789) This does get expanded internally by the query parser to the format that has the field name on every clause, but it is sometimes easier to write code that produces the above form. Thanks, Shawn
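The query construction Shawn describes is easy to generate in code; a sketch with illustrative IDs (explicit OR keeps the query independent of the default-operator/mm settings Otis raises):

```python
# Sketch: turn a list of unique-key values into the boolean query form
# discussed above. Explicit OR avoids depending on q.op / mm defaults.
ids = ["123", "456", "789"]
query = "id:(" + " OR ".join(ids) + ")"
```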
Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?
Thanks Otis, But sort=product(score, rankingField) is not working in my test. What could be wrong? Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi, 3. yes, you can sort by function - http://search-lucene.com/?q=solr+sort+by+function 2. this will sort by score only when there is a tie in ranking (two docs have the same rank value) 1. the reverse of 2. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, I have a field(which is externalFileField, called rankingField) and that value(type=float) is calculated by client app. For the solr original scoring model, affect boost value will result different ranking. So I think product(score,rankingField) may equivalent to solr scoring model. What I curious is which will be better in practice and the different meanings on these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?
Hi, Do you see any errors? Which version of Solr? What does debugQuery=true say? Are you sure your file with ranks is being used? (remove it, put some junk in it, see if that gives an error) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 10:16 PM, Floyd Wu floyd...@gmail.com wrote: Thanks Otis, But the sort=product(score, rankingField) is not working in my test. What probably wrong? Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi, 3. yes, you can sort by function - http://search-lucene.com/?q=solr+sort+by+function 2. this will sort by score only when there is a tie in ranking (two docs have the same rank value) 1. the reverse of 2. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, I have a field(which is externalFileField, called rankingField) and that value(type=float) is calculated by client app. For the solr original scoring model, affect boost value will result different ranking. So I think product(score,rankingField) may equivalent to solr scoring model. What I curious is which will be better in practice and the different meanings on these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
Re: Custom ranking solutions?
HI Otis, The debug information is as follows; it seems there is no product() step.

<lst name="debug">
  <str name="rawquerystring">_l_all:測試</str>
  <str name="querystring">_l_all:測試</str>
  <str name="parsedquery">PhraseQuery(_l_all:"測 試")</str>
  <str name="parsedquery_toString">_l_all:"測 試"</str>
  <lst name="explain">
    <str name="222">41.11747 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of: 41.11747 = fieldWeight in 0, product of: 4.1231055 = tf(freq=17.0), with freq of: 17.0 = phraseFreq=17.0 1.4246359 = idf(), sum of: 0.71231794 = idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 7.0 = fieldNorm(doc=0)</str>
    <str name="223">14.246359 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of: 14.246359 = fieldWeight in 0, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = phraseFreq=1.0 1.4246359 = idf(), sum of: 0.71231794 = idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 10.0 = fieldNorm(doc=0)</str>
    <str name="211">10.073696 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of: 10.073696 = fieldWeight in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = phraseFreq=2.0 1.4246359 = idf(), sum of: 0.71231794 = idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 5.0 = fieldNorm(doc=0)</str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">6.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
    </lst>
    <lst name="process">
      <double name="time">6.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">3.0</double></lst>
    </lst>
  </lst>
</lst>

2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi Floyd, Use debugQuery=true and let's see it.:) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Before ExternalFielField introduced, change document boost value to achieve custom ranking. My client app will update each boost value for documents daily and seem to worked fine. Actual ranking could be predicted based on boost value. (value is calculated based on click, recency, and rating ). I'm now try to use ExternalFileField to do some ranking, after some test, I did not get my expectation. I'm doing a sort like this sort=product(score,abs(rankingField))+desc But the query result ranking won't change anyway. The external file as following doc1=3 doc2=5 doc3=9 The original score get from Solr result as fllowing doc1=41.042 doc2=10.1256 doc3=8.2135 Expected ranking doc1 doc3 doc2 What wrong in my test, please kindly help on this. Floyd
Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?
Hi Otis, There is no error in the console nor in the log file. I'm using Solr 4.0. The external file is named external_rankingField.txt and exists in the directory C:\solr-4.0.0\example\solr\collection1\data\external_rankingField.txt The external file itself should be working, because when I issue a query with sort=sqrt(rankingField)+desc or sort=sqrt(rankingField)+asc things change accordingly. By the way, I first tried the external field according to the document here http://lucidworks.lucidimagination.com/display/solr/Working+with+External+Files+and+Processes "Format of the External File: The file itself is located in Solr's index directory, which by default is $SOLR_HOME/data/index. The name of the file should be external_fieldname or external_fieldname.*. For the example above, then, the file could be named external_entryRankFile or external_entryRankFile.txt." But actually the external file should be put in $SOLR_HOME/data/ Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi, Do you see any errors? Which version of Solr? What does debugQuery=true say? Are you sure your file with ranks is being used? (remove it, put some junk in it, see if that gives an error) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 10:16 PM, Floyd Wu floyd...@gmail.com wrote: Thanks Otis, But the sort=product(score, rankingField) is not working in my test. What probably wrong? Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi, 3. yes, you can sort by function - http://search-lucene.com/?q=solr+sort+by+function 2. this will sort by score only when there is a tie in ranking (two docs have the same rank value) 1. the reverse of 2. 
Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, I have a field(which is externalFileField, called rankingField) and that value(type=float) is calculated by client app. For the solr original scoring model, affect boost value will result different ranking. So I think product(score,rankingField) may equivalent to solr scoring model. What I curious is which will be better in practice and the different meanings on these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
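For reference, the external file Floyd describes is plain key=value lines keyed by the uniqueKey. A small parser mirroring that shape (a sketch for illustration, not Solr's actual reader):

```python
# Sketch: parse the external_rankingField.txt format shown in the
# thread - one uniqueKey=value line per document, values read as floats.
def parse_external_file(lines):
    ranks = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        ranks[key] = float(value)
    return ranks

ranks = parse_external_file(["doc1=3", "doc2=5", "doc3=9"])
```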
Weird Behaviour on Solr 5x (SolrCloud)
Hi all, after Mark Miller made it clear to me that 5x supports SolrCloud with a RAM directory, I have started playing with it and it seemed to work smoothly, except for one weird behaviour.. here is the story of it: Basically, I pulled the code and built solr 5x, and replaced the war file in the webapps dir of my current installation... then i started my zookeeper servers.. after that i started the solr instances with the params below: java -Djetty.port=7574 -DzkHost=zkserver2:2182 -jar start.jar (running on a remote machine) java -Dbootstrap_conf=true -DzkHost=zkserver1:2181 -jar start.jar (running on local) after both of them were up, i indexed some docs, and both of the solr instances were updated successfully. after this point, i killed one of the solr instances (running on remote, not leader) and then restarted it again. there were no errors in the log and everything seemed normal... however, when i checked the web interface for the one i had restarted it showed 0 docs.. after that I ran q=*:* a few times... and that's the point which surprises me... randomly it returned 0 results and then it returned correct numbers.. each time i make the same query, i get an empty result set randomly... 
I have no idea why this is happening. Here are the logs for the one running on remote (which was restarted):

Nov 20, 2012 11:32:11 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382331589&start=0&q=*:*&isShard=true&fsv=true} hits=0 status=0 QTime=0
Nov 20, 2012 11:32:11 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0 status=0 QTime=7
Nov 20, 2012 11:32:22 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382342238&start=0&q=*:*&isShard=true&fsv=true} hits=0 status=0 QTime=0
Nov 20, 2012 11:32:22 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0 status=0 QTime=7
Nov 20, 2012 11:32:27 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382347438&start=0&q=*:*&isShard=true&fsv=true} hits=0 status=0 QTime=0
Nov 20, 2012 11:32:27 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0 status=0 QTime=14
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382348255&start=0&q=*:*&isShard=true&fsv=true} hits=0 status=0 QTime=1
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0 status=0 QTime=7
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=32 status=0 QTime=14

and for the same query, here is the log from my local (leader, not restarted):

Nov 20, 2012 11:31:46 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382306472&start=0&q=*:*&isShard=true&fsv=true} hits=32 status=0 QTime=0
Nov 20, 2012 11:31:46 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={df=text&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382306472&q=*:*&ids=SP2514N,GB18030TEST,apple,F8V7067-APL-KIT,adata,6H500F0,MA147LL/A,ati,IW-02,asus&distrib=false&isShard=true&wt=javabin&rows=10&version=2} status=0 QTime=1
Nov 20, 2012 11:32:00 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382320738&start=0&q=*:*&isShard=true&fsv=true} hits=32 status=0 QTime=0
Nov 20, 2012 11:32:00 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={df=text&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382320738&q=*:*&ids=SP2514N,GB18030TEST,apple,F8V7067-APL-KIT,adata,6H500F0,MA147LL/A,ati,IW-02,asus&distrib=false&isShard=true&wt=javabin&rows=10&version=2} status=0 QTime=1
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1]
Re: solr autocomplete requirement
Anyone with suggestions on this? On Mon, Nov 19, 2012 at 10:13 PM, Sujatha Arun suja.a...@gmail.com wrote: Hi, Our requirement for auto complete is slightly complicated. We need two types of auto complete: 1. Metadata auto complete 2. Full-text content auto complete In addition, the metadata fields are multi-valued, and we need to filter the results for certain fields, for both types of auto-complete. We have tried different approaches: 1) Suggester - we cannot filter results 2) Terms Component - we cannot filter 3) Facets on full-text content with tokenized fields - expensive 4) Same core with n-gram indexing, storing the results and using the highlight component to fetch the snippet for autosuggest. The last approach, which we are leaning towards, has 2 drawbacks: One - it returns duplicate data, as some metadata is the same across documents. Two - words get truncated mid-word when results are returned with highlighting. Mitigation for the above 2 issues could be: remove duplicates after obtaining results in the application (the issue being the additional time this takes); use the fast vector highlighter, which can produce full-word snippets (but could be heavy on the index size). Has anybody got any suggestions / had similar requirements with a successful implementation? Other question: what would be the impact of serving the suggestions out of the same core as the one we are searching, while using the highlight component for fetching snippets? For our full-text search requirements we do the highlighting outside solr, in our application, and we would be storing and using the highlight only for suggestions. Thanks Sujatha
Re: Custom ranking solutions?
Hi Otis, I'm doing some tests like this, http://localhost:8983/solr/select/?fl=score,_l_unique_key&defType=func&q=product(abs(rankingField),abs(score)) and I get the following response, <lst name="error"><str name="msg">can not use FieldCache on unindexed field: score</str><int name="code">400</int></lst> If I change score to rankingField like this http://localhost:8983/solr/select/?fl=score,_l_unique_key&defType=func&q=product(abs(rankingField),abs(rankingField)) I get <result name="response" numFound="3" start="0" maxScore="2500.0"><doc><str name="_l_unique_key">211</str><float name="score">2500.0</float></doc><doc><str name="_l_unique_key">223</str><float name="score">4.0</float></doc><doc><str name="_l_unique_key">222</str><float name="score">0.01001</float></doc></result> It seems the score cannot be used in a function query? Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi Floyd, Use debugQuery=true and let's see it.:) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Before ExternalFielField introduced, change document boost value to achieve custom ranking. My client app will update each boost value for documents daily and seem to worked fine. Actual ranking could be predicted based on boost value. (value is calculated based on click, recency, and rating ). I'm now try to use ExternalFileField to do some ranking, after some test, I did not get my expectation. I'm doing a sort like this sort=product(score,abs(rankingField))+desc But the query result ranking won't change anyway. 
The external file as following doc1=3 doc2=5 doc3=9 The original score get from Solr result as fllowing doc1=41.042 doc2=10.1256 doc3=8.2135 Expected ranking doc1 doc3 doc2 What wrong in my test, please kindly help on this. Floyd
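A possible direction, offered as an assumption rather than a verified fix for Floyd's setup: score is not an indexed field, so it cannot be referenced inside a function query or a sort function (hence the FieldCache error above). The usual way to multiply the relevance score by a field value is the boost query parser. The sketch below only builds the request URL; the host, field and query values are placeholders:

```python
from urllib.parse import urlencode

# Sketch (assumption, not run against Floyd's index): the boost query
# parser multiplies each document's relevance score by the given
# function, here the external rankingField value. Values are placeholders.
params = {
    "q": "{!boost b=rankingField v=$qq}",  # final score = score * rankingField
    "qq": "_l_all:test",                   # the actual user query
    "fl": "score,_l_unique_key",
    "sort": "score desc",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
```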
configuring solr xml as a datasource
Hi, I am new to Solr. I am trying to use a Solr XML data source for the Solr search engine. I have created a test.xml file as:

<add>
  <doc>
    <field name="fname">leena1</field>
    <field name="number">101</field>
  </doc>
</add>

I have created a data-config.xml file:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page" processor="XPathEntityProcessor" stream="true" forEach="/rootelement" url="C:\solr\conf\test.xml" transformer="RegexTransformer,DateFormatTransformer">
      <field column="name" xpath="/rootelement/name" />
      <field column="number" xpath="/rootelement/number" />
    </entity>
  </document>
</dataConfig>

And added the code below in solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:\solr\conf\data-config.xml</str>
  </lst>
</requestHandler>

But when I go to this link http://localhost:8080/solr/dataimport?command=full-import it shows Total Rows Fetched=0, Total Documents Processed=0. How can I solve this problem? Please provide me the solution. Thanks & Regards, Leena Jawale Software Engineer Trainee BFS BU Phone No. - 9762658130 Email - leena.jaw...@lntinfotech.com The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system. L&T Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in this e-mail
Timeout when calling Luke request handler after migrating from Solr 3.5 to 3.6.1
Hi all, As part of our business logic we query the Luke request handler to extract the fields in the index from our code, using the following URL: http://server:8080/solr/admin/luke?wt=json&numTerms=0 This worked fine with Solr 3.5, but now with 3.6.1 this call never returns - it hangs, and there is no error message in the server logs. Has anyone seen this, or has an idea of what may be causing it? The Luke request handler is configured by default; we didn't change the configuration for this. If I go to solr/admin/stats.jsp, this is shown: name: /admin/luke class: org.apache.solr.handler.admin.LukeRequestHandler version: $Revision: 1242152 $ description: Lucene Index Browser. Inspired and modeled after Luke: http://www.getopt.org/luke/ stats: handlerStart : 1353373022984 requests : 0 errors : 0 timeouts : 0 totalTime : 0 avgTimePerRequest : NaN avgRequestsPerSecond : 0.0 We are running Apache Tomcat 6.0.35 with JDK 1.7.0_03, in case that rings a bell. The index has about Alternatively, our requirement is to get the list of fields in the index, including dynamic fields – is there any other way to obtain this at runtime? It is an application that runs in a separate process from Solr, and may even run on a separate box, thus the Luke call. Thank you for any help you can provide. Jose.