URLDecoder error message
Hey, yesterday we updated from Solr 4.0 to Solr 4.1, and since then the following error pops up from time to time: {msg=URLDecoder: Invalid character encoding detected after position 160 of query string / form data (while parsing as UTF-8), code=400} Is this an issue with incorrect input data, or is something broken in our Solr? Solr 4.0 did not give us these errors. Regards, Ota -- View this message in context: http://lucene.472066.n3.nabble.com/URLDecoder-error-message-tp4039883.html Sent from the Solr - User mailing list archive at Nabble.com.
compare two shards.
Hello. I want to compare two shards with each other, because these shards should have the same index - but they don't. So I want to find the documents that are missing from one of my two shards. My ideas so far: a distributed shard request across my nodes with a facet search on my unique-key field, but the result of the facet component is not reversible; or grouping, but that does not seem to work correctly either - there are no groups with the same unique key in the result set. Does anyone have a better idea? -- View this message in context: http://lucene.472066.n3.nabble.com/compare-two-shards-tp4039887.html
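One way to find the mismatched documents, sketched below under the assumption that you can export the full list of unique keys from each shard (for example with q=*:*, fl set to your unique field, and paging through the results): the missing documents on each side are simply the set differences of the two key lists.

```python
# Sketch only - assumes keys_a and keys_b are the complete unique-key
# lists already exported from shard A and shard B respectively.
def diff_shards(keys_a, keys_b):
    """Return (keys only in shard A, keys only in shard B)."""
    a, b = set(keys_a), set(keys_b)
    return sorted(a - b), sorted(b - a)

only_a, only_b = diff_shards(["1", "2", "3"], ["2", "3", "4"])
print(only_a)  # ['1'] -> documents missing from shard B
print(only_b)  # ['4'] -> documents missing from shard A
```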
Indexed And Stored
Hello, in my schema I have <field name="city_name" type="text_general" indexed="false" stored="true"/> and I have updated 18 documents. Now I need indexed="true" for all the old data. I need a solution, please someone help me out. Please reply urgently! Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Indexed-And-Stored-tp4039893.html
Re: Indexed And Stored
On 12 February 2013 15:49, anurag.jain <anurag.k...@gmail.com> wrote: > hello, in my schema <field name="city_name" type="text_general" indexed="false" stored="true"/> and i updated 18 data. now i need indexed=true for all old data. i need solution [...] You have no choice but to change the schema and reindex, either from the original source, or by first pulling from the existing stored values. Regards, Gora
Re: replication problems with solr4.1
Now this is strange: the index generation and index version change with replication. E.g. master has index generation 118, index version 136059533234, and slave has index generation 118, index version 136059533234 - both the same. Now add one doc to the master with a commit: master has index generation 119, index version 1360595446556. Next, replicate master to slave. The result is: master has index generation 119, index version 1360595446556; slave has index generation 120, index version 1360595564333. I have not seen this before. I thought replication just takes over the index from master to slave, more like a sync? On 11.02.2013 09:29, Bernd Fehling wrote: > Hi list, after upgrading from Solr 4.0 to Solr 4.1 and running it for two weeks now, it turns out that replication has problems and unpredictable results. My installation is a single index, 41 mio. docs / 115 GB index size / 1 master / 3 slaves. The master builds a new index from scratch once a week, and a replication is started manually with the Solr admin GUI. What I see is one of these cases: - after a replication a new searcher is opened on an index.xxx directory, the old data/index/ directory is never deleted, and besides the file replication.properties there is also a file index.properties; OR - the replication takes place and everything looks fine, but when opening the admin GUI the statistics report: Last Modified: a day ago, Num Docs: 42262349, Max Doc: 42262349, Deleted Docs: 0, Version: 45174, Segment Count: 1. Version / Gen / Size - Master: 1360483635404 / 112 / 116.5 GB; Slave: 1360483806741 / 113 / 116.5 GB. In the first case, why is the replication doing that? It is an offline slave, no search activity, just there for backup! In the second case, why are the version and generation different right after a full replication? Any thoughts on this? - Bernd -- Bernd Fehling, Bielefeld University Library, Dipl.-Inform. (FH), LibTec - Library Technology and Knowledge Management, Universitätsstr. 25, 33615 Bielefeld, Tel. +49 521 106-4060, bernd.fehling(at)uni-bielefeld.de, BASE - Bielefeld Academic Search Engine - www.base-search.net
Re: Indexed And Stored
Hello! The simplest way will be updating your schema.xml file: make the change that needs to be done and fully re-index your data. Solr won't be able to automatically change a non-indexed field into an indexed one. You could also use the partial document update API of Solr if you don't have your original data, however there are a few limitations: in order to fully reconstruct your documents, you would have to have all the fields stored in your index. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch > hello, in my schema field name=city_name type=text_general indexed=false stored=true/ and i updated 18 data. now i need indexed=true for all old data. [...] -- View this message in context: http://lucene.472066.n3.nabble.com/Indexed-And-Stored-tp4039893.html
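For reference, a partial (atomic) update in the Solr 4.x JSON update format looks roughly like the following - the id and field value here are made up, and as noted above this only works when all fields are stored:

```json
[
  { "id": "2131", "city_name": { "set": "some value" } }
]
```

The "set" modifier replaces the field value; Solr rebuilds the rest of the document from its stored fields, which is why fully stored documents are required.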
Re: Maximum Number of Records In Index
Our document IDs are most definitely distinct, and there are partial updates to existing records; I have run SQL queries outside of Solr to validate the records going in, and only about 1% are updates to existing records. There are no deletes underway; every day new records are added or updated. Example for today: before the Data Handler ran, there were 13,586,537 records in Solr, all with distinct IDs. 45,345 records were extracted from 7 different sources to go into the Solr index; of these, 1,912 were updates to existing records, thus 43,433 were new records, each with a new ID. I made sure IDs were always distinct. Yet our index now says 13,589,646, indicating that only 3,109 new records went into the index, thus missing 40,324 records. I use a Date Facet Range and can see that there is an increase for January and February this year. In conclusion I have to say that it must be removing earlier records somehow, despite not knowing where this may be controlled/set, if at all. If there is a possible configuration to remove or weed records, where is this configured? Our Solr is virtually out of the box, with only solrconfig and schema amended to suit the needs of our business for the fields and field types indexed. We have also configured the Velocity macros to display results. So none the wiser, and thank you to all who have responded so far. -- View this message in context: http://lucene.472066.n3.nabble.com/Maximum-Number-of-Records-In-Index-tp4038961p4039908.html
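A quick sanity check of the arithmetic above (the figures are taken straight from the message):

```python
before = 13_586_537       # docs in the index before the import
extracted = 45_345        # records pulled from the 7 sources
updates = 1_912           # of which were updates to existing IDs
new_docs = extracted - updates
expected = before + new_docs
actual = 13_589_646       # numDocs reported after the import

print(new_docs)           # 43433
print(expected)           # 13629970
print(expected - actual)  # 40324 documents unaccounted for
```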
Tag facet.query excludes are broken when group.facet=true - SOLR 4.1 Bug?
I'm trying to use facets alongside grouping. However, when I ask Solr to compute grouped facet counts (group.facet=true, see http://wiki.apache.org/solr/FieldCollapsing) it no longer honours facet.query excludes, whereas with group.facet=false the exclude works again without any problems. Have I misunderstood the purpose of group.facet, or is there, as it appears to me, a bug in Solr 4.1? Here is the simplest version of a query that shows the problem I'm having:

http://localhost:8080/wmp/product/select
  ?q=title_search:history
  &rows=0
  &fq={!tag=format}formatLegend:Paperback
  &group=true
  &group.field=titleCode
  &group.limit=9
  &group.facet=true
  &facet=true
  &facet.query={!key=Paperback ex=format}formatLegend:Paperback
  &facet.query={!key=Hardback ex=format}formatLegend:Hardback

This produces:

<lst name="facet_queries">
  <int name="Paperback">1492</int>
  <int name="Hardback">0</int>
</lst>

However, just switching to group.facet=false, the following is produced, showing that the exclude appears to have been ignored previously:

<lst name="facet_queries">
  <int name="Paperback">1492</int>
  <int name="Hardback">1361</int>
</lst>

Anybody else tried using this combination of excludes, facet queries and grouping and got it working? Kind Regards, Mark
Re: Maximum Number of Records In Index
A couple of things to check. 1) Have you retained your Solr logs? If so, take a look in them for indexing errors. 2) What is the difference between maxDocs and numDocs? This will give an indication of whether a large number of records are being deleted or updated. 3) Can you explain your partial updates? Are you sending the entire document again for the partial update? Try to debug your next load: perform the load and watch the logs for errors, then write a program that loops through each doc and checks whether the doc is present in the index. On Tue, Feb 12, 2013 at 6:19 AM, Macroman <peter0...@hotmail.com> wrote: > Our document ID's are most definately distinct and there are partial updates to existing records, I have run SQL queries outside of SOLR to validate records going in and only about 1% are updates to existing records. [...] -- Joel Bernstein Professional Services LucidWorks
LoadBalancing while adding documents
Hi, I have a multi-shard replicated index spread across two machines. Once a week, I delete the entire index and create it from scratch. Today I am using ConcurrentUpdateSolrServer in SolrJ to add documents to the index. I want to add documents through both servers, to utilise the resources. I read in the wiki (I think) that LBHttpSolrServer should not be used for indexing documents. Is there any other way to send requests to both servers without using an external load balancer? I am using Solr 4.1. ./zahoor
Re: Indexed And Stored
Actually, the problem is that I updated the data first, and then I had to add new fields, so I made another JSON file:

[
  { "id": 2131, "newfield": { "add": 2121 } },
  { "id": 21, "newfield": { "add": 21 } }
]

Now I have two different files, so if I try to re-send the previous file to get indexed=true, it erases the new field. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexed-And-Stored-tp4039893p4039930.html
solrcloud-zookeeper
Hi all, the first question: is there a way to reduce the timeout when a Solr shard comes up? It looks in the log file as follows:

Feb 12, 2013 1:19:08 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178992
Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178489
Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=177986

And another one - let's assume I have 2 shards and one of them is down (both master and slave) for some reason. What happens now is that the cluster returns 503 on a search request. Is there a way to configure it to get responses from the other shard? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/solrcloud-zookeeper-tp4039934.html
Re: memory leak - multiple cores
Marcos, You could consider using the CoreAdminHandler instead: http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler It works extremely well. Otherwise, you should periodically restart Tomcat. I'm not sure how much memory would be leaked, but it's likely not going to have much of an impact for a few iterations. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Mon, Feb 11, 2013 at 8:45 PM, Marcos Mendez <mar...@jitisoft.com> wrote: > Hi Michael, Yes, we do intend to reload Solr when deploying new cores. So we deploy it, update solr.xml and then restart only Solr. This will happen sometimes in production, but mostly in testing, which means it will be a real pain. Any way to fix this? Also, I'm running Geronimo with -Xmx1024m -XX:MaxPermSize=256m. Regards, Marcos On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote: >> Marcos, The latter 3 errors are common and won't pose a problem unless you intend to reload the Solr application without restarting Geronimo often. The first error, however, shouldn't happen. Have you changed the size of PermGen at all? I noticed this error while testing Solr 4.0 in Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0, you might want to try upgrading. On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez <mar...@jitisoft.com> wrote: >>> Hi, I'm deploying the Solr war in Geronimo, with multiple cores. I'm seeing the following issue, and it eats up a lot of memory when shutting down. Has anyone seen this and have an idea how to solve it?

Exception in thread "DefaultThreadPool 196" java.lang.OutOfMemoryError: PermGen space
2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! instance=2080324477

Regards, Marcos
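For the archives: managing cores through the CoreAdminHandler mentioned above comes down to single HTTP calls. A sketch (the host, port and core names here are made up):

```
# add a new core without restarting the container
http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=newcore

# reload an existing core after a config change
http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1
```

Either avoids redeploying the webapp, which is what triggers the PermGen-style classloader leaks discussed in this thread.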
Re: memory leak - multiple cores
I should also say that memory can easily be leaked from PermGen space when reloading webapps in Tomcat, regardless of what resources the app creates, because class references from the context classloader to the parent classloader can't be collected appropriately. So restarting Tomcat periodically when you reload webapps is a good practice either way. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Feb 12, 2013 at 9:03 AM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote: > Marcos, You could consider using the CoreAdminHandler instead: http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler It works extremely well. Otherwise, you should periodically restart Tomcat. [...]
Re: memory leak - multiple cores
Many thanks! I will try the CoreAdminHandler and see if that solves the issue! On Feb 12, 2013, at 9:05 AM, Michael Della Bitta wrote: > I should also say that there can easily be memory leaked from permgen space when reloading webapps in Tomcat regardless of what resources the app creates because class references from the context classloader to the parent classloader can't be collected appropriately, so restarting Tomcat periodically when you reload webapps is a good practice either way. [...]
Re: solrcloud-zookeeper
By default, on cluster startup, we wait until we see all the replicas for a shard come up. This is for safety: you may have introduced an old shard with old data, or a new shard with no data, and you don't want something like that becoming the leader. If you don't want this wait, it's configurable. In solr.xml, change the cores attribute leaderVoteWait to n milliseconds, or to 0. It defaults to 180000 (3 minutes). - Mark On Feb 12, 2013, at 8:31 AM, adm1n <evgeni.evg...@gmail.com> wrote: > Hi all, the first question: is there a way to reduce timeout when sold shard comes up? [...] And another one - let's assume I have 2 shards and one of them is down (both - master and slave) for some reason. What is happening now is that cluster returns 503 on the search request. Is there a way to configure to get responses from other shard? thanks.
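A sketch of the relevant solr.xml fragment - the attribute values other than leaderVoteWait are illustrative:

```xml
<!-- leaderVoteWait is in milliseconds; 180000 (3 minutes) is the default.
     Set it lower, or to 0, to skip waiting for replicas on startup. -->
<cores adminPath="/admin/cores" leaderVoteWait="10000">
  <core name="collection1" instanceDir="collection1"/>
</cores>
```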
Re: Possible issue in edismax?
Hi Felipe, Just a short note to say thanks for your valuable suggestion. I have implemented it and can see the expected results. The length norm still spoils it for a few fields, but I balanced that with the boost factors accordingly. Once again, many thanks! Sandeep On 1 February 2013 22:53, Sandeep Mestry <sanmes...@gmail.com> wrote: Brilliant! Thanks very much for your response. On 1 Feb 2013 20:37, Felipe Lahti <fla...@thoughtworks.com> wrote: It's not necessary. It's only query time. On Fri, Feb 1, 2013 at 5:00 PM, Sandeep Mestry <sanmes...@gmail.com> wrote: Hi, could you tell me if changing the default similarity to a custom implementation will require me to rebuild the index? Or will it be used only at query time? Thanks, Sandeep On 31 Jan 2013 13:55, Felipe Lahti <fla...@thoughtworks.com> wrote: So, it depends on your business requirement, right? If a document has matches in more searchable fields then, at least for me, that document is more important than a document with fewer matches. Example: put this in your schema:

<similarity class="com.your.namespace.NoIDFSimilarity"/>

And create a class on the classpath of your Solr:

package com.your.namespace;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public class NoIDFSimilarity extends DefaultSimilarity {
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1;
    }
}

It will neutralize the IDF (which is the rarity of a term). On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry <sanmes...@gmail.com> wrote: Thanks Felipe. Can you point me to an example please? Also, forgive me, but if a document has matches in more searchable fields, should it not rank higher? Thanks, Sandeep On 30 Jan 2013 19:30, Felipe Lahti <fla...@thoughtworks.com> wrote: If you compare the first and last document scores, you will see that the last one matches more fields than the first one. So, you may be thinking why?
The first doc only matches the contributions field and the last matches a bunch of fields, so if you want behaviour more like (<str name="qf">series_title^500 title^100 description^15 contribution</str>) you have to override the method of DefaultSimilarity. On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry <sanmes...@gmail.com> wrote: I have pasted it below; it is slightly variant from the dismax configuration I mentioned above, as I was playing with all sorts of boost values, however it looks more like below:

<str name="c208c2ca-4270-27b8-e040-a8c00409063a">
2675.7844 = (MATCH) sum of:
  2675.7844 = (MATCH) max plus 0.01 times others of:
    2675.7844 = (MATCH) weight(contributions:news in 63298) [DefaultSimilarity], result of:
      2675.7844 = score(doc=63298,freq=1.0 = termFreq=1.0), product of:
        0.004495774 = queryWeight, product of:
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          3.093982E-4 = queryNorm
        595177.7 = fieldWeight in 63298, product of:
          1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          40960.0 = fieldNorm(doc=63298)
</str>
<str name="c208c2a9-66bc-27b8-e040-a8c00409063a">
2317.297 = (MATCH) sum of:
  2317.297 = (MATCH) max plus 0.01 times others of:
    2317.297 = (MATCH) weight(contributions:news in 9826415) [DefaultSimilarity], result of:
      2317.297 = score(doc=9826415,freq=3.0 = termFreq=3.0), product of:
        0.004495774 = queryWeight, product of:
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          3.093982E-4 = queryNorm
        515439.0 = fieldWeight in 9826415, product of:
          1.7320508 = tf(freq=3.0), with freq of: 3.0 = termFreq=3.0
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          20480.0 = fieldNorm(doc=9826415)
</str>
<str name="c208c2aa-1806-27b8-e040-a8c00409063a">
2140.6274 = (MATCH) sum of:
  2140.6274 = (MATCH) max plus 0.01 times others of:
    2140.6274 = (MATCH) weight(contributions:news in 9882325) [DefaultSimilarity], result of:
      2140.6274 = score(doc=9882325,freq=1.0 = termFreq=1.0), product of:
        0.004495774 = queryWeight, product of:
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          3.093982E-4 = queryNorm
        476142.16 = fieldWeight in 9882325, product of:
          1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          32768.0 = fieldNorm(doc=9882325)
</str>
<str name="c208c2b0-5165-27b8-e040-a8c00409063a">
1605.4707 = (MATCH) sum of:
  1605.4707 = (MATCH) max plus 0.01 times others of:
    1605.4707 = (MATCH) weight(contributions:news in 220007) [DefaultSimilarity], result of:
      1605.4707 = score(doc=220007,freq=1.0 = termFreq=1.0), product of:
        0.004495774 =
Re: More Like This, finding original record
Well, I have found the following line in MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(..):

// exclude current document from results
realMLTQuery.add(new TermQuery(new Term(uniqueKeyField.getName(),
    uniqueKeyField.getType().storedToIndexed(doc.getFieldable(uniqueKeyField.getName())))),
    BooleanClause.Occur.MUST_NOT);

I'll try to remove the line somehow and see if my results work for me. It is at least clear that this line is not surrounded by any if statement and will always be executed, so 'no' is the answer to "is there a way to get the current document in the search results?". Have Fun, Daniel On Tue, Feb 12, 2013 at 3:25 PM, Daniel Rijkhof <daniel.rijk...@gmail.com> wrote: I guess it's not possible, but perhaps someone knows how to do this: do a more-like-this query (through the MLT handler), and find the matching record within the response records (the top match, which should be first in the list). This would then make it possible for me to compare scores... Anybody around that did this? (Modify the source code perhaps?) Have Fun, Daniel
DisMax Query Field-Filters (ASCIIFolding)
Hello, I have an interesting behaviour. I have a FieldType text_pl. This type is configured as:

<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="words/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory" protected="words/protwords_pl.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="words/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory" protected="words/protwords_pl.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

So, one filter in the chain is the ASCIIFoldingFilterFactory, which normalizes special characters (e.g. ó → o). If I query field:czolenka it shows the same behaviour as searching for field:czółenka - as expected. Now, if I use the DisMax query, this normalization step does not take place. I debugged the code: if I run the normal query, the debugger stops at the ASCIIFoldingFilter (as expected); if I run the DisMax query, there is no stop at this filter - so the filter is not used. Does anybody have an idea why? Do I have to configure the DisMax RequestHandler for ASCIIFolding - if that is possible? Thanks, Ralf
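For what it's worth, the folding step itself behaves roughly like Unicode decomposition followed by stripping the combining marks. Below is an illustrative sketch, not Solr's actual ASCIIFoldingFilter - the real filter also maps non-decomposable letters such as the Polish ł through an explicit substitution table, which plain decomposition does not handle:

```python
import unicodedata

def ascii_fold(text):
    # Decompose accented characters (NFD), then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(ascii_fold("café"))   # cafe
print(ascii_fold("pójdę"))  # pojde
```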
Re: SolrCloud and hardcoded 'id' field
Apparently this was a side effect of the custom sharding feature. There is a fix planned, but I don't know more about it than that. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Mon, Feb 11, 2013 at 7:15 PM, Shawn Heisey <s...@elyograg.org> wrote: > I have heard that SolrCloud may require the presence of a uniqueKey field specifically named 'id' for sharding. Is this true? Is it still true as of Solr 4.2-SNAPSHOT? If not, what svn commit fixed it? If so, should I file a JIRA? I am not actually using SolrCloud for one index, but my worry is that once a precedent for putting specific names in the code is set, it may bleed over into other features. Also, I have another set of servers for a different purpose that ARE using SolrCloud. Currently that system uses numShards=1, but one day we might want to do a distributed search there. Both my systems have a uniqueKey field other than 'id', and it would be quite a task to change it. The 'id' field doesn't exist at all in either system. Here's the relevant info for one of the systems:

<field name="tag_id" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>

<!-- lowercases the entire field value -->
<fieldType name="lowercase" class="solr.TextField" sortMissingLast="true" positionIncrementGap="0" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<uniqueKey>tag_id</uniqueKey>

Thanks, Shawn
Benefits of Solr over Lucene?
I know that Solr web-enables a Lucene index, but I'm trying to figure out what other things Solr offers over Lucene. The Solr features list says "Solr uses the Lucene search library and extends it!", but what exactly on that list are the extensions, and what does Lucene itself give you? Also, if I have an index built through Solr, is there a non-HTTP way to search that index? Because SolrJ essentially just makes HTTP requests, correct? Some features I'm particularly interested in are: Geospatial Search, Highlighting, Dynamic Fields, Near Real-Time Indexing, Multiple Search Indices. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Benefits-of-Solr-over-Lucene-tp4039964.html
Re: DisMax Query Field-Filters (ASCIIFolding)
Hi Ralf,

The dismax query parser does not allow fielded queries (e.g. field:something). Consider using the edismax query parser instead. Also, debugQuery=on will display informative output on how the query was parsed and analyzed.

ahmet

--- On Tue, 2/12/13, Ralf Heyde ralf.he...@gmx.de wrote:

From: Ralf Heyde ralf.he...@gmx.de
Subject: DisMax Query Field-Filters (ASCIIFolding)
To: solr-user@lucene.apache.org
Date: Tuesday, February 12, 2013, 5:25 PM

Hello, I have an interesting behaviour. I have a FieldType text_pl, configured as:

<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="words/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory" protected="words/protwords_pl.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="words/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory" protected="words/protwords_pl.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

One filter in the chain is the ASCIIFoldingFilterFactory, which normalizes special characters (e.g. ó -> o). If I query field:czolenka it behaves the same as searching for field:czółenka, as expected.

Now, if I use the DisMax query, this normalization step does not take place. I debugged the code: if I run the normal query, the debugger stops at the ASCIIFoldingFilter (as expected); if I run the DisMax query, there is no stop at this filter, so the filter is not used. Does anybody have an idea why? Do I have to configure the DisMax request handler for ASCIIFolding, if that is possible?

Thanks, Ralf
Re: Benefits of Solr over Lucene?
http://lucene.apache.org/solr/

On Tue, Feb 12, 2013 at 10:40 AM, JohnRodey timothydd...@yahoo.com wrote:
[...]

--
Travis Low, Director of Development
Centurion Research Solutions, LLC
http://www.centurionresearch.com
Solr 3.3.0 - Random CPU problem
Hi all, I'm using Solr 3.3.0 with one master server and two slaves. The problem I'm having is that both slaves get degraded randomly, but at the same time. I am completely lost as to what the cause could be, but I see that the Tomcat that runs the Solr webapp executes a Perl script that consumes 100% of the CPU, and when I go and kill it manually Solr starts working perfectly again. Does anybody have any idea of what the problem could be? This is killing performance on my production environment and I've got no idea of what's going on :S Any help will be greatly appreciated. Regards, Federico -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-3-0-Random-CPU-problem-tp4039969.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DisMax Query Field-Filters (ASCIIFolding)
Hi, thanks for your first answer. I don't want a fielded query in my DisMax query. My DisMax query looks like this:

qt=dismax&q=czółenka... -> works
qt=dismax&q=czolenka... -> does not work

The accessed fields use the ASCIIFoldingFilter for both query and index analysis. So what I need is for the DisMax query parser to normalize via ASCIIFolding. Is that possible?

Thanks, Ralf

-------- Original Message --------
Date: Tue, 12 Feb 2013 07:42:17 -0800 (PST)
From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Subject: Re: DisMax Query Field-Filters (ASCIIFolding)
[...]
Re: DisMax Query Field-Filters (ASCIIFolding)
1. Show us the full query request and request handler, in particular the qf parameter.
2. Try the Solr Admin Analysis UI to check exactly how the analysis is being performed.
3. Add debugQuery=true to your query to see how it is actually parsed.
4. If there is any chance that you have modified your field type since originally indexing the data, be sure to completely reindex after ANY change in the field types.

-- Jack Krupansky

-----Original Message-----
From: Ralf Heyde
Sent: Tuesday, February 12, 2013 10:53 AM
To: solr-user@lucene.apache.org
Subject: Re: DisMax Query Field-Filters (ASCIIFolding)
[...]
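[Editor's note] Point 3 above can be tried from any HTTP client; the sketch below just builds such a debug URL in Java. The host, core path, and the qf field text_pl are assumptions taken from this thread, not verified settings:

```java
import java.net.URLEncoder;

public class DebugQueryUrl {
    public static void main(String[] args) throws Exception {
        // Hypothetical Solr base URL; adjust to your deployment.
        String base = "http://localhost:8983/solr/select";
        // Percent-encode the query term as UTF-8 so the accented
        // characters survive the trip to Solr.
        String q = URLEncoder.encode("czółenka", "UTF-8");
        String url = base + "?defType=dismax&qf=text_pl&q=" + q + "&debugQuery=true";
        System.out.println(url);
    }
}
```

The debug section of the response then shows the parsed query, which should reveal whether ASCIIFolding was applied at query time.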
Re: DisMax Query Field-Filters (ASCIIFolding)
I'll try to reindex. I modified the schema but did NOT reindex. Damn!

-------- Original Message --------
Date: Tue, 12 Feb 2013 11:14:04 -0500
From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org
Subject: Re: DisMax Query Field-Filters (ASCIIFolding)
[...]
Re: Benefits of Solr over Lucene?
This is apples and pomegranates. Lucene is a library, Solr is a server. In features, they are more alike than different.

wunder

On Feb 12, 2013, at 7:40 AM, JohnRodey wrote:
[...]
Re: Benefits of Solr over Lucene?
Here's yet another short list of benefits of Solr over Lucene (not that any of them take away from Lucene, since Solr is based on Lucene):

- Multi-core indexes - go beyond the limits of a single Lucene index
- Support for multi-core or named collections
- Richer query parsers (e.g., schema-aware, edismax)
- A schema language, including configurable field types and configurable analyzers
- Easier per-field/type analysis
- A plugin architecture, easily configured and customized
- Generally, develop a search engine without writing any code; what code you may write is mostly easily configured plugins
- An editable configuration file rather than hard-coded or app-specific properties
- Tomcat/Jetty container support, enabling system administration as corporate IT ops teams already know it
- A web-based Admin UI, including debugging features such as field/type analysis
- Solr search features are available to any app written in any language, not just Java. All you need is HTTP access. (Granted, there is SOME support for Lucene in SOME other languages.)

In short, if you want to embed search engine capabilities in your Java app, Lucene is the way to go, but if you want a web architecture, with the search engine in a separate process from the app in a multi-tier architecture, Solr is the way to go. Granted, you could also use ElasticSearch or roll your own, but Solr basically runs right out of the box with no code development needed to get started and no Java knowledge needed.

And to be clear, Solr is not simply an extension of Lucene - Solr is a distinct architectural component that is based on Lucene. In OOP terms, think of composition rather than derivation.

-- Jack Krupansky

-----Original Message-----
From: JohnRodey
Sent: Tuesday, February 12, 2013 10:40 AM
To: solr-user@lucene.apache.org
Subject: Benefits of Solr over Lucene?
[...]
Re: Benefits of Solr over Lucene?
To add to Jack's reply: Solr can also be embedded into the application and run in the same process. Solr is the server-ization of Lucene. The line is very blurred, and Solr is not a very thin wrapper around the Lucene library. Many Solr features are distinct from Lucene, like:

- Detailed breakdown of scoring mathematics
- Text analysis phases - Solr adds to Lucene's text analysis library and makes it configurable through XML
- The notion of field types
- Runtime performance stats, including cache hit/miss rates

Rgds
AJ

On 12-Feb-2013, at 22:17, Jack Krupansky j...@basetechnology.com wrote:
[...]
Re: Any inputs regarding running solr cluster on virtual machines?
On 2/12/2013 12:25 AM, adfel70 wrote: I'm currently running a solr cluster on 10 physical machines. I'm considering moving to virtual machines. Any insights on this issue? Has anyone tried this? Any best practices?

You'll definitely see some performance degradation. How much is very hard to say. I started with Solr on virtual machines, first VMware ESXi (free version) and then Xen, one core/shard per VM. I later moved to the bare metal (same machines) and began running multiple cores/shards per Solr instance, one instance per machine. Performance is better and I don't have to maintain as many copies of the OS. It did work perfectly when virtualized, though. Thanks, Shawn
Re: URLDecoder error message
On 2/12/2013 1:42 AM, o.mares wrote: yesterday we updated from solr 4.0 to solr 4.1 and since then from time to time the following error pops up: {msg=URLDecoder: Invalid character encoding detected after position 160 of query string / form data (while parsing as UTF-8),code=400} Is this an issue with incorrect input data, or is something broken with our solr? Solr 4.0 did not give us these errors.

Is your client code using UTF-8? It sounds like maybe it's not. Thanks, Shawn
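[Editor's note] A quick way to check Shawn's suggestion: Solr 4.1 validates that percent-encoded query-string bytes form valid UTF-8, so a client encoding with another charset would trigger exactly this 400 (which would match 4.0 not complaining, if 4.0 was more lenient). A minimal sketch; the query term is just an example:

```java
import java.net.URLEncoder;

public class EncodingCheck {
    public static void main(String[] args) throws Exception {
        String term = "czółenka"; // any non-ASCII query term
        // What Solr 4.1 expects: UTF-8 percent-encoding.
        String utf8 = URLEncoder.encode(term, "UTF-8");
        // A common misconfiguration: encoding with a legacy charset
        // produces percent-encoded bytes that are not valid UTF-8.
        String latin1 = URLEncoder.encode(term, "ISO-8859-1");
        System.out.println(utf8);   // cz%C3%B3%C5%82enka
        System.out.println(latin1); // a different, non-UTF-8 byte sequence
    }
}
```

If your client emits the second form, forcing UTF-8 wherever request URLs are built should make the error go away.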
Re: Benefits of Solr over Lucene?
So I have had a fair amount of experience using Solr. However, on a separate project we are considering just using Lucene directly, which I have never done. I am trying to avoid finding out late that Lucene doesn't offer what we need and being like "aw snap, it doesn't support geospatial" (or highlighting, or dynamic fields, etc.). I am more curious about core index and search features, and not as much about sharding, cloud features, different client languages and so on. Any thoughts? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Benefits-of-Solr-over-Lucene-tp4039964p4040009.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.3.0 - Random CPU problem
: I'm using Solr 3.3.0 with one master server and two slaves. And the problem
: I'm having is that both slaves get degraded randomly but at the same time.
: I am completely lost as to what the cause could be, but I see that the
: tomcat that runs Solr webapp executes a PERL script that consumes 100% of
: the CPU and when I go and kill it manually solr starts working perfectly
: again.
:
: Does anybody have any idea of what the problem could be?

What is the name of the perl script? What do the contents of that perl script look like? How do you know it is being run by tomcat?

Solr doesn't ship with any perl scripts, and to the best of my knowledge neither does tomcat ... so it sounds like your problem isn't specifically about Solr, but perhaps about something related to your Solr/Tomcat configuration?

-Hoss
Re: Solr 3.3.0 - Random CPU problem
I don't know what the Perl script looks like. I can tell it's being run by Tomcat because when I run top, the owner of the process says tomcat and the CPU is at 100%. I haven't done anything weird to my Solr installation; it's actually pretty simple, the one that was on the Solr website a year ago or so. Do you have any idea of how to see which Perl script is being executed, or what its content is? Thanks for your reply! Regards, Federico -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-3-0-Random-CPU-problem-tp4039969p4040019.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: which analyzer is used for facet.query?
: So it seems that facet.query is using the analyzer of type index.
: Is it a bug or is there another analyzer type for the facet query?

That doesn't really make any sense ... I don't know much about setting up UIMA (or what/when it logs things) but facet.query uses the regular query parser framework, which uses the query analyzer for fields when building up queries.

You can see clear evidence of this by looking at the following query using the 4.1 example configs...

http://localhost:8983/solr/select?q=name:pixima&fl=name&debugQuery=true

The text_general field type used by the name field is configured to use synonyms at query time, but not index time, and you can see that it maps pixima to pixma ...

<str name="querystring">name:pixima</str>
<str name="parsedquery">name:pixma</str>

If you have the sample documents indexed, you can see a single match for the query above, and if you use pixima in a facet.query, you can see the expected count...

http://localhost:8983/solr/select?q=*:*&facet.query=name:pixima&facet=true&rows=0

<result name="response" numFound="32" start="0"/>
<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="name:pixima">1</int>
  </lst>
  ...

-Hoss
Re: Solr 3.3.0 - Random CPU problem
: I don't know how the perl script looks like. I can tell it's being ran by
: tomcat because when I do top the owner of the process says tomcat and
: the CPU is at 100%.
...
: Do you have any idea of how to see which PERL script is being executed or
: what its content is?

Look at the PID column in top. Assuming it's something like 1234, then in another shell run this command...

ps -wwwf -p 1234

...and that should tell you the details of the process, including the path to the perl script. (The args you need for ps may be different if you aren't running Linux.)

-Hoss
Re: Need to create SolrServer objects without checking server availability
: The problem is at program startup -- when 'new HttpSolrServer(url)' is called,
: it goes and makes sure that the server is up and responsive. If any of those
: 56 object creation calls fails, then my app won't even start.

What exactly is the exception you are getting? I don't think anything in HttpSolrServer explicitly tries to test the server on creation.

: Someone on the IRC channel brought up the possibility of initializing the
: HttpClient myself instead of letting the Solr object do it. If the health
: check is actually in HttpClient, this might work, if there's a way to
: initialize the HttpClient without a health check. I've actually been
: wondering if it makes any sense to re-use a single HttpClient object across
: all 56 server objects.

It probably would make sense for you to create a single HttpClient object that you re-use in all of your HttpSolrServer instances -- but I'm not sure how that would help your problem, since the HttpClient constructed implicitly by HttpSolrServer(String) doesn't know anything about the baseUrl. HttpClient literally cannot test the health of the URL when it is created, because it doesn't know anything about any URLs until a request is executed.

-Hoss
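[Editor's note] The shared-HttpClient idea could look roughly like the sketch below. This is only a non-runnable sketch against the SolrJ 4.x API as documented (HttpSolrServer has a constructor taking a base URL and an HttpClient); the URL scheme and pool sizes are made up, so verify against your SolrJ version:

```java
// Sketch only -- requires solr-solrj and Apache HttpClient on the classpath.
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SharedClientSketch {
    public static void main(String[] args) {
        // One pooled HttpClient shared by all 56 server objects.
        // Nothing is contacted here: requests only go out when you
        // actually call query()/add()/commit() on a server.
        PoolingClientConnectionManager cm = new PoolingClientConnectionManager();
        cm.setMaxTotal(128);          // made-up pool sizes
        cm.setDefaultMaxPerRoute(32);
        HttpClient httpClient = new DefaultHttpClient(cm);

        HttpSolrServer[] servers = new HttpSolrServer[56];
        for (int i = 0; i < servers.length; i++) {
            // Hypothetical URL scheme for the 56 cores.
            servers[i] = new HttpSolrServer("http://solr.example.com/solr/core" + i, httpClient);
        }
    }
}
```

Sharing one pooled client also keeps the total connection count bounded, which matters more as the number of server objects grows.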
Re: Benefits of Solr over Lucene?
On 2/12/2013 11:19 AM, JohnRodey wrote: So I have had a fair amount of experience using Solr. However on a separate project we are considering just using Lucene directly, which I have never done. I am trying to avoid finding out late that Lucene doesn't offer what we need and being like aw snap, it doesn't support geospatial (or highlighting, or dynamic fields, or etc...). I am more curious about core index and search features, and not as much with sharding, cloud features, different client languages and so on. Because Solr is written using the Lucene API, if you want to use Lucene, you can do anything Solr can, plus plenty of things that Solr can't -- but for many of those, you'd have to write the code yourself. That's the key difference -- with Solr, a HUGE amount of coding is already done for you, you just have to put a few easy-to-debug client API calls in your code. From my perspective as a user with some Java coding ability but not any true experience with large-scale development: If your development team is ready and capable of writing Lucene code, then it would be better to use Solr instead, and if there's something you need that Solr can't do, put your development team to work writing the required plugin. They would likely spend far less time doing that than writing an entire search system using Lucene. Thanks, Shawn
Re: Do I have to reindex when upgrading from solr 4.0 to 4.1?
Michael is correct, that was what was said at the bootcamp (by me). I believe this may not be correct though. Further code review shows that Solr 4.0 was already distributing documents using the hash range technique used in 4.1. The big change in 4.1 was that a composite hash key could be used to distribute docs around the hash range, but docs that don't use the composite key are distributed similarly to 4.0. So you may not need to re-index to take advantage of shard splitting. This will become more clear as shard splitting documentation becomes available.

On Mon, Feb 11, 2013 at 12:45 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

Arkadi, that's the answer I received at Solr Bootcamp, yes.

Michael Della Bitta
Appinions
www.appinions.com

On Mon, Feb 11, 2013 at 2:23 AM, Arkadi Colson ark...@smartbit.be wrote:

Does it mean that when you redo indexing after the upgrade to 4.1, shard splitting will work in 4.2?

Kind regards
Arkadi Colson
Smartbit bvba

On 02/10/2013 05:21 PM, Michael Della Bitta wrote:

No. You can just update Solr in place. But... if you're using SolrCloud, your documents won't be hashed in a way that lets you do shard splitting in 4.2. That seemed to be the consensus during Solr Boot Camp.

On Sun, Feb 10, 2013 at 10:46 AM, adfel70 adfe...@gmail.com wrote:

Do I have to recreate the collections/cores? Do I have to reindex? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Do-I-have-to-reindex-when-upgrading-from-solr-4-0-to-4-1-tp4039560.html Sent from the Solr - User mailing list archive at Nabble.com.

--
Joel Bernstein
Professional Services, LucidWorks
Re: Benefits of Solr over Lucene?
Is there a page on the wiki that points out the use cases (or the features) that are best suited for Lucene adoption, and those best suited for SOLR adoption? -Glen On Tue, Feb 12, 2013 at 3:11 PM, Shawn Heisey s...@elyograg.org wrote: On 2/12/2013 11:19 AM, JohnRodey wrote: So I have had a fair amount of experience using Solr. However on a separate project we are considering just using Lucene directly, which I have never done. I am trying to avoid finding out late that Lucene doesn't offer what we need and being like aw snap, it doesn't support geospatial (or highlighting, or dynamic fields, or etc...). I am more curious about core index and search features, and not as much with sharding, cloud features, different client languages and so on. Because Solr is written using the Lucene API, if you want to use Lucene, you can do anything Solr can, plus plenty of things that Solr can't -- but for many of those, you'd have to write the code yourself. That's the key difference -- with Solr, a HUGE amount of coding is already done for you, you just have to put a few easy-to-debug client API calls in your code. From my perspective as a user with some Java coding ability but not any true experience with large-scale development: If your development team is ready and capable of writing Lucene code, then it would be better to use Solr instead, and if there's something you need that Solr can't do, put your development team to work writing the required plugin. They would likely spend far less time doing that than writing an entire search system using Lucene. Thanks, Shawn -- - http://zzzoot.blogspot.com/ -
Re: Benefits of Solr over Lucene?
It is like deciding between a disk drive and a file server. Solr and Lucene are different kinds of things. wunder On Feb 12, 2013, at 12:26 PM, Glen Newton wrote: Is there a page on the wiki that points out the use cases (or the features) that are best suited for Lucene adoption, and those best suited for SOLR adoption? -Glen [...] -- Walter Underwood wun...@wunderwood.org
Re: Benefits of Solr over Lucene?
And helping people who don't know much about them decide which to use is not useful? -Glen On Tue, Feb 12, 2013 at 3:34 PM, Walter Underwood wun...@wunderwood.org wrote: It is like deciding between a disk drive and a file server. Solr and Lucene are different kinds of things. wunder [...] -- - http://zzzoot.blogspot.com/ -
Re: Need to create SolrServer objects without checking server availability
On 2/12/2013 12:27 PM, Chris Hostetter wrote: : The problem is at program startup -- when 'new HttpSolrServer(url)' is called, : it goes and makes sure that the server is up and responsive. If any of those : 56 object creation calls fails, then my app won't even start. What exactly is the exception you are getting? I don't think anything in HttpSolrServer explicitly tries to test the server on creation. : Someone on the IRC channel brought up the possibility of initializing the : HttpClient myself instead of letting the Solr object do it. If the health : check is actually in HttpClient, this might work, if there's a way to : initialize the HttpClient without a health check. I've actually been : wondering if it makes any sense to re-use a single HttpClient object across : all 56 server objects. It probably would make sense for you to create a single HttpClient object that you re-use in all of your HttpSolrServer instances -- but I'm not sure how that would help your problem, since the HttpClient constructed implicitly by HttpSolrServer(String) doesn't know anything about the baseUrl. HttpClient literally cannot test the health of the URL when it is created, because it doesn't know anything about any URLs until a request is executed.

I gathered up the exception, looked it over very closely ... and it's all my fault! User error all the way on this one. In my code, as soon as I create the object, I make a call that gets the dataDir and instanceDir. I initially wrote this code so long ago that I had forgotten this fact. I have now changed it so this query is only made when the information is requested, not when the object initializes. Aside: it turns out that nothing in my code actually USES those getter methods! Now my program will start up even if Solr is down. I'll look into re-using an http client on all objects. Thanks for the prodding, Hoss! Shawn
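The lazy-initialization fix Shawn describes can be sketched like this. The class and field names, and the Supplier standing in for the Solr status call, are illustrative rather than his actual code:

```java
import java.util.function.Supplier;

public class LazyServerInfo {
    // The Supplier stands in for the network call that fetches
    // dataDir/instanceDir from Solr's admin handler.
    private final Supplier<String> fetcher;
    private String dataDir;  // null until first requested

    public LazyServerInfo(Supplier<String> fetcher) {
        this.fetcher = fetcher;  // constructor does no network I/O
    }

    // The remote query happens here, on first demand, and is cached;
    // startup no longer fails if Solr is down.
    public synchronized String getDataDir() {
        if (dataDir == null) {
            dataDir = fetcher.get();
        }
        return dataDir;
    }

    public static void main(String[] args) {
        LazyServerInfo info = new LazyServerInfo(() -> "/var/solr/data"); // nothing fetched yet
        System.out.println(info.getDataDir());  // fetch happens only now
    }
}
```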
Re: Benefits of Solr over Lucene?
Do you want to embed an index into your application, e.g. as a desktop app? Use Lucene. Is search basically the whole of your app? Perhaps use Lucene. Do you want to offer search as a service? Do you want to be able to arbitrarily scale your index (beyond the number of documents a single index can handle, or beyond the load a single server can handle), do you want to offer search services to a number of other servers? Then use Solr. Once you get that Lucene is a library you embed into your java app, and Solr is a server that you connect to from other server(s), you will hopefully be able to work out which is more appropriate. If you consider using Lucene in the latter scenario, you will probably end up rewriting a lot of what Solr does anyway. Upayavira On Tue, Feb 12, 2013, at 08:26 PM, Glen Newton wrote: Is there a page on the wiki that points out the use cases (or the features) that are best suited for Lucene adoption, and those best suited for SOLR adoption? -Glen [...]
Re: Edismax and mm per field
: Currently, edismax applies mm to the combination of all fields listed in qf. : : I would like to have mm applied individually to those fields instead. That doesn't really make sense if you think about how the qf is used to build the final query structure -- it is essentially producing a cross product of the fields in the qf and the chunks of input in the query string. The mm param says how many clauses of the final resulting query (which are each queries for the same chunk across multiple fields) must match. This blog I wrote a while back tries to explain this... http://searchhub.org/2010/05/23/whats-a-dismax/ : For instance, the query: : 1) : defType=edismax&q=leo foster&mm=2&qf=title^5 : summary^2&pf=title^5&fq=contentsource:src1 : : would return a doc where : title: leo lee : summary: Joe foster which is exactly what it's designed to do -- that way queries like David Smiley Solr Enterprise Search Server will match a document with David Smiley in the author field and Apache Solr 3 Enterprise Search Server in the title field. : For the original query 1), having an additional parameter like: : : - mm.qf=true (tell solr to do mm on individual fields in qf) : or : - mm.pf=true (tell solr to do mm on individual fields in pf) : : or anything along the line would be useful. ...but you have to think about what you would want Solr to do with params like that -- look at the query structure Solr produces with multiple fields in the qf, and multiple terms in the query string. Look at where the minNumberShouldMatch is set on the outer BooleanQuery and think about where/how you would like to see a per-field mm applied -- if you can explain that in pseudo-code, then it's certainly worth discussing, but I'm not understanding how it would make sense. -Hoss
DIH Delete with Full Import
Hi, I'm using this configuration: http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport The wiki says: In this case it means obviously that in case you also want to use deletedPkQuery then running the delta-import command is still necessary. In this link: http://wiki.apache.org/solr/DataImportHandler - *postImportDeleteQuery* : after full-import this will be used to cleanup the index !. This is honored only on an entity that is an immediate sub-child of document Solr1.4 http://wiki.apache.org/solr/Solr1.4. Is it possible for me to use full-import and postImportDeleteQuery? I have a table that has the UUIDs of all the records that need to be deleted. Can I define something like postImportDeleteQuery = Select Id from delete_log_table? Can someone provide me an example? Any help is much appreciated. Thank you.
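A hypothetical data-config.xml sketch (the table, column, and field names are made up). Note that, per the wiki, postImportDeleteQuery takes a Solr query run against the index after the import, not SQL, so it cannot directly run "Select Id from delete_log_table". To delete by IDs pulled from a table during a full-import, DIH's $deleteDocById special command is one option:

```xml
<document>
  <!-- postImportDeleteQuery is a Solr query, applied to the index
       after the full-import finishes -->
  <entity name="item" query="SELECT * FROM item"
          postImportDeleteQuery="deleted:true"/>

  <!-- alternative: rows whose column is aliased to $deleteDocById
       are deleted from the index by their uniqueKey -->
  <entity name="deleted_items"
          query="SELECT id AS '$deleteDocById' FROM delete_log_table"/>
</document>
```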
Re: Eastings and northings support in Solr Spatial
Yeah, solr.PointType. Or use solr.SpatialRecursivePrefixTree with geo=false http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 On 2/8/13 10:38 AM, Kissue Kissue kissue...@gmail.com wrote: I can see Solr has the field type solr.LatLonType which supports spatial based on longitudes and latitudes. Does it support spatial based on Eastings and Northings and is the solr.PointType field type meant to be used for this type of cordinates? Thanks.
RE: what do you use for testing relevance?
Roman, Logging clicks and their position in the result list is one useful method of measuring relevance. Using the position you can calculate the mean reciprocal rank; a value near 1.0 is very good, so over time you can clearly see whether changes actually improve user experience/expectations. Keep in mind that there is some noise because users tend to click one or more of the first few results anyway. You may also be interested in A/B testing. http://en.wikipedia.org/wiki/Mean_reciprocal_rank http://en.wikipedia.org/wiki/A/B_testing Cheers Markus -Original message- From:Roman Chyla roman.ch...@gmail.com Sent: Tue 12-Feb-2013 23:04 To: solr-user@lucene.apache.org Subject: what do you use for testing relevance? Hi, I do realize this is a very broad question, but still I need to ask it. Suppose you make a change into the scoring formula. How do you test/know/see what impact it had? Any framework out there? It seems like people are writing their own tools to measure relevancy. Thanks for any pointers, roman
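The mean reciprocal rank Markus mentions is simple to compute from logged click positions. This sketch assumes positions are 1-based and 0 means no result was clicked for that query:

```java
import java.util.List;

public class MeanReciprocalRank {
    // MRR = average over queries of 1/rank of the clicked result;
    // a query with no click (position 0) contributes 0.
    public static double mrr(List<Integer> clickPositions) {
        double sum = 0.0;
        for (int pos : clickPositions) {
            if (pos > 0) {
                sum += 1.0 / pos;
            }
        }
        return sum / clickPositions.size();
    }

    public static void main(String[] args) {
        // Three queries: clicks at rank 1, rank 2, and no click.
        System.out.println(mrr(List.of(1, 2, 0)));  // prints 0.5
    }
}
```

Tracking this value before and after a scoring change gives a single number to compare, subject to the click-position bias Markus warns about.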
Re: what do you use for testing relevance?
What do you want to achieve with these tests? Is it meant as a regression test, to make sure that only the queries/boosts you changed are affected? Then you will have to implement tests that cover your specific schema/boosts. I'm not aware of any frameworks that do this - we're using Java based tests that retrieve documents from solr, map them to our domain model (objects representing a document) and do assertions on debug values (e.g. score). Or is it more about what's more relevant for the user? Then you will need some kind of user tracking, as Markus described already. BR On 12 February 2013 23:16, Markus Jelsma markus.jel...@openindex.io wrote: [...]
Re: solr4.0 problem zkHost with multiple hosts throws out of range exception
The suggested syntax didn't work with embedded ZooKeeper: Syntax: -DzkRun -DzkHost=nodeA:9983,nodeB:9983,nodeC:9983,nodeD:9983/solrroot -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=MyConfig Error: SEVERE: Could not start Solr. Check solr/home property and the logs Feb 12, 2013 1:36:49 PM org.apache.solr.common.SolrException log SEVERE: null:java.lang.NumberFormatException: For input string: 9983/solrroot at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) More details at: https://issues.apache.org/jira/browse/SOLR-4450 -- View this message in context: http://lucene.472066.n3.nabble.com/solr4-0-problem-zkHost-with-multiple-hosts-throws-out-of-range-exception-tp4014440p4040087.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr4.0 problem zkHost with multiple hosts throws out of range exception
This config isn't intended for embedded ZooKeeper; it is for a separate ZooKeeper ensemble that is shared with other services. Upayavira On Tue, Feb 12, 2013, at 10:19 PM, mbennett wrote: [...]
Re: How to limit queries to specific IDs
First, it may not be a problem, assuming your other filter queries are more frequent. Second, the easiest way to keep these out of the filter cache would be just to include them as a MUST clause, like +(original query) +id:(1 2 3 4). Third possibility, see https://issues.apache.org/jira/browse/SOLR-2429, but the short form is: fq={!cache=false}<rest of fq> On Mon, Feb 11, 2013 at 2:41 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi everyone. I have queries that should be bounded to a set of IDs (the uniqueKey field of my schema). My client front-end sends two Solr requests: In the first one, it wants to get the top X IDs. This result should return very fast. No time to waste on highlighting. This is a very standard query. In the second one, it wants to get the highlighting info (corresponding to the queried fields and terms, of course) on those documents (maybe as some sequential requests, on small bulks of the full list). These two requests are implemented as almost identical calls, to different requestHandlers. I thought to append a filter query to the second request, id:(1 2 3 4 5). Is this idea good for Solr? If so, my problem is that I don't want these filters to flood my filterCache... Is there any way (even if it involves some coding...) to add a filter query which won't be added to the filterCache (at least, not instead of standard filters)? Notes: 1. It can't be assured that the first query will remain in queryResultsCache... 2. Consider an index size of 50M documents...
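Spelled out, the third option from SOLR-2429 looks like this (the id list is of course illustrative; the optional cost local param, also from SOLR-2429, orders non-cached filters so that cheaper ones run first):

```
fq={!cache=false}id:(1 2 3 4 5)
fq={!cache=false cost=100}id:(1 2 3 4 5)
```

With cache=false the filter is evaluated for the request but never stored in the filterCache, so the standard, frequently reused filters stay cached.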
Re: Reverse range query
Well, what does adding debug=query show you for the parsed query? What documents show up? My first guess is that since you're using exclusive rather than inclusive end points, your expectations aren't what you think. Best Erick On Mon, Feb 11, 2013 at 10:57 PM, ballusethuraman ballusethura...@gmail.com wrote: Hi, I have created a new attribute (Year) in the attribute dictionary and associated it with different catentries with different values, say 2000, 2001, 2002, 2003, ... 2012. Now I want to search on the Year attribute with a min and max range. When 2000 to 2005 is given as the search condition it should fetch the catentries which are between these two values. This is the URL I used to hit the solr server. ads_f11001 is the logical name of the attribute year that I have created in management center. This value will be in the srchattrprop table. 2000 and 2005 are the min and max range. http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{2000 TO 2005} When I try to hit this URL I am getting 0 records found. http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{2000 TO *} and http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{* TO 2005} These above two URLs are fetching me some results, but not the expected result. Please help me to solve this issue. -- View this message in context: http://lucene.472066.n3.nabble.com/Reverse-range-query-tp1789135p4039860.html Sent from the Solr - User mailing list archive at Nabble.com.
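For reference, Solr's range syntax distinguishes inclusive square brackets from exclusive curly braces, and both forms need the TO keyword:

```
q=ads_f11001:[2000 TO 2005]   endpoints included: matches 2000 through 2005
q=ads_f11001:{2000 TO 2005}   endpoints excluded: matches only 2001-2004
```

One more thing to check: this only compares numerically on a numeric (trie) field type; if the field is indexed as a string, the range comparison is lexicographic.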
Re: LoadBalancing while adding documents
Hold on here. LBHttpSolrServer should not be used for indexing in a Master/Slave setup, but in SolrCloud you may use it. Indeed, CloudSolrServer uses LBHttpSolrServer under the covers. Now, why would you want to send requests to both servers? If you're in master/slave mode (i.e. not running Zookeeper), you _must_ send the update to the right master. If you're in SolrCloud mode, you don't care. You have to send each document to Solr only once. In Master/Slave mode, you must send it to the correct master. In SolrCloud mode you don't care where you send it, it'll be routed to the right place. Best Erick On Tue, Feb 12, 2013 at 8:22 AM, J Mohamed Zahoor zah...@indix.com wrote: Hi I have a multi-shard replicated index spread across two machines. Once a week, I delete the entire index and create it from scratch. Today I am using ConcurrentUpdateSolrServer in solrj to add documents to the index. I want to add documents through both the servers, to utilise the resources. I read in the wiki (I think) that LBHttpSolrServer should not be used for indexing documents. Is there any other way to send requests to both the servers without using any external load balancers? I am using Solr 4.1. ./zahoor
Re: SolrCloud and hardcoded 'id' field
On 2/11/2013 7:47 PM, Mark Miller wrote: Doesn't sound right to me. I'd guess you heard wrong. I did a search for "id" with the quotes throughout the branch_4x source code. After excluding test code, test files, and other things that looked like they have good reason to be hardcoded, I was left with the following: This class looks like a definite problem. org.apache.solr.common.cloud.HashBasedRouter Object idObj = sdoc.getFieldValue("id"); // blech This class uses "id" in a way that looks bad to me, but it's just for logging: org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer I can't tell if the use of "id" in these classes is problematic. org.apache.solr.handler.admin.LukeRequestHandler org.apache.solr.handler.admin.QueryElevationComponent org.apache.solr.handler.component.RealTimeGetComponent org.apache.solr.handler.loader.JsonLoader org.apache.solr.handler.loader.XMLLoader org.apache.solr.spelling.SpellCheckCollator These classes use "id" in a way that does not look problematic to me, but should be reviewed. org.apache.solr.cloud.ElectionContext org.apache.solr.cloud.Overseer org.apache.solr.cloud.OverseerCollectionProcessor org.apache.solr.core.JmxMonitoredMap org.apache.solr.handler.admin.ThreadDumpHandler Thanks, Shawn
Re: SolrCloud and hardcoded 'id' field
On 2/12/2013 7:54 PM, Shawn Heisey wrote: [...] Somehow I missed this one where I don't know if it's a problem: org.apache.solr.search.grouping.distributed.shardresultserializer.TopGroupsResultTransformer Thanks, Shawn
Re: Benefits of Solr over Lucene?
Lucene and Solr have an aggressive upgrade schedule. From 3 to 4 got a major rewiring, and parts are orders of magnitude faster and smaller. If you code directly against Lucene, you will never upgrade to newer versions. (I supported Solr/Lucene customers for 3 years, and nobody ever did.) Cheers, Lance I know that Solr web-enables a Lucene index, but I'm trying to figure out what other things Solr offers over Lucene. On the Solr features list it says Solr uses the Lucene search library and extends it!, but what exactly are the extensions from the list and what did Lucene give you? Also, if I have an index built through Solr, is there a non-HTTP way to search that index? Because SolrJ essentially just makes HTTP requests, correct? Some features I'm particularly interested in are: Geospatial Search Highlighting Dynamic Fields Near Real-Time Indexing Multiple Search Indices Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Benefits-of-Solr-over-Lucene-tp4039964.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: More Like This, finding original record
Hello, Daniel, are you looking for the original doc you used for MLT in the response? You could always and easily do this on the client side by looking at IDs of returned docs. Otis Solr ElasticSearch Support http://sematext.com/ On Feb 12, 2013 9:26 AM, Daniel Rijkhof daniel.rijk...@gmail.com wrote: I guess it's not possible, but perhaps someone knows how to do this: Do a more like this query (through the mlt handler), And find the match record within the response records (top match, should be first in list). This would then make it possible for me to compare scores... Anybody around that did this? (Modify source code perhaps?) Have Fun Daniel
Re: what do you use for testing relevance?
Hi Roman, We use our own Search Analytics service. It's free and open to anyone - see http://sematext.com/search-analytics/index.html And this post talks exactly about the topic you are asking about: http://blog.sematext.com/2012/01/06/relevance-tuning-and-competitive-advantage-via-search-analytics It includes a screenshot with MRR (Mean Reciprocal Rank) that Markus mentioned. Otis Solr ElasticSearch Support http://sematext.com/ On Feb 12, 2013 5:04 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi, I do realize this is a very broad question, but still I need to ask it. Suppose you make a change into the scoring formula. How do you test/know/see what impact it had? Any framework out there? It seems like people are writing their own tools to measure relevancy. Thanks for any pointers, roman
Re: How to limit queries to specific IDs
Thank you, Erick! Three great answers! On Wed, Feb 13, 2013 at 4:20 AM, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: LoadBalancing while adding documents
On 13-Feb-2013, at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote: Hold on here. LBHttpSolrServer should not be used for indexing in a Master/Slave setup, but in SolrCloud you may use it. Indeed, CloudSolrServer uses LBHttpSolrServer under the covers. In SolrCloud mode, ConcurrentUpdateSolrServer will already do the load balancing while adding and querying documents from Solr. Is my understanding right? Now, why would you want to send requests to both servers? I just wanted to send some docs to machine1 and some docs to machine2 to load balance. Not the same doc to both machines. If you're in master/slave mode (i.e. not running Zookeeper), you _must_ send the update to the right master. If you're in SolrCloud mode, you don't care. You have to send each document to Solr only once. In Master/Slave mode, you must send it to the correct master. In SolrCloud mode you don't care where you send it, it'll be routed to the right place. I am in SolrCloud mode. I always send it to one of the servers. And if I get you right, they will automatically load balance is what I take. ./Zahoor
Re: what do you use for testing relevance?
Hi Roman, If you're looking for regression testing then https://github.com/sul-dlss/rspec-solr might be worth looking at. If you're not a ruby shop, doing something similar in another language shouldn't be to hard. The basic idea is that you setup a set of tests like If the query is X, then the document with id Y should be in the first 10 results If the query is S, then a document with title T should be the first result If the query is P, then a document with author Q should not be in the first 10 result and that you run these whenever you tune your scoring formula to ensure that you haven't introduced unintended effects. New ideas/requirements for your relevance ranking should always result in writing new tests - that will probably fail until you tune your scoring formula. This is certainly no magic bullet, but it will give you some confidence that you didn't make things worse. And - in my humble opinion - it also gives you the benefit of discouraging you from tuning your scoring just for fun. To put it bluntly: if you cannot write up a requirement in form of a test, you probably have no need to tune your scoring. Regards, -- Steffen On Tuesday, February 12, 2013 at 23:03 , Roman Chyla wrote: Hi, I do realize this is a very broad question, but still I need to ask it. Suppose you make a change into the scoring formula. How do you test/know/see what impact it had? Any framework out there? It seems like people are writing their own tools to measure relevancy. Thanks for any pointers, roman
Re: LoadBalancing while adding documents
Ooh... I didn't know that there is a CloudSolrServer. Thanks for the pointer. Will explore that. ./zahoor On 13-Feb-2013, at 11:49 AM, J Mohamed Zahoor zah...@indix.com wrote: [...]