Re: Multiple cores or not?
Hi,

The architecture choice probably depends on the content and the data sources. If you have multiple data sources for each of the sites, e.g. a database serving up site content and feeds serving up syndicated content, then multicore works well with one core per data source: core1 - database, core2 - RSS. As for multiple sites, you can run multiple Solr indexes with each index serving a website. Alternatively, if you have common content across the sites, you could use one of the cores to serve the common content, with the other cores serving site-specific content. Hope this info helps.

Regards, Ravi

From: Otis Gospodnetic otis_gospodne...@yahoo.com To: solr-user@lucene.apache.org Sent: Thu, July 15, 2010 4:56:24 AM Subject: Re: Multiple cores or not?

Hello there, I'm guessing the sites will be searched separately. In that case I'd recommend a core for each site. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: scr...@asia.com scr...@asia.com To: solr-user@lucene.apache.org Sent: Wed, July 14, 2010 3:02:36 PM Subject: Multiple cores or not?

Hi, We are planning to host different websites that will use Solr on the same server. What will be best: one core with a field in the schema (site1, site2, etc.) that is then added to every query, or one core per site? Thanks for your help
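The two layouts being discussed differ mainly in how each search is scoped; a rough sketch of the resulting query URLs (host, port, field and core names are invented for illustration, not from the thread):

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host/port

def single_core_query(user_query, site):
    # One shared core: restrict results with a filter query on a "site" field.
    params = urlencode({"q": user_query, "fq": "site:%s" % site})
    return "%s/select?%s" % (SOLR_BASE, params)

def per_core_query(user_query, site):
    # One core per site: the core name itself scopes the search.
    params = urlencode({"q": user_query})
    return "%s/%s/select?%s" % (SOLR_BASE, site, params)

print(single_core_query("bike", "site1"))
print(per_core_query("bike", "site1"))
```

With the single-core layout every query must carry the fq; with per-core, the application only has to pick the right URL.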
Re: Error in building Solr-Cloud (ant example)
hi mark, jayf and i are working together :) i tried to apply the patch to the trunk, but the ant tests failed... i checked out the latest trunk: svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk patched it with SOLR-1873, and put the two JARs into trunk/solr/lib. ant compile in the top-level trunk directory worked fine, but ant test had a few errors. the first error was:

[junit] Testsuite: org.apache.solr.cloud.BasicZkTest
[junit] Testcase: testBasic(org.apache.solr.cloud.BasicZkTest): Caused an ERROR
[junit] maxClauseCount must be >= 1
[junit] java.lang.IllegalArgumentException: maxClauseCount must be >= 1
[junit] at org.apache.lucene.search.BooleanQuery.setMaxClauseCount(BooleanQuery.java:62)
[junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:131)
[junit] at org.apache.solr.util.AbstractSolrTestCase.tearDown(AbstractSolrTestCase.java:182)
[junit] at org.apache.solr.cloud.AbstractZkTestCase.tearDown(AbstractZkTestCase.java:135)
[junit] at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:277)
[junit]

after this, tests passed until there were a lot of errors with this output:

[junit] - Standard Error -
[junit] Jul 15, 2010 3:00:53 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:TEST_PORT/solr/replication is not available. Index fetch failed. Exception: Invalid uri 'http://localhost:TEST_PORT/solr/replication': invalid port number

followed by a final message:

[junit] SEVERE: Master at: http://localhost:57146/solr/replication is not available. Index fetch failed. Exception: Connection refused

a few more tests passed... then at the end:

BUILD FAILED
/Users/iwatson/work/solr/trunk/build.xml:31: The following error occurred while executing this line:
/Users/iwatson/work/solr/trunk/solr/build.xml:395: The following error occurred while executing this line:
/Users/iwatson/work/solr/trunk/solr/build.xml:477: Tests failed!
are these errors currently expected (i.e. issues being sorted) or does it look like i'm doing something wrong/stupid!? thanks for your help bec :)

On 5 July 2010 04:34, Mark Miller markrmil...@gmail.com wrote: Hey jayf - Offhand I'm not sure why you are having these issues - last I knew, a couple of people had had success with the cloud branch. Cloud has really moved on from that branch though - we probably should update the wiki about that. More important, though, is that I need to get Cloud committed to trunk! I've been saying it for a while, but I'm going to make a strong effort to wrap up the final unit test issue (apparently a testing issue, not a cloud issue) and get this committed for further iterations. The way to follow along with the latest work is to go to: https://issues.apache.org/jira/browse/SOLR-1873 The latest patch there should apply to recent trunk. I've scheduled a bit of time to work on getting this committed this week, fingers crossed. -- - Mark http://www.lucidimagination.com On 7/4/10 3:37 PM, jayf wrote: Hi there, I'm having trouble installing Solr Cloud.
I checked out the project, but when compiling (ant example on OSX) I get a compile error (cannot find symbol - pasted below). I also get a bunch of warnings: [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. I have tried both Java 1.5 and 1.6. Before I got to this point, I was having problems with the included ZooKeeper jar (java versioning issue) - so I had to download the source and build this. Now 'ant' gets a bit further, to the stage listed above. Any idea of the problem??? THANKS!

[javac] Compiling 438 source files to /Volumes/newpart/solrcloud/cloud/build/solr
[javac] /Volumes/newpart/solrcloud/cloud/src/java/org/apache/solr/cloud/ZkController.java:588: cannot find symbol
[javac] symbol : method stringPropertyNames()
[javac] location: class java.util.Properties
[javac] for (String sprop :
RE: How to speed up solr search speed
Is there any reason why you have to limit each instance to only 1M documents? If you could put more documents in the same core I think it would dramatically improve your response times. -Original Message- From: marship [mailto:mars...@126.com] Sent: Thursday, 15 July 2010 6:23 To: solr-user Subject: How to speed up solr search speed

Hi All. I have a problem with distributed solr search. The issue is I have 76M documents spread over 76 solr instances, each instance handling 1M documents. Previously I put all 76 instances on a single server, and when I tested I found each search took several seconds, mostly 10-20s, to finish. Now I have split these instances onto 2 servers, each with 38 instances. The search speed is about 5-10s each time. 10s is a bit unacceptable for me. And based on my observation, the slowness is caused by disk operations, as all these instances are on the same server. Because when I test each single instance, it is purely fast, always ~400ms. When I use distributed search, I find some instances say they need 7000+ms. Our server has plenty of free memory. I am thinking, is there a way we can make solr use more memory instead of the hard-disk index, like loading all indexes into memory so it can speed up? Welcome any help. Thanks. Regards. Scott
Re: Using stored terms for faceting
Dear Hoss, I will try to clarify what I want to achieve :-) Assume I have the following three docs:

id:1 description: bmx bike 123
id:2 description: bmx bike 321
id:3 description: a mountain bike

If I query against *:* I want to get the facets and their document counts, e.g.:

bike: 3
bmx: 2

I reached this with the following approach: I skip the noise words like 'a', so e.g. for doc 3 I will get the terms 'mountain' and 'bike'. Those two terms are then additionally indexed into a multivalued field, e.g. myfacet, so that I can do faceting on that field. Is there a simpler approach? Regards, Peter.

: is it possible to use the stored terms of a field for a faceted search? No, the only thing stored fields can be used for is document-centric operations (ie: once you have a small set of individual docIds, you can access the stored fields to return to the user, or highlight, etc...) : I mean, I don't want to get the term frequency per document as it is : shown here: : http://wiki.apache.org/solr/TermVectorComponentExampleOptions : : I want to get the frequency of the term of my special search and show : only the 10 most frequent terms and all the nice things that I can do : for faceting. i honestly have no idea what you are saying you want -- can you provide a concrete use case explaining what you mean? describe some example data and then explain what type of logic would happen and what type of result you'd get back? -Hoss -- http://karussell.wordpress.com/
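Once the extracted terms are in a multivalued field as Peter describes, an ordinary facet query returns exactly those counts; a sketch assuming his 'myfacet' field name and an invented host/port:

```python
from urllib.parse import urlencode

# Facet on the extra "myfacet" field; we only want facet counts, no docs.
params = urlencode({
    "q": "*:*",
    "rows": 0,               # suppress the document list
    "facet": "true",
    "facet.field": "myfacet",
    "facet.limit": 10,       # the 10 most frequent terms
    "facet.mincount": 1,
})
url = "http://localhost:8983/solr/select?" + params
print(url)
```

The response's facet_counts section would then list bike: 3, bmx: 2 for the example docs above.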
Re:RE: How to speed up solr search speed
Hi. Thanks for replying. My documents have many different fields (about 30 fields, 10 different types of documents, but these are not the point) and I have to search over several fields. I was putting all 76M documents into several lucene indexes and used the default lucene.net ParaSearch to search over these indexes. That was slow, more than 20s. Then someone suggested I merge all our indexes into a huge one; he thought lucene can handle 76M documents in one index easily. So I merged all the documents into a single huge index (which took me 3 days). At that point the index folder was about 15G (I don't store info in the index, just index the documents). The search was still very slow, more than 20s, and looked slower than using several indexes. Then I came to solr. Why I put 1M into each core is that I found when a core has 1M documents, the search speed is fast, ranging from 0-500ms, which is acceptable. I don't know how many documents per core is proper. The problem is, even if I put 2M documents into each core, I only have 36 cores at the moment; when our document count doubles in the future, the same issue will arise again. So I don't think saving 1M in each core is the issue. The issue is I put too many cores on one server. I don't have extra servers to spread the solr cores over, so we have to improve solr search speed some other way. Any suggestion? Regards. Scott

On 2010-07-15 15:24:08, Fornoville, Tom tom.fornovi...@truvo.com wrote: Is there any reason why you have to limit each instance to only 1M documents? If you could put more documents in the same core I think it would dramatically improve your response times. -Original Message- From: marship [mailto:mars...@126.com] Sent: Thursday, 15 July 2010 6:23 To: solr-user Subject: How to speed up solr search speed Hi. All. I got a problem with distributed solr search. The issue is I have 76M documents spread over 76 solr instances, each instance handles 1M documents.
Previously I put all 76 instances on a single server, and when I tested I found each search took several seconds, mostly 10-20s, to finish. Now I have split these instances onto 2 servers, each with 38 instances. The search speed is about 5-10s each time. 10s is a bit unacceptable for me. And based on my observation, the slowness is caused by disk operations, as all these instances are on the same server. Because when I test each single instance, it is purely fast, always ~400ms. When I use distributed search, I find some instances say they need 7000+ms. Our server has plenty of free memory. I am thinking, is there a way we can make solr use more memory instead of the hard-disk index, like loading all indexes into memory so it can speed up? Welcome any help. Thanks. Regards. Scott
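For reference, a distributed query across this many cores is driven by the shards parameter listing every core; a sketch of how the 76 cores would be stitched together (host names, ports and core names are made up for illustration):

```python
# Build the shards list for a 2-server, 38-cores-each layout.
hosts = ["server1:8080", "server2:8080"]
cores_per_host = 38

shards = ",".join(
    "%s/solr/core%02d" % (host, i)
    for host in hosts
    for i in range(cores_per_host)
)
# Any one core can act as the aggregator for the distributed request:
query = "http://server1:8080/solr/core00/select?q=test&shards=" + shards
print(shards.count(",") + 1)  # → 76
```

The aggregating core waits for the slowest shard, which is why one 7000ms instance drags the whole distributed response down.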
Re: question on wild card
thanks erick. One more question: when "the perfect world*" is passed as the search query it is converted to "? perfect world" - what does the ? mean? Since i am using the standard analyzer i thought the stop word "the" was removed. thanks

On Thu, Jul 15, 2010 at 7:01 AM, Erick Erickson erickerick...@gmail.com wrote: The best way to understand how things are parsed is to go to the solr admin page (Full interface link?) and click the debug info box and submit your query. That'll tell you exactly what happens. Alternatively, you can put debugQuery=on on your URL... HTH Erick

On Wed, Jul 14, 2010 at 8:48 AM, Mark N nipen.m...@gmail.com wrote: I have a database field = "hello world" and i am indexing it to the *text* field with the standard analyzer (text is a copy field of solr). Now when a user gives a query text:hello world%, how is the query interpreted in the background? are we actually searching text:hello OR text:world% (consider the default operator is OR)? -- Nipen Mark -- Nipen Mark
Solr Best Version
Hi all, I'm going to develop a Solr-based search architecture and i wonder if you could suggest which Solr version will best suit my needs. I have 10 Solr machines which use replication, sharding and multi-core; 1 Solr server would index documents (Xml, *Pdf*, Text ...) on an *NFS v3* filesystem (i know it's a bad practice but it has been required by the customer) while the others will search over the index. My first idea is to use Solr 1.4.1 but i would like to know which version (1.4.1, branch 3.x, trunk) you suggest (I need a stable version). Thanks in advance for your help Best Regards -- -- Benedetti Alessandro Personal Page: http://tigerbolt.altervista.org Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: How to speed up solr search speed
What do your queries look like? Do you use faceting, highlighting, ...? Did you try to customize the cache? Setting the HashDocSet to 0.005 of all documents improved our search speed a lot. Did you optimize the index? 500ms seems slow for an 'average' search. I am not an expert, but without highlighting it should be faster than 100ms, or at least 200ms. Regards, Peter.

Hi. Thanks for replying. My documents have many different fields (about 30 fields, 10 different types of documents, but these are not the point) and I have to search over several fields. I was putting all 76M documents into several lucene indexes and used the default lucene.net ParaSearch to search over these indexes. That was slow, more than 20s. Then someone suggested I merge all our indexes into a huge one; he thought lucene can handle 76M documents in one index easily. So I merged all the documents into a single huge index (which took me 3 days). At that point the index folder was about 15G (I don't store info in the index, just index the documents). The search was still very slow, more than 20s, and looked slower than using several indexes. Then I came to solr. Why I put 1M into each core is that I found when a core has 1M documents, the search speed is fast, ranging from 0-500ms, which is acceptable. I don't know how many documents per core is proper. The problem is, even if I put 2M documents into each core, I only have 36 cores at the moment; when our document count doubles in the future, the same issue will arise again. So I don't think saving 1M in each core is the issue. The issue is I put too many cores on one server. I don't have extra servers to spread the solr cores over, so we have to improve solr search speed some other way. Any suggestion? Regards. Scott

On 2010-07-15 15:24:08, Fornoville, Tom tom.fornovi...@truvo.com wrote: Is there any reason why you have to limit each instance to only 1M documents?
If you could put more documents in the same core I think it would dramatically improve your response times. -Original Message- From: marship [mailto:mars...@126.com] Sent: Thursday, 15 July 2010 6:23 To: solr-user Subject: How to speed up solr search speed Hi. All. I got a problem with distributed solr search. The issue is I have 76M documents spread over 76 solr instances, each instance handles 1M documents. Previously I put all 76 instances on a single server, and when I tested I found each search took several seconds, mostly 10-20s, to finish. Now I have split these instances onto 2 servers, each with 38 instances. The search speed is about 5-10s each time. 10s is a bit unacceptable for me. And based on my observation, the slowness is caused by disk operations, as all these instances are on the same server. Because when I test each single instance, it is purely fast, always ~400ms. When I use distributed search, I find some instances say they need 7000+ms. Our server has plenty of free memory. I am thinking, is there a way we can make solr use more memory instead of the hard-disk index, like loading all indexes into memory so it can speed up? Welcome any help. Thanks. Regards. Scott -- http://karussell.wordpress.com/
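Peter's HashDocSet remark refers to an optional entry in solrconfig.xml's query section; a sketch, assuming roughly 1M documents per core so that 0.005 of them is about 5000 (the numbers follow his heuristic, not an official recommendation):

```xml
<!-- solrconfig.xml, inside <query>: cap for small hash-based doc sets
     (maxSize here assumes ~1M docs per core; tune for your corpus) -->
<HashDocSet maxSize="5000" loadFactor="0.75"/>
```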
Re: Solr Best Version
we are using 1.4.0 without any major problems so far. (So, I would use 1.4.1 for the next app, just to have the latest version.) the trunk is also nice if you want the fuzzy search performance boosts. Peter.

Hi all, I'm going to develop a Solr-based search architecture and i wonder if you could suggest which Solr version will best suit my needs. I have 10 Solr machines which use replication, sharding and multi-core; 1 Solr server would index documents (Xml, *Pdf*, Text ...) on an *NFS v3* filesystem (i know it's a bad practice but it has been required by the customer) while the others will search over the index. My first idea is to use Solr 1.4.1 but i would like to know which version (1.4.1, branch 3.x, trunk) you suggest (I need a stable version). Thanks in advance for your help Best Regards -- http://karussell.wordpress.com/
problem with storing??
Hi all, i am new to solr and i followed the wiki and got everything going right. But when i send any html/txt/pdf documents the response is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">576</int></lst>
</response>

but when i search in solr i don't find the result. can any one tell me what is to be done..?? The curl i used for the above o/p is:

curl 'http://localhost:8080/solr/update/extract?literal.id=doc1000&commit=true&fmap.content=text' -F myfi...@java.pdf

regards, satya
AW: problem with storing??
Hi, did u send a <commit/> at some point after adding documents? The added docs are pending until you finally commit them. You can see your pending added document count on the statistics page in the admin panel. cheers

-Original Message- From: satya swaroop [mailto:sswaro...@gmail.com] Sent: Thursday, 15 July 2010 11:38 To: solr-user@lucene.apache.org Subject: problem with storing??

Hi all, i am new to solr and i followed the wiki and got everything going right. But when i send any html/txt/pdf documents the response is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">576</int></lst>
</response>

but when i search in solr i don't find the result. can any one tell me what is to be done..?? The curl i used for the above o/p is:

curl 'http://localhost:8080/solr/update/extract?literal.id=doc1000&commit=true&fmap.content=text' -F myfi...@java.pdf

regards, satya
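For reference, the commit can be sent either inline on the extract call or as a separate update request; a sketch using Python's stdlib (host/port as in satya's example), shown here only building the requests, not sending them:

```python
from urllib.parse import urlencode
from urllib.request import Request

# 1. commit as a parameter on the extract call itself:
extract_url = ("http://localhost:8080/solr/update/extract?"
               + urlencode({"literal.id": "doc1000", "commit": "true"}))

# 2. a standalone <commit/> posted to the update handler afterwards:
commit_req = Request(
    "http://localhost:8080/solr/update",
    data=b"<commit/>",
    headers={"Content-Type": "text/xml"},
)
print(extract_url)
```

Either way, documents only become searchable once a commit has opened a new searcher.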
Query help
Hello, I hope someone can help me with this. I have a website built on top of Solr, and the home page is composed of 4 sections, one for each type of content on the site. At the moment, to populate this page, I am using 4 separate queries of the form:

?q=ContentType:News&sort=DatePublished+desc&start=0&rows=5
?q=ContentType:News&sort=Analysis+desc&start=0&rows=5

etc. etc. Can anyone think of a way I could reduce this to one query which brings back the top 5 pieces of content for each content type by date? Thanks in advance Rupert -- Rupert Bates Software Development Manager Guardian News and Media Tel: 020 3353 3315 rupert.ba...@guardian.co.uk Please consider the environment before printing this email. -- Visit guardian.co.uk - newspaper website of the year www.guardian.co.uk www.observer.co.uk To save up to 33% when you subscribe to the Guardian and the Observer visit http://www.guardian.co.uk/subscriber The Guardian Public Services Awards 2010, in partnership with Hays Specialist Recruitment, recognise and reward outstanding performance from public, private and voluntary sector teams. To find out more and to nominate a deserving team or individual, visit http://guardian.co.uk/publicservicesawards Entries close 16 July. - This e-mail and all attachments are confidential and may also be privileged. If you are not the named recipient, please notify the sender and delete the e-mail and all attachments immediately. Do not disclose the contents to another person. You may not use the information for any purpose, or store, or copy, it in any way. Guardian News Media Limited is not liable for any computer viruses or other material transmitted with or as part of this e-mail. You should employ virus checking software. Guardian News Media Limited A member of Guardian Media Group PLC Registered Office Number 1 Scott Place, Manchester M3 3GG Registered in England Number 908396
Re: problem with storing??
hi, i sent the commit after adding the documents, but the problem is the same. regards, satya
Re: Spatial Search - Best choice ?
Some more pointers to spatial search:

http://www.jteam.nl/products/spatialsolrplugin.html
http://code.google.com/p/spatial-search-lucene/
http://sujitpal.blogspot.com/2008/02/spatial-search-with-lucene.html

Regards Aditya www.findbestopensource.com

On Thu, Jul 15, 2010 at 3:54 PM, Saïd Radhouani r.steve@gmail.com wrote: Hi, Using Solr 1.4, I'm now working on adding spatial search options, such as distance-based sorting, bounding-box filter, etc. To the best of my knowledge, there are three possible points we can start from: 1. http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/ 2. gissearch.com 3. http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html#resources I saw that these three options have been used but didn't see any comparison between them. Is there anyone out there who can recommend one option over another? Thanks, -S
Re: Spatial Search - Best choice ?
Thanks for the links, but this makes things even harder :) Do you have any recommendations for one pointer over another? Thanks, -S

On Jul 15, 2010, at 1:08 PM, findbestopensource wrote: Some more pointers to spatial search: http://www.jteam.nl/products/spatialsolrplugin.html http://code.google.com/p/spatial-search-lucene/ http://sujitpal.blogspot.com/2008/02/spatial-search-with-lucene.html Regards Aditya www.findbestopensource.com

On Thu, Jul 15, 2010 at 3:54 PM, Saïd Radhouani r.steve@gmail.com wrote: Hi, Using Solr 1.4, I'm now working on adding spatial search options, such as distance-based sorting, bounding-box filter, etc. To the best of my knowledge, there are three possible points we can start from: 1. http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/ 2. gissearch.com 3. http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html#resources I saw that these three options have been used but didn't see any comparison between them. Is there anyone out there who can recommend one option over another? Thanks, -S
SOLR Search Query : Exception : Software caused connection abort
Hi, I am trying to test the SOLR search with a very big query, but when i try it throws an exception: "Software caused connection abort". I'm using HTTP POST and the server I'm using is Tomcat. Does a SOLR query have any limitations on size or length, etc.? Pls help me and let me know the solution to this problem ASAP. Regards Sandeep -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Search-Query-Exception-Software-caused-connection-abort-tp969331p969331.html Sent from the Solr - User mailing list archive at Nabble.com.
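One thing worth checking for very large queries is that they really go in the POST body rather than on the URL; servlet containers cap the request line and headers (Tomcat's default is roughly 8 KB, a container setting, not a Solr one). A sketch that only builds such a request (host/port and the generated query are illustrative):

```python
from urllib.parse import urlencode
from urllib.request import Request

# A deliberately long query string as a stand-in for the "very big query".
big_query = " OR ".join("id:%d" % i for i in range(1000))
body = urlencode({"q": big_query, "rows": 10}).encode("ascii")

# Supplying data= makes urllib issue a POST, so the query length is
# bounded by the body, not by the container's URL/header limits.
req = Request("http://localhost:8080/solr/select",
              data=body,
              headers={"Content-Type": "application/x-www-form-urlencoded"})
print(len(body))
```

If the query expands into many boolean clauses, the maxBooleanClauses limit in solrconfig.xml (1024 by default) may also need raising.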
Re: Query help
Your example though doesn't show different ContentType, it shows a different sort order. That would be difficult to achieve in one call. Sounds like your best bet is asynchronous (multi-threaded) calls if your architecture will allow for it. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-help-tp969075p969334.html Sent from the Solr - User mailing list archive at Nabble.com.
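kenf's multi-threaded suggestion can be sketched as below (the content-type names and host are assumptions based on Rupert's description, and build_url would be swapped for a function that actually fetches and parses the response):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

# Fire the four per-ContentType queries in parallel rather than serially.
content_types = ["News", "Analysis", "Comment", "Features"]  # assumed names

def build_url(content_type):
    params = urlencode({
        "q": "ContentType:%s" % content_type,
        "sort": "DatePublished desc",
        "start": 0,
        "rows": 5,
    })
    return "http://localhost:8983/solr/select?" + params

with ThreadPoolExecutor(max_workers=4) as pool:
    urls = list(pool.map(build_url, content_types))

for u in urls:
    print(u)
```

Total home-page latency then approaches the slowest single query instead of the sum of all four.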
Re: Error in building Solr-Cloud (ant example)
this test needs to call super.setUp() in its setUp()...

On Thu, Jul 15, 2010 at 3:15 AM, Rebecca Watson bec.wat...@gmail.com wrote: hi mark, jayf and i are working together :) i tried to apply the patch to the trunk, but the ant tests failed... i checked out the latest trunk: svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk patched it with SOLR-1873, and put the two JARs into trunk/solr/lib. ant compile in the top-level trunk directory worked fine, but ant test had a few errors. the first error was:

[junit] Testsuite: org.apache.solr.cloud.BasicZkTest
[junit] Testcase: testBasic(org.apache.solr.cloud.BasicZkTest): Caused an ERROR
[junit] maxClauseCount must be >= 1
[junit] java.lang.IllegalArgumentException: maxClauseCount must be >= 1
[junit] at org.apache.lucene.search.BooleanQuery.setMaxClauseCount(BooleanQuery.java:62)
[junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:131)
[junit] at org.apache.solr.util.AbstractSolrTestCase.tearDown(AbstractSolrTestCase.java:182)
[junit] at org.apache.solr.cloud.AbstractZkTestCase.tearDown(AbstractZkTestCase.java:135)
[junit] at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:277)
[junit]

after this, tests passed until there were a lot of errors with this output:

[junit] - Standard Error -
[junit] Jul 15, 2010 3:00:53 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:TEST_PORT/solr/replication is not available. Index fetch failed. Exception: Invalid uri 'http://localhost:TEST_PORT/solr/replication': invalid port number

followed by a final message:

[junit] SEVERE: Master at: http://localhost:57146/solr/replication is not available. Index fetch failed. Exception: Connection refused

a few more tests passed...
then at the end:

BUILD FAILED
/Users/iwatson/work/solr/trunk/build.xml:31: The following error occurred while executing this line:
/Users/iwatson/work/solr/trunk/solr/build.xml:395: The following error occurred while executing this line:
/Users/iwatson/work/solr/trunk/solr/build.xml:477: Tests failed!

are these errors currently expected (i.e. issues being sorted) or does it look like i'm doing something wrong/stupid!? thanks for your help bec :)

On 5 July 2010 04:34, Mark Miller markrmil...@gmail.com wrote: Hey jayf - Offhand I'm not sure why you are having these issues - last I knew, a couple of people had had success with the cloud branch. Cloud has really moved on from that branch though - we probably should update the wiki about that. More important, though, is that I need to get Cloud committed to trunk! I've been saying it for a while, but I'm going to make a strong effort to wrap up the final unit test issue (apparently a testing issue, not a cloud issue) and get this committed for further iterations.
The way to follow along with the latest work is to go to: https://issues.apache.org/jira/browse/SOLR-1873 The latest patch there should apply to recent trunk. I've scheduled a bit of time to work on getting this committed this week, fingers crossed. -- - Mark http://www.lucidimagination.com

On 7/4/10 3:37 PM, jayf wrote: Hi there, I'm having trouble installing Solr Cloud. I checked out the project, but when compiling (ant example on OSX) I get a compile error (cannot find symbol - pasted below). I also get a bunch of warnings: [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. I have tried both Java 1.5 and 1.6. Before I got to this point, I was having problems with the included ZooKeeper jar (java versioning issue) - so I had to download the source and build this. Now 'ant' gets a bit further, to the stage listed above. Any idea of the problem??? THANKS! [javac] Compiling 438 source files to /Volumes/newpart/solrcloud/cloud/build/solr [javac]
Re: Query help
Sorry, my mistake, the example should have been as follows:

?q=ContentType:News&sort=DatePublished+desc&start=0&rows=5
?q=ContentType:Analysis&sort=DatePublished+desc&start=0&rows=5

Rupert

On 15 July 2010 13:02, kenf_nc ken.fos...@realestate.com wrote: Your example though doesn't show different ContentType, it shows a different sort order. That would be difficult to achieve in one call. Sounds like your best bet is asynchronous (multi-threaded) calls if your architecture will allow for it. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-help-tp969075p969334.html Sent from the Solr - User mailing list archive at Nabble.com. -- Rupert Bates Software Development Manager Guardian News and Media Tel: 020 3353 3315 rupert.ba...@guardian.co.uk
Re: How to find first document for the ALL search
Hi, The good news is that:

/solr/select?q=*%3A*&fq=&start=1&rows=1&fl=id

did work (kind of odd really), so I am reading all the documents from the bad index into a new solr with the same configuration using ruby (a complete rebuild). so far so good - it has gone through 500k out of 1.7M, and this seems to be the best approach I could think of. Running the luke tool and trying to check the index on a copy ended up destroying the index and leaving only about 5k documents. Reading them out via ruby seemed better in this case (and less work than restoring from backup and re-running a few days' transactions to catch it up). Ian.

On Wed, Jul 14, 2010 at 9:22 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have found that this search crashes: : : /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id Ouch .. that exception is kind of hairy. it suggests that your index may have been corrupted in some way -- do you have any idea what happened? have you tried using the CheckIndex tool to see what it says? (I'd hate to help you work around this but get bit by a timebomb of some other bad docs later) : It looks like just that first document is bad. I am happy to delete it - but : not sure how to get to it. Does anyone know how to find it? CheckIndex might help ... if it doesn't, the next thing you might try is asking for a legitimate field name that you know no document has (ie: if you have a dynamicField with the pattern str_* because you have fields like str_foo and str_bar but you never have fields named strBOGUS then use fl=strBOGUS) and then add debugQuery=true to the URL -- the debug info should contain the id. I'll be honest though: i'm guessing that if your example query doesn't work, my suggestion won't either -- because if you get that error just trying to access the id field, the same thing will probably happen when the debugComponent tries to look it up as well.
-Hoss -- Regards, Ian Connor 1 Leighton St #723 Cambridge, MA 02141 Call Center Phone: +1 (714) 239 3875 (24 hrs) Fax: +1(770) 818 5697 Skype: ian.connor
London open source search meet-up
Hi all, Apologies for the cross-post. We are organising another open source search social evening in London on Wednesday the 28 July. As usual the plan is to get together and chat about search technology, from Lucene to Solr, Hadoop, Mahout, Xapian and the like - bringing together people from across the field to discuss ideas and ask questions over a quiet drink. For directions to this meetup and for the Meetup.com group see: http://www.meetup.com/london-search-social/ Please RSVP directly or via Meetup if you can make it! -- Richard Marr René Kriegler
Custom comparator
Hi, I have a requirement for a custom comparator that keeps the top N documents (chosen by some criteria), but only if their score is more than e.g. 1% of the maxScore. Looking at SolrIndexSearcher.java, I was hoping to have a custom TopFieldCollector return these via TopFieldCollector.topDocs, but I can't see how to override that class to provide my own. I think I need to do this there (in TopFieldCollector.topDocs), as I won't know what the maxScore is until all the docs have been collected and compared? Does anyone have any suggestions? I'd like to avoid having to do two searches. Many Thanks, Dan
Nested Function Query Syntax
Hello, I am trying to use nested function query syntax with Solr 1.4.1, but I am not sure if I am doing it the right way: I try this query and I get all documents, each with a score of 12: http://localhost:8983/solr/articles.0/select/?q={!func}product(3,4)&fl=Document.title,score&debugQuery=on Using the same syntax for a nested query, I get an error: http://localhost:8983/solr/articles.0/select/?q={!func}query(hello)&fl=Document.title,score&debugQuery=on org.apache.lucene.queryParser.ParseException: Nested function query must use $param or {!v=value} forms. got 'query(hello)' Thanks, Rodrigo.
Tag generation
A colleague mentioned that he knew of services where you pass some content and it spits out some suggested Tags or Keywords that would be best suited to associate with that content. Does anyone know if there is a contrib to Solr or Lucene that does something like this? Or a third party tool that can be given a solr index or solr query and it comes up with some good Tag suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/Tag-generation-tp969888p969888.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re: How to speed up solr search speed
Hi Peter, I think I am not using faceting, highlighting, ... I read about them but don't know how to work with them. I am using the default example, just changing the indexed fields. For my case, I don't think Solr can work as fast as 100-200ms on average. I tried some keywords on only a single Solr instance; it sometimes takes more than 20s, and I only input 4 keywords. I agree it is keyword-dependent. But the issue is that it doesn't work consistently. When 37 instances on the same server work at the same time (when a distributed search starts), it gets worse: I saw some Solr cores execute very fast, 0ms, ~40ms, ~200ms, but more Solr cores executed at ~2500ms, ~3500ms, ~6700ms, and about 5-10 Solr cores needed more than 17s. I have 70 cores running, and the search speed depends on the SLOWEST one. Even if 69 cores can run at 1ms but the last one needs 50s, then the distributed search takes 50s. I am aware these cores on the same server are interfering with each other. As I have lots of free memory, I want to know: given that, can Solr use more memory to avoid disk operation conflicts? Thanks. Regards, Scott On 2010-07-15 17:19:57, Peter Karich peat...@yahoo.de wrote: How do your queries look? Do you use faceting, highlighting, ...? Did you try to customize the cache? Setting the HashDocSet to 0.005 of all documents improved our search speed a lot. Did you optimize the index? 500ms seems slow for an 'average' search. I am not an expert, but without highlighting it should be faster than 100ms, or at least 200ms. Regards, Peter. Hi. Thanks for replying. My documents have many different fields (about 30 fields, 10 different types of documents, but that is not the point) and I have to search over several fields. I was putting all 76M documents into several Lucene indexes and used the default Lucene.Net ParaSearch to search over these indexes. That was slow, more than 20s.
Then someone suggested I needed to merge all our indexes into one huge index; he thought Lucene could handle 76M documents in one index easily. So I merged all the documents into a single huge index (which took me 3 days). The index folder is about 15G (I don't store info in the index, just index the fields). The search is actually still very slow, more than 20s too, and looks slower than using several indexes. Then I came to Solr. The reason I put 1M into each core is that I found that when a core has 1M documents, the search speed is fast, ranging from 0-500ms, which is acceptable. I don't know how many documents per core is proper. The problem is that even if I put 2M documents into each core, I only get down to 36 cores at the moment, and when our documents double in the future, the same issue will arise again. So I don't think saving 1M in each core is the issue. The issue is that I put too many cores on one server. I don't have extra servers to spread the Solr cores over, so we have to improve Solr search speed some other way. Any suggestion? Regards, Scott On 2010-07-15 15:24:08, Fornoville, Tom tom.fornovi...@truvo.com wrote: Is there any reason why you have to limit each instance to only 1M documents? If you could put more documents in the same core I think it would dramatically improve your response times. -----Original Message----- From: marship [mailto:mars...@126.com] Sent: Thursday, July 15, 2010 6:23 To: solr-user Subject: How to speed up solr search speed Hi all, I have a problem with distributed Solr search. The issue is I have 76M documents spread over 76 Solr instances, each instance handling 1M documents. Previously I put all 76 instances on a single server, and when I tested I found each search took quite a while, mostly 10-20s, to finish. Now I have split these instances over 2 servers, 38 instances each; the search speed is about 5-10s each time. 10s is a bit unacceptable for me.
And based on my observation, the slowness is caused by disk operations, as all these instances are on the same server: when I test each single instance alone it is fast, always ~400ms, but when I use distributed search, some instances say they need 7000+ms. Our server has plenty of free memory. I am thinking: is there a way we can make Solr use more memory instead of the on-disk index, like loading all indexes into memory, so it can speed up? I welcome any help. Thanks. Regards, Scott -- http://karussell.wordpress.com/
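Scott's observation that the distributed response is gated by the slowest shard is worth making concrete: with a fan-out to N cores, overall latency is the maximum, not the average, of the per-core latencies. A tiny illustration, with numbers taken from the figures quoted in this thread:

```python
# 69 fast cores plus one pathological straggler (values in ms).
shard_latencies_ms = [1] * 69 + [50_000]

average_ms = sum(shard_latencies_ms) / len(shard_latencies_ms)
overall_ms = max(shard_latencies_ms)  # distributed search must wait for all shards

print(round(average_ms, 1), overall_ms)
```

This is why a single overloaded core on a shared disk dominates the end-to-end time even when almost every other core is idle-fast.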
Re: Tag generation
On 15.07.2010 at 17:34, kenf_nc wrote: A colleague mentioned that he knew of services where you pass some content and it spits out some suggested Tags or Keywords that would be best suited to associate with that content. Does anyone know if there is a contrib to Solr or Lucene that does something like this? Or a third party tool that can be given a solr index or solr query and it comes up with some good Tag suggestions? Hi there, there is something from http://www.zemanta.com/ and something from Basis Tech http://www.basistech.com/ but I am not sure if those would help. You could also have a look at http://uima.apache.org/ greetings, olivier -- Olivier Dobberkau
Re: Re: How to speed up solr search speed
Hi Peter, I checked my example/solr/conf/solrconfig.xml (Solr 1.4). I don't see <HashDocSet maxSize="3000" loadFactor="0.75"/> in it, but I do see it in the solrconfig.xml wiki on the Solr website. So should I add it, or is the default (without it) OK? Thanks On 2010-07-15 17:19:57, Peter Karich peat...@yahoo.de wrote: How do your queries look? Do you use faceting, highlighting, ...? Did you try to customize the cache? Setting the HashDocSet to 0.005 of all documents improved our search speed a lot. Did you optimize the index? 500ms seems slow for an 'average' search. I am not an expert, but without highlighting it should be faster than 100ms, or at least 200ms. Regards, Peter. Hi. Thanks for replying. My documents have many different fields (about 30 fields, 10 different types of documents, but that is not the point) and I have to search over several fields. I was putting all 76M documents into several Lucene indexes and used the default Lucene.Net ParaSearch to search over these indexes. That was slow, more than 20s. Then someone suggested I needed to merge all our indexes into one huge index; he thought Lucene could handle 76M documents in one index easily. So I merged all the documents into a single huge index (which took me 3 days). The index folder is about 15G (I don't store info in the index, just index the fields). The search is actually still very slow, more than 20s too, and looks slower than using several indexes. Then I came to Solr. The reason I put 1M into each core is that I found that when a core has 1M documents, the search speed is fast, ranging from 0-500ms, which is acceptable. I don't know how many documents per core is proper. The problem is that even if I put 2M documents into each core, I only get down to 36 cores at the moment, and when our documents double in the future, the same issue will arise again. So I don't think saving 1M in each core is the issue. The issue is that I put too many cores on one server. I don't have extra servers to spread the Solr cores over.
So we have to improve Solr search speed some other way. Any suggestion? Regards, Scott On 2010-07-15 15:24:08, Fornoville, Tom tom.fornovi...@truvo.com wrote: Is there any reason why you have to limit each instance to only 1M documents? If you could put more documents in the same core I think it would dramatically improve your response times. -----Original Message----- From: marship [mailto:mars...@126.com] Sent: Thursday, July 15, 2010 6:23 To: solr-user Subject: How to speed up solr search speed Hi all, I have a problem with distributed Solr search. The issue is I have 76M documents spread over 76 Solr instances, each instance handling 1M documents. Previously I put all 76 instances on a single server, and when I tested I found each search took quite a while, mostly 10-20s, to finish. Now I have split these instances over 2 servers, 38 instances each; the search speed is about 5-10s each time. 10s is a bit unacceptable for me. And based on my observation, the slowness is caused by disk operations, as all these instances are on the same server: when I test each single instance alone it is fast, always ~400ms, but when I use distributed search, some instances say they need 7000+ms. Our server has plenty of free memory. I am thinking: is there a way we can make Solr use more memory instead of the on-disk index, like loading all indexes into memory, so it can speed up? I welcome any help. Thanks. Regards, Scott -- http://karussell.wordpress.com/
Re: Nested Function Query Syntax
I solved the problem. The correct syntax is: http://localhost:8983/solr/articles.0/select/?q={!func}query({!query v='hello'})&fl=Document.title,score&debugQuery=on Rodrigo On Thu, Jul 15, 2010 at 12:32 PM, Rodrigo Rezende rcreze...@gmail.com wrote: Hello, I am trying to use nested function query syntax with Solr 1.4.1, but I am not sure if I am doing it the right way: I try this query and I get all documents, each with a score of 12: http://localhost:8983/solr/articles.0/select/?q={!func}product(3,4)&fl=Document.title,score&debugQuery=on Using the same syntax for a nested query, I get an error: http://localhost:8983/solr/articles.0/select/?q={!func}query(hello)&fl=Document.title,score&debugQuery=on org.apache.lucene.queryParser.ParseException: Nested function query must use $param or {!v=value} forms. got 'query(hello)' Thanks, Rodrigo.
Re: Tag generation
Check out OpenCalais [1]. Maybe it works for your case and language. [1]: http://www.opencalais.com/ On Thursday 15 July 2010 17:34:31 kenf_nc wrote: A colleague mentioned that he knew of services where you pass some content and it spits out some suggested Tags or Keywords that would be best suited to associate with that content. Does anyone know if there is a contrib to Solr or Lucene that does something like this? Or a third party tool that can be given a solr index or solr query and it comes up with some good Tag suggestions? Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Nested Function Query Syntax
On Thu, Jul 15, 2010 at 11:51 AM, Rodrigo Rezende rcreze...@gmail.com wrote: I solved the problem. The correct syntax is: http://localhost:8983/solr/articles.0/select/?q={!func}query({!query v='hello'})&fl=Document.title,score&debugQuery=on query() causes a new QParser to be created. So does {!query}... so using both is redundant. If you want an embedded lucene query, then you could do query({!lucene v='hello'}) OR query({!lucene v=$qq})&qq=hello OR, since lucene is the default query type: query($qq)&qq=hello -Yonik http://www.lucidimagination.com
Re: Tag generation
Hi all, in UIMA there are two components which wrap OpenCalais [1] and AlchemyAPI [2][3] services that you could use, then you could also add something else to the tagging pipeline (using existing stuff [4] or implementing your own logic). Hope this helps. Tommaso [1] : http://uima.apache.org/sandbox.html#opencalais.annotator [2] : http://www.alchemyapi.com [3] : http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator [4] : http://uima.apache.org/sandbox.html 2010/7/15 Markus Jelsma markus.jel...@buyways.nl Check out OpenCalais [1]. Maybe it works for your case and language. [1]: http://www.opencalais.com/ On Thursday 15 July 2010 17:34:31 kenf_nc wrote: A colleague mentioned that he knew of services where you pass some content and it spits out some suggested Tags or Keywords that would be best suited to associate with that content. Does anyone know if there is a contrib to Solr or Lucene that does something like this? Or a third party tool that can be given a solr index or solr query and it comes up with some good Tag suggestions? Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
How to get search results taking into account orthographic errors?
Hi everybody, I am working with Apache Solr and Django with Spanish documents, and I would like that when a user makes a search and forgets to accent the words, the search results show both possibilities: the results without the accent and the results with the accent. Would you help me, please? Regards Ariel
Re: Nested Function Query Syntax
Yeah, it is redundant, but I am using that to use the Solr query response as input to a plugin function: http://localhost:8983/solr/articles.0/select/?q={!func}myFunction(query({!query v='the query string here'})) So in myFunction I can take the query results, with the score, and write my custom sort/re-scorer. Is that the best way? On Thu, Jul 15, 2010 at 1:16 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Jul 15, 2010 at 11:51 AM, Rodrigo Rezende rcreze...@gmail.com wrote: I solved the problem. The correct syntax is: http://localhost:8983/solr/articles.0/select/?q={!func}query({!query v='hello'})&fl=Document.title,score&debugQuery=on query() causes a new QParser to be created. So does {!query}... so using both is redundant. If you want an embedded lucene query, then you could do query({!lucene v='hello'}) OR query({!lucene v=$qq})&qq=hello OR, since lucene is the default query type: query($qq)&qq=hello -Yonik http://www.lucidimagination.com
Re: Nested Function Query Syntax
On Thu, Jul 15, 2010 at 12:49 PM, Rodrigo Rezende rcreze...@gmail.com wrote: Yeah, it is redundant, but I am using that to use the solr query response as input of a plugin function: http://localhost:8983/solr/articles.0/select/?q={!func}myFunction(query({!query v='the query string here'})) This might be easier (requires no escaping of the query string): http://localhost:8983/solr/articles.0/select/?q={!func}myFunction(query($qq))&qq=the query string here -Yonik http://www.lucidimagination.com
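Yonik's $qq indirection also keeps client code simple, because the raw query string never has to be escaped inside the function syntax; a standard URL encoder handles it as an ordinary parameter. A sketch of building such a request in Python (myFunction is the hypothetical plugin function from this thread, and the host/port are the example defaults):

```python
from urllib.parse import urlencode

params = {
    "q": "{!func}myFunction(query($qq))",  # function query referencing $qq
    "qq": "the query string here",         # raw user query; urlencode escapes it
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```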
Re: Nested Function Query Syntax
Thank you, that works fine! On Thu, Jul 15, 2010 at 2:01 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Jul 15, 2010 at 12:49 PM, Rodrigo Rezende rcreze...@gmail.com wrote: Yeah, it is redundant, but I am using that to use the solr query response as input of a plugin function: http://localhost:8983/solr/articles.0/select/?q={!func}myFunction(query({!query v='the query string here'})) This might be easier (requires no escaping of the query string): http://localhost:8983/solr/articles.0/select/?q={!func}myFunction(query($qq))&qq=the query string here -Yonik http://www.lucidimagination.com
Custom re-rank
Hello, I am doing a re-rank of Solr results using a Solr function query plugin: http://localhost:8983/solr/articles.0/select/?q={!func}myReRank(query($qq))&qq=query here Inside the myReRank plugin I do the re-ranking work. First, is that the best way to do this? If so, is it possible to limit myReRank to being called only when there is a match? (When {!func} is used, the method is always called.) Thanks, Rodrigo
Re: how to eliminating scoring from a query?
By specifying a sort that doesn't include score. I think it's just automatic then. It wouldn't make sense to eliminate scoring *without* sorting by some other field; you'd essentially get a random ordering. Best Erick On Thu, Jul 15, 2010 at 1:43 AM, oferiko ofer...@gmail.com wrote: in http://www.lucidimagination.com/files/file/LIWP_WhatsNew_Solr1.4.pdf under the performance section it mentions: Queries that don't sort by score can eliminate scoring, which speeds up queries how exactly can i do that? If I don't specify which sort I want, it automatically sorts by score desc. thanks -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-eliminating-scoring-from-a-query-tp968581p968581.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to get search results taking into account ortographies errors ???
I think you want to look at using solr.ASCIIFoldingFilterFactory: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory On Thu, Jul 15, 2010 at 12:43 PM, Ariel isaacr...@gmail.com wrote: Hi every body I am working with apache solr and django with spanish documents and I would want when a user make a search and forget to accent the words the search results show both posibilities: the results without the accent an the results with the accent. would you help me please ??? Regards Ariel -- Robert Muir rcm...@gmail.com
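Robert's suggestion works by folding accented characters to their ASCII base forms at both index and query time, so that "canción" and "cancion" produce the same token. The core folding idea can be sketched with Unicode decomposition in Python; this mimics the behavior for accented Latin letters, it does not reuse ASCIIFoldingFilterFactory itself:

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Decompose characters (NFD), then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Accented and unaccented spellings collapse to the same form.
print(fold_accents("canción"), fold_accents("cancion"))
```

Because the filter is applied in the schema's analyzer chain for both indexing and querying, a user who omits the accent still matches the accented documents, which is exactly the behavior Ariel asked for.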
Re: how to eliminating scoring from a query?
thanks, i want it to be the indexing order, but with a limit, something like everything that matches my query, and was indexed since yesterday, in an ascending order. Ofer On Thu, Jul 15, 2010 at 8:25 PM, Erick Erickson [via Lucene] wrote: By specifying a sort that doesn't include score. I think it's just automatic then. It wouldn't make sense to eliminate scoring *without* sorting by some other field, you'd essentially get a random ordering. Best Erick On Thu, Jul 15, 2010 at 1:43 AM, oferiko [hidden email] wrote: in http://www.lucidimagination.com/files/file/LIWP_WhatsNew_Solr1.4.pdf under the performance it mentions: Queries that don’t sort by score can eliminate scoring, which speeds up queries how exactly can i do that? If i don't mention which sort i want, it automatically sorts by score desc. thanks -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-eliminating-scoring-from-a-query-tp968581p970180.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: how to eliminating scoring from a query?
How about: 1. Create a date field to indicate index time. 2. Use a date filter to restrict articles to today and yesterday, such as myindexdate:[NOW/DAY-1DAY TO NOW/DAY+1DAY] 3. Sort on that field. -Kallin Nagelberg -----Original Message----- From: oferiko [mailto:ofer...@gmail.com] Sent: Thursday, July 15, 2010 1:38 PM To: solr-user@lucene.apache.org Subject: Re: how to eliminating scoring from a query? thanks, i want it to be the indexing order, but with a limit, something like everything that matches my query, and was indexed since yesterday, in an ascending order. Ofer On Thu, Jul 15, 2010 at 8:25 PM, Erick Erickson [via Lucene] wrote: By specifying a sort that doesn't include score. I think it's just automatic then. It wouldn't make sense to eliminate scoring *without* sorting by some other field, you'd essentially get a random ordering. Best Erick On Thu, Jul 15, 2010 at 1:43 AM, oferiko [hidden email] wrote: in http://www.lucidimagination.com/files/file/LIWP_WhatsNew_Solr1.4.pdf under the performance it mentions: Queries that don't sort by score can eliminate scoring, which speeds up queries how exactly can i do that? If i don't mention which sort i want, it automatically sorts by score desc. thanks
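Kallin's three steps translate directly into request parameters. A sketch under assumptions: the field name myindexdate comes from his example, and "some query" is a placeholder for the user's actual query.

```python
# Restrict to docs indexed today or yesterday, oldest first, sorted by a
# field other than score (which is what lets Solr skip scoring order).
params = {
    "q": "some query",                                    # placeholder user query
    "fq": "myindexdate:[NOW/DAY-1DAY TO NOW/DAY+1DAY]",   # step 2: date filter
    "sort": "myindexdate asc",                            # step 3: ascending index order
}
print(params)
```

Putting the date range in fq rather than q also makes it cacheable as a filter, independent of the main query.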
Re: problem with storing??
First, look at the Solr admin page and see if there's anything in your index. Second, examine the Solr log files and see what comes out when you try this. You really have to provide some more details other than "it didn't work" for us to do more than guess. Reviewing this might help: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, Jul 15, 2010 at 5:38 AM, satya swaroop sswaro...@gmail.com wrote: Hi all, I am new to Solr and I followed the wiki and got everything going right. But when I send any html/txt/pdf documents, the response is as follows: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">576</int></lst> </response> But when I search in Solr I don't find the result. Can anyone tell me what is to be done? The curl I used for the above output is: curl 'http://localhost:8080/solr/update/extract?literal.id=doc1000&commit=true&fmap.content=text' -F myfi...@java.pdf regards, satya
Re: How to speed up solr search speed
Hi Scott! I am aware these cores on same server are interfering with each other. That's not good. Try to use only one core per CPU; with more per CPU you won't have any benefit over the single-core version, I think. can solr use more memory to avoid disk operation conflicts? Yes, though only the memory you have on the machine, of course. Are you using Tomcat or Jetty? For my case, I don't think solr can work as fast as 100-200ms on average. We have indices with a lot of entries, not as large as yours but in the range of X million, and have response times under 100ms. What about testing only one core with 5-10 million docs? If the response time isn't any better, maybe you need a different field config, or something else is wrong. So should I add it or the default (without it) is ok? Without it is also okay - Solr uses a default. With 75 million docs it should be around 20,000, but I guess something else is wrong: maybe caching or the field definitions. Could you post the latter? Regards, Peter. Hi Peter, I think I am not using faceting, highlighting, ... I read about them but don't know how to work with them. I am using the default example, just changing the indexed fields. For my case, I don't think Solr can work as fast as 100-200ms on average. I tried some keywords on only a single Solr instance; it sometimes takes more than 20s, and I only input 4 keywords. I agree it is keyword-dependent. But the issue is that it doesn't work consistently. When 37 instances on the same server work at the same time (when a distributed search starts), it gets worse: I saw some Solr cores execute very fast, 0ms, ~40ms, ~200ms, but more Solr cores executed at ~2500ms, ~3500ms, ~6700ms, and about 5-10 Solr cores needed more than 17s. I have 70 cores running, and the search speed depends on the SLOWEST one. Even if 69 cores can run at 1ms but the last one needs 50s, then the distributed search takes 50s. I am aware these cores on the same server are interfering with each other. As I have lots of free memory.
I want to know: with that prerequisite, can Solr use more memory to avoid disk operation conflicts? Thanks. Regards, Scott On 2010-07-15 17:19:57, Peter Karich peat...@yahoo.de wrote: How do your queries look? Do you use faceting, highlighting, ...? Did you try to customize the cache? Setting the HashDocSet to 0.005 of all documents improved our search speed a lot. Did you optimize the index? 500ms seems slow for an 'average' search. I am not an expert, but without highlighting it should be faster than 100ms, or at least 200ms. Regards, Peter. Hi. Thanks for replying. My documents have many different fields (about 30 fields, 10 different types of documents, but that is not the point) and I have to search over several fields. I was putting all 76M documents into several Lucene indexes and used the default Lucene.Net ParaSearch to search over these indexes. That was slow, more than 20s. Then someone suggested I needed to merge all our indexes into one huge index; he thought Lucene could handle 76M documents in one index easily. So I merged all the documents into a single huge index (which took me 3 days). The index folder is about 15G (I don't store info in the index, just index the fields). The search is actually still very slow, more than 20s too, and looks slower than using several indexes. Then I came to Solr. The reason I put 1M into each core is that I found that when a core has 1M documents, the search speed is fast, ranging from 0-500ms, which is acceptable. I don't know how many documents per core is proper. The problem is that even if I put 2M documents into each core, I only get down to 36 cores at the moment, and when our documents double in the future, the same issue will arise again. So I don't think saving 1M in each core is the issue. The issue is that I put too many cores on one server. I don't have extra servers to spread the Solr cores over. So we have to improve Solr search speed some other way. Any suggestion? Regards.
Scott On 2010-07-15 15:24:08, Fornoville, Tom tom.fornovi...@truvo.com wrote: Is there any reason why you have to limit each instance to only 1M documents? If you could put more documents in the same core I think it would dramatically improve your response times. -----Original Message----- From: marship [mailto:mars...@126.com] Sent: Thursday, July 15, 2010 6:23 To: solr-user Subject: How to speed up solr search speed Hi all, I have a problem with distributed Solr search. The issue is I have 76M documents spread over 76 Solr instances, each instance handling 1M documents. Previously I put all 76 instances on a single server, and when I tested I found each search took quite a while, mostly 10-20s, to finish. Now I have split these instances over 2 servers, 38 instances each; the search speed is about 5-10s each time. 10s is a bit unacceptable for me. And based on my observation, the slowness is caused by disk operations as all these instances are on the same
Re: problem with storing??
satya, just a side question: did you use the dismax handler? dismax won't handle q=*:*; for dismax it should be an empty q= to get all docs. First, look at the Solr admin page and see if there's anything in your index. Second, examine the Solr log files and see what comes out when you try this. You really have to provide some more details other than "it didn't work" for us to do more than guess. Reviewing this might help: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, Jul 15, 2010 at 5:38 AM, satya swaroop sswaro...@gmail.com wrote: Hi all, I am new to Solr and I followed the wiki and got everything going right. But when I send any html/txt/pdf documents, the response is as follows: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">576</int></lst> </response> But when I search in Solr I don't find the result. Can anyone tell me what is to be done? The curl I used for the above output is: curl 'http://localhost:8080/solr/update/extract?literal.id=doc1000&commit=true&fmap.content=text' -F myfi...@java.pdf regards, satya -- http://karussell.wordpress.com/
Re: Custom comparator
Hmmm, why do you need a custom collector? You can use the form of the search that returns a TopDocs, from which you can get the max score and the array of ScoreDocs, each of which has its score. So you can just let the underlying code get the top N documents and throw out any that don't score above 1%. HTH Erick On Thu, Jul 15, 2010 at 10:02 AM, dan sutton danbsut...@gmail.com wrote: Hi, I have a requirement for a custom comparator that keeps the top N documents (chosen by some criteria), but only if their score is more than e.g. 1% of the maxScore. Looking at SolrIndexSearcher.java, I was hoping to have a custom TopFieldCollector return these via TopFieldCollector.topDocs, but I can't see how to override that class to provide my own. I think I need to do this there (in TopFieldCollector.topDocs), as I won't know what the maxScore is until all the docs have been collected and compared? Does anyone have any suggestions? I'd like to avoid having to do two searches. Many Thanks, Dan
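Erick's post-filtering approach, collect the top N as usual and then drop anything under 1% of maxScore, can be sketched like this (score docs are shown here as (docid, score) pairs for brevity; the real TopDocs/ScoreDoc API is Java):

```python
def filter_by_relative_score(score_docs, max_score, fraction=0.01):
    """Keep only hits scoring at least `fraction` of the best score."""
    cutoff = fraction * max_score
    return [(doc, score) for doc, score in score_docs if score >= cutoff]

# Example: with max_score 9.5 the cutoff is 0.095, so the last two hits drop.
hits = [(1, 9.5), (2, 4.0), (3, 0.05), (4, 0.09)]
kept = filter_by_relative_score(hits, max_score=9.5)
print(kept)
```

Because the collector already produces maxScore alongside the top N, this needs only one search, which addresses Dan's original concern.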
range query on TrieLongField - strange result?
I see different results between SortableLongField and TrieLongField when I try the same range query. This is the test data:

<add>
  <doc>
    <field name="id">ZERO</field>
    <field name="f_tl">0</field>
    <field name="f_sl">0</field>
  </doc>
  <doc>
    <field name="id">Long.MAX_VALUE-1000</field>
    <field name="f_tl">9223372036854774807</field>
    <field name="f_sl">9223372036854774807</field>
  </doc>
  <doc>
    <field name="id">Long.MAX_VALUE</field>
    <field name="f_tl">9223372036854775807</field>
    <field name="f_sl">9223372036854775807</field>
  </doc>
</add>

where f_tl is a TrieLongField and f_sl is a SortableLongField. The test data consists of 3 docs. The first doc's value is 0, the second is Long.MAX_VALUE minus 1,000, and the third is Long.MAX_VALUE. I posted the data and queried on f_sl: q=f_sl:[9223372036854775807 TO 9223372036854775807] Solr returned the third doc, and I think that was correct. But when I did the same range query on the f_tl field, Solr returned all 3 docs. Am I missing something? Koji -- http://www.rondhuit.com/en/
Re: Custom comparator
How is it possible to access TopDocs using the Solr API? Thanks, Rodrigo On Thu, Jul 15, 2010 at 8:03 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, why do you need a custom collector? You can use the form of the search that returns a TopDocs, from which you can get the max score and the array of ScoreDocs, each of which has its score. So you can just let the underlying code get the top N documents and throw out any that don't score above 1%. HTH Erick On Thu, Jul 15, 2010 at 10:02 AM, dan sutton danbsut...@gmail.com wrote: Hi, I have a requirement for a custom comparator that keeps the top N documents (chosen by some criteria), but only if their score is more than e.g. 1% of the maxScore. Looking at SolrIndexSearcher.java, I was hoping to have a custom TopFieldCollector return these via TopFieldCollector.topDocs, but I can't see how to override that class to provide my own. I think I need to do this there (in TopFieldCollector.topDocs), as I won't know what the maxScore is until all the docs have been collected and compared? Does anyone have any suggestions? I'd like to avoid having to do two searches. Many Thanks, Dan
Re: range query on TrieLongField - strange result?
Yikes... confirmed! Something is very wrong here. -Yonik http://www.lucidimagination.com

On Thu, Jul 15, 2010 at 8:47 PM, Yonik Seeley yo...@lucidimagination.com wrote:
Hmmm, I'll try and duplicate. -Yonik http://www.lucidimagination.com

2010/7/15 Koji Sekiguchi k...@r.email.ne.jp:
I see different results between SortableLongField and TrieLongField when I try the same range query. This is the test data:

<add>
  <doc>
    <field name="id">ZERO</field>
    <field name="f_tl">0</field>
    <field name="f_sl">0</field>
  </doc>
  <doc>
    <field name="id">Long.MAX_VALUE-1000</field>
    <field name="f_tl">9223372036854774807</field>
    <field name="f_sl">9223372036854774807</field>
  </doc>
  <doc>
    <field name="id">Long.MAX_VALUE</field>
    <field name="f_tl">9223372036854775807</field>
    <field name="f_sl">9223372036854775807</field>
  </doc>
</add>

where f_tl is a TrieLongField and f_sl is a SortableLongField. The test data consists of 3 docs: the first value is 0, the second is Long.MAX_VALUE minus 1,000, and the third is Long.MAX_VALUE. I posted the data and queried on f_sl:

q=f_sl:[9223372036854775807 TO 9223372036854775807]

Solr returned the third doc, which I think is correct. But when I ran the same range query on the f_tl field, Solr returned all 3 docs. Am I missing something?

Koji
--
http://www.rondhuit.com/en/
Securing Solr 1.4 in a glassfish container
Hi All, I am considering securing Solr with basic auth in Glassfish using the container, by adding the following to web.xml and adding a sun-web.xml file to the distributed WAR, as below. If using SolrJ to index files, how can I provide the credentials for authentication to the http-client (or can someone point me in the direction of the right documentation to do that, or documentation that will help me make the appropriate modifications)? Also, any comment on the below is appreciated.

Add this to web.xml:
---
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>SomeRealm</realm-name>
</login-config>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Admin Pages</web-resource-name>
    <url-pattern>/admin</url-pattern>
    <url-pattern>/admin/*</url-pattern>
    <http-method>GET</http-method><http-method>POST</http-method><http-method>PUT</http-method><http-method>TRACE</http-method><http-method>HEAD</http-method><http-method>OPTIONS</http-method><http-method>DELETE</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>SomeAdminRole</role-name>
  </auth-constraint>
</security-constraint>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Update Servlet</web-resource-name>
    <url-pattern>/update/*</url-pattern>
    <http-method>GET</http-method><http-method>POST</http-method><http-method>PUT</http-method><http-method>TRACE</http-method><http-method>HEAD</http-method><http-method>OPTIONS</http-method><http-method>DELETE</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>SomeUpdateRole</role-name>
  </auth-constraint>
</security-constraint>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Select Servlet</web-resource-name>
    <url-pattern>/select/*</url-pattern>
    <http-method>GET</http-method><http-method>POST</http-method><http-method>PUT</http-method><http-method>TRACE</http-method><http-method>HEAD</http-method><http-method>OPTIONS</http-method><http-method>DELETE</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>SomeSearchRole</role-name>
  </auth-constraint>
</security-constraint>
---

Also add this as sun-web.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sun-web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Application Server 9.0 Servlet 2.5//EN" "http://www.sun.com/software/appserver/dtds/sun-web-app_2_5-0.dtd">
<sun-web-app error-url="">
  <context-root>/Solr</context-root>
  <jsp-config>
    <property name="keepgenerated" value="true">
      <description>Keep a copy of the generated servlet class' java code.</description>
    </property>
  </jsp-config>
  <security-role-mapping>
    <role-name>SomeAdminRole</role-name>
    <group-name>SomeAdminGroup</group-name>
  </security-role-mapping>
  <security-role-mapping>
    <role-name>SomeUpdateRole</role-name>
    <group-name>SomeUpdateGroup</group-name>
  </security-role-mapping>
  <security-role-mapping>
    <role-name>SomeSearchRole</role-name>
    <group-name>SomeSearchGroup</group-name>
  </security-role-mapping>
</sun-web-app>
--
-Jon

-----
SECURITY/CONFIDENTIALITY WARNING: This message and any attachments are intended solely for the individual or entity to which they are addressed. This communication may contain information that is privileged, confidential, or exempt from disclosure under applicable law (e.g., personal health information, research data, financial information). Because this e-mail has been sent without encryption, individuals other than the intended recipient may be able to view the information, forward it to others or tamper with the information without the knowledge or consent of the sender. If you are not the intended recipient, or the employee or person responsible for delivering the message to the intended recipient, any dissemination, distribution or copying of the communication is strictly prohibited. If you received the communication in error, please notify the sender immediately by replying to this message and deleting the message and any accompanying files from your system. If, due to the security risks, you do not wish to receive further communications via e-mail, please reply to this message and inform the sender that you do not wish to receive further e-mail from the sender.
-----
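On the SolrJ question above: SolrJ 1.4's CommonsHttpSolrServer accepts a pre-configured Apache Commons HttpClient, so the usual route is to set UsernamePasswordCredentials on the client's state and enable preemptive authentication before constructing the server. On the wire, container-managed BASIC auth only ever needs an Authorization header. Below is a minimal, self-contained sketch of how that header value is built; the user/password values are placeholders, not anything from the original posts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Build the value of the HTTP "Authorization" header for Basic auth:
    // the literal "Basic " followed by base64("user:password").
    static String basicAuth(String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        System.out.println(basicAuth("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```

With Commons HttpClient 3.x the equivalent should be roughly: `client.getState().setCredentials(new AuthScope(host, port, AuthScope.ANY_REALM), new UsernamePasswordCredentials(user, pass))`, then `client.getParams().setAuthenticationPreemptive(true)`, and finally `new CommonsHttpSolrServer(url, client)` — treat those calls as a sketch to verify against the HttpClient 3.x and SolrJ javadocs rather than a tested recipe.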
Re: Spatial Search - Best choice ?
: Subject: Spatial Search - Best choice ? : In-Reply-To: aanlktini0kos5qgoiqnnrqd4ysdcsukt-eoam6s1q...@mail.gmail.com : References: aanlktini0kos5qgoiqnnrqd4ysdcsukt-eoam6s1q...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Securing Solr 1.4 in a glassfish container AS NEW THREAD
Hi All, I am considering securing Solr with basic auth in Glassfish using the container, by adding the following to web.xml and adding a sun-web.xml file to the distributed WAR, as below. If using SolrJ to index files, how can I provide the credentials for authentication to the http-client (or can someone point me in the direction of the right documentation to do that, or documentation that will help me make the appropriate modifications)? Also, any comment on the below is appreciated.

Add this to web.xml:
---
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>SomeRealm</realm-name>
</login-config>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Admin Pages</web-resource-name>
    <url-pattern>/admin</url-pattern>
    <url-pattern>/admin/*</url-pattern>
    <http-method>GET</http-method><http-method>POST</http-method><http-method>PUT</http-method><http-method>TRACE</http-method><http-method>HEAD</http-method><http-method>OPTIONS</http-method><http-method>DELETE</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>SomeAdminRole</role-name>
  </auth-constraint>
</security-constraint>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Update Servlet</web-resource-name>
    <url-pattern>/update/*</url-pattern>
    <http-method>GET</http-method><http-method>POST</http-method><http-method>PUT</http-method><http-method>TRACE</http-method><http-method>HEAD</http-method><http-method>OPTIONS</http-method><http-method>DELETE</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>SomeUpdateRole</role-name>
  </auth-constraint>
</security-constraint>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Select Servlet</web-resource-name>
    <url-pattern>/select/*</url-pattern>
    <http-method>GET</http-method><http-method>POST</http-method><http-method>PUT</http-method><http-method>TRACE</http-method><http-method>HEAD</http-method><http-method>OPTIONS</http-method><http-method>DELETE</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>SomeSearchRole</role-name>
  </auth-constraint>
</security-constraint>
---

Also add this as sun-web.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sun-web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Application Server 9.0 Servlet 2.5//EN" "http://www.sun.com/software/appserver/dtds/sun-web-app_2_5-0.dtd">
<sun-web-app error-url="">
  <context-root>/Solr</context-root>
  <jsp-config>
    <property name="keepgenerated" value="true">
      <description>Keep a copy of the generated servlet class' java code.</description>
    </property>
  </jsp-config>
  <security-role-mapping>
    <role-name>SomeAdminRole</role-name>
    <group-name>SomeAdminGroup</group-name>
  </security-role-mapping>
  <security-role-mapping>
    <role-name>SomeUpdateRole</role-name>
    <group-name>SomeUpdateGroup</group-name>
  </security-role-mapping>
  <security-role-mapping>
    <role-name>SomeSearchRole</role-name>
    <group-name>SomeSearchGroup</group-name>
  </security-role-mapping>
</sun-web-app>
--
-Jon
Novice seeking help to change filters to search without diacritics
I am new to Solr and seeking your help to change the filter from ISOLatin1AccentFilterFactory to ASCIIFoldingFilterFactory. I am not sure what change is to be made, and where exactly this change is to be made. And finally, what would replace the mapping-ISOLatin1Accent.txt file? I would like Solr to search both with and without the diacritics found in transliteration of Indian languages, with characters such as Ā ś ṛ ṇ, etc. The three files that are currently being used are attached. I would deeply appreciate your help. Thank you. http://lucene.472066.n3.nabble.com/file/n971263/schema.xml schema.xml http://lucene.472066.n3.nabble.com/file/n971263/solrconfig.xml solrconfig.xml http://lucene.472066.n3.nabble.com/file/n971263/mapping-ISOLatin1Accent.txt mapping-ISOLatin1Accent.txt -- View this message in context: http://lucene.472066.n3.nabble.com/Novice-seeking-help-to-change-filters-to-search-without-diacritics-tp971263p971263.html Sent from the Solr - User mailing list archive at Nabble.com.
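For what it's worth, a schema.xml fieldType along these lines is the usual shape of the change being asked about: ASCIIFoldingFilterFactory replaces ISOLatin1AccentFilterFactory directly in the analyzer chain, folds a much wider range of accented characters (including Latin Extended characters such as Ā, ś, ṛ, ṇ) to their ASCII equivalents, and needs no mapping file at all. The field type name below is illustrative, not from the attached schema; since the same analyzer applies at both index and query time here, searches match with or without diacritics.

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Folds accented/diacritic characters to ASCII equivalents,
         e.g. Ā -> A, ś -> s, ṛ -> r, ṇ -> n. -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the schema, the existing documents need to be re-indexed for the folding to take effect, since folding happens at index time as well as query time.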
Timeout in distributed search
Hi all. Is there any way to have timeout support in distributed search? I found https://issues.apache.org/jira/browse/SOLR-502, but it looks like it is not in the main release of Solr 1.4. I have 70 cores; when I search, some respond in 0-700ms, some return in about 2s, and some need a very long time, more than 15s. If we could add timeout support, e.g. 2s, then each time we search we would have the result returned within 2s, and the response time would be acceptable for our users. And since the shards have already received the query request, they will still execute the query and cache the result, so the next time we search it will be fast. Does anyone have a solution? Regards, Scott
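One related knob worth knowing about is the timeAllowed request parameter (in milliseconds), which lets a query return partial results when the time budget runs out; whether it is honored across shards in 1.4 is exactly what SOLR-502 is about, so treat the distributed part of this as a sketch. The hosts, ports, and core names below are made up for illustration:

```shell
# Build a distributed query with a 2-second budget (hypothetical hosts/cores).
BASE="http://localhost:8983/solr/core1/select"
SHARDS="host1:8983/solr/core1,host2:8983/solr/core2"
URL="${BASE}?q=*:*&timeAllowed=2000&shards=${SHARDS}"
echo "$URL"
# then e.g.: curl "$URL"
```

On a single core, timeAllowed alone limits how long the search spends collecting results; the response then carries a partialResults flag when the limit was hit.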
no response
Hi all, I have a problem with Solr: when I send documents (.doc), I am not getting a response. Example:

sa...@geodesic-desktop:~/Desktop$ curl "http://localhost:8080/solr/update/extract?stream.file=/home/satya/Desktop/InvestmentDecleration.doc&stream.contentType=application/msword&literal.id=Invest.doc"
sa...@geodesic-desktop:~/Desktop$

Could anybody tell me what to do?