Solr 4.2 rollback not working
Hi All, We have set up a Solr 4.2 cloud with 4 nodes, and while the add/update/delete operations are working, we are not able to perform a rollback. Is there something different about this operation vs. the 3.x single master/slave config? Thanks, Dipti
commit in solr4 takes a longer time
Hi all, I have recently migrated from Solr 3.6 to Solr 4.0. The documents in my core are constantly updated, so I fire a commit from code after every 10 thousand docs. However, moving from 3.6 to 4.0 I have noticed that for the same core size it takes about twice as long to commit in Solr 4.0 as in Solr 3.6. Is there any workaround by which I can reduce this time? Any help would be highly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: commit in solr4 takes a longer time
Can you explain more about your document size, shard and replica sizes, and auto/soft commit time parameters? 2013/5/2 vicky desai vicky.de...@germinait.com wrote: [...]
SolrJ / Solr Two Phase Commit
I am wondering if it is possible to achieve SolrJ/Solr two-phase commit. Any examples? Any best practices?

What I know:
* Lucene offers two-phase commit via its IndexWriter (prepareCommit() followed by either commit() or rollback()). http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/IndexWriter.html
* I know Solr Optimistic Concurrency is available. http://yonik.com/solr/optimistic-concurrency/

I want transactional behavior that ensures that there is a full commit or a full rollback of multiple documents. I do not want to be in a situation where I don't know whether the beans have been written to the Solr instance or not.

* Code Snippet

try {
    UpdateResponse updateResponse = server.add(Arrays.asList(docOne, docTwo));
    successForAddingDocuments = (updateResponse.getStatus() == 0);
    if (successForAddingDocuments) {
        UpdateResponse updateResponseForCommit = server.commit();
        successForCommit = (updateResponseForCommit.getStatus() == 0);
    }
} catch (Exception e) {
} finally {
    if (!successForCommit) {
        System.err.println("Rolling back transaction.");
        try {
            UpdateResponse updateResponseForRollback = server.rollback();
            if (updateResponseForRollback.getStatus() == 0) {
                successForRollback = true;
            } else {
                successForRollback = false;
                System.err.println("Failed to rollback! Bad as state is now unknown!");
            }
        } catch (Exception e) {
        }
    }
}

* Full Test class

@Test
public void documentTransactionTest() {
    try {
        // HttpSolrServer server = ...
        server.deleteById(Arrays.asList("1", "2"));
        server.commit();
    } catch (Exception e) {
        e.printStackTrace();
    }
    SolrInputDocument docOne = new SolrInputDocument();
    {
        docOne.addField("id", 1L);
        docOne.addField("type_s", "MyTestDoc");
        docOne.addField("value_s", "docOne");
        docOne.addField("_version_", -1L);
    }
    SolrInputDocument docTwo = new SolrInputDocument();
    {
        docTwo.addField("id", 2L);
        docTwo.addField("type_s", "MyTestDoc");
        docTwo.addField("value_s", "docTwo");
        docTwo.addField("_version_", -1L);
    }
    boolean successForAddingDocuments = false;
    boolean successForCommit = false;
    boolean successForRollback = false;
    //throw new SolrServerException("Connection Broken");
    try {
        UpdateResponse updateResponse = server.add(Arrays.asList(docOne, docTwo));
        successForAddingDocuments = (updateResponse.getStatus() == 0);
        if (successForAddingDocuments) {
            UpdateResponse updateResponseForCommit = server.commit();
            successForCommit = (updateResponseForCommit.getStatus() == 0);
        }
    } catch (Exception e) {
    } finally {
        if (!successForCommit) {
            System.err.println("Rolling back transaction.");
            try {
                UpdateResponse updateResponseForRollback = server.rollback();
                if (updateResponseForRollback.getStatus() == 0) {
                    successForRollback = true;
                } else {
                    successForRollback = false;
                    System.err.println("Failed to rollback! Bad as state is now unknown!");
                }
            } catch (Exception e) {
            }
        }
    }
    {
        try {
            QueryResponse response = server.query(new SolrQuery("*:*").addFilterQuery("type_s:MyTestDoc"));
            if (successForCommit) {
                Assert.assertEquals(2, response.getResults().size());
                Assert.assertEquals("docOne", response.getResults().get(0).get("value_s"));
                Assert.assertEquals("docTwo", response.getResults().get(1).get("value_s"));
            } else if (successForRollback) {
                Assert.assertEquals(0, response.getResults().size());
            } else {
                // UNKNOWN STATE
                if (response.getResults().size() == 0) {
                    // rollback must have been successful
                } else if (response.getResults().size() == 2) {
                    // commit was successful
Re: Solr 4.2 rollback not working
What version of Solr are you using, 4.2.0 or 4.2.1? The following might be of interest to you: * https://issues.apache.org/jira/browse/SOLR-4605 * https://issues.apache.org/jira/browse/SOLR-4733
Re: commit in solr4 takes a longer time
Hi, I am using 1 shard and two replicas. The core holds around 6 lakh (600,000) documents. My solrconfig.xml is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
  <indexConfig>
    <maxFieldLength>2147483647</maxFieldLength>
    <lockType>simple</lockType>
    <unlockOnStartup>true</unlockOnStartup>
  </indexConfig>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxDocs>500</maxDocs>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
    <autoCommit>
      <maxDocs>5</maxDocs>
      <maxTime>30</maxTime>
    </autoCommit>
  </updateHandler>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="204800" />
  </requestDispatcher>
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
  <requestHandler name="/replication" class="solr.ReplicationHandler" />
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>
Re: any plans to remove int32 limitation on the number of the documents in the index?
The Integer.MAX_VALUE-1 limit is set by Lucene. As hardware capacity and performance continue to advance, I think it's only a matter of time before Lucene (and then Solr) relaxes the limit, but I don't imagine it will happen real soon. Maybe in Lucene/Solr 6.0? -- Jack Krupansky -Original Message- From: Valery Giner Sent: Wednesday, May 01, 2013 1:36 PM To: solr-user@lucene.apache.org Subject: any plans to remove int32 limitation on the number of the documents in the index? Dear Solr Developers, I've been unable to find an answer to the question in the subject line of this e-mail, except for a vague one. We need to be able to index over 2bln documents. We were doing well without sharding until the number of docs hit the limit (2bln+). Performance was satisfactory for queries, updates and indexing of new documents. That is, except for the need to work around the int32 limit, we don't really have a need for setting up distributed Solr. I wonder whether someone on the Solr team could tell us when/in what version of Solr we could expect the limit to be removed. I hope this question may be of interest to someone else :) -- Thanks, Val
Any estimation for solr 4.3?
Hi, We're planning on upgrading our Solr cluster from 4.0 to 4.2.1. Is 4.3 coming anytime soon? Thanks.
Re: Any estimation for solr 4.3?
RC4 of 4.3 is available now. The final release of 4.3 is likely to be within days. -- Jack Krupansky -Original Message- From: adfel70 Sent: Thursday, May 02, 2013 1:32 AM To: solr-user@lucene.apache.org Subject: Any estimation for solr 4.3? [...]
Pulling Config Folder from Zookeeper at SolrCloud
I use the same folder naming convention as the Solr example for my Solr 4.2.1 cloud: I have a collection1 folder and under it a conf folder. When starting up my first node against an external Zookeeper ensemble, I indicate: -Dsolr.solr.home=./solr -Dsolr.data.dir=./solr/data -DnumShards=5 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf However, I see that Zookeeper keeps that config. What should I do when I start up new nodes in my SolrCloud? Should I remove the conf folder from my sources and the -Dbootstrap_confdir=./solr/collection1/conf parameter from the startup parameters, since Zookeeper keeps it? Also, if I have multiple configs in Zookeeper, how can I tell a newly starting instance which of those configs to use?
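Not an authoritative answer, but one common approach with Solr 4.x is to upload each config set to ZooKeeper once and link collections to a config by name, instead of bootstrapping on every startup. A sketch using the stock cloud-scripts/zkcli.sh shipped with Solr (the ZooKeeper addresses and paths below are placeholders for your ensemble):

```shell
# Upload a named config set to ZooKeeper once (run from the Solr distribution).
cloud-scripts/zkcli.sh -cmd upconfig \
  -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -confdir ./solr/collection1/conf \
  -confname myconf

# Point a collection at one of possibly several uploaded configs by name.
cloud-scripts/zkcli.sh -cmd linkconfig \
  -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -collection collection1 \
  -confname myconf
```

After that, new nodes can be started with just -DzkHost=... and without -Dbootstrap_confdir, since they read the linked config from ZooKeeper.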
Does Near Real Time get not supported at SolrCloud?
Is Near Real Time get not supported in SolrCloud? I mean, when a soft commit occurs on a leader, I think it is not distributed to the replicas (because it is not on storage; do the in-RAM index segments get distributed to replicas too?), and then a search query comes in: what happens?
socket write error
Hi guys! We have a Solr router and shards. I see this in the jetty log on the router:

May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.SocketException) caught when processing request: Connection reset by peer: socket write error

and then:

May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request

followed by an exception about Internal Server Error. Any ideas why this happens? We run 80+ shards distributed across several servers. The router runs on its own node. Is there anything in particular I should be looking into w.r.t. Ubuntu socket settings? Is this a known issue for Solr's distributed search from the past? Thanks, Dmitry
Is indexing large documents still an issue?
Hi, In previous versions of Solr, indexing documents with large fields caused performance degradation. Is this still the case in Solr 4.2? If so, and I'll need to chunk the document and index many document parts, can anyone give a general idea of what field/document size Solr CAN handle? Thanks.
Re: Is indexing large documents still an issue?
The only issue I ran into was returning the content field. Once I modified my query to avoid that, I got good performance. Admittedly, I only have about 15-20k documents in my index ATM, but most of them are in the multi-MB range, with a current max of 250MB. On Thu, May 2, 2013 at 7:05 AM, adfel70 adfe...@gmail.com wrote: [...]
Re: Is indexing large documents still an issue?
Well, returning the content field for highlighting is within my requirements. Did you solve this some other way, or did you just not have to? Bai Shen wrote: [...]
Security for inter-solr-node requests
Here is a part from the wiki: 1) Just forward credentials from the super-request which caused the inter-solr-node sub-requests 2) Use internal credentials provided to the solr-node by the administrator at startup. Which do you use, and is there any code example for it?
Re: What Happens to Consistency if I kill a Leader and Startup it again?
The leader would not be behind the replica, because the old leader would not come back and take over the leader role. It would be just a replica, and it would replicate the index from whichever node is the leader. Otis Solr ElasticSearch Support http://sematext.com/ On Apr 29, 2013 5:31 PM, Furkan KAMACI furkankam...@gmail.com wrote: I think about such a situation: Let's assume that I am indexing into my SolrCloud. My leader has a higher version than the replica (I have one leader and one replica for each shard). If I kill the leader, the replica will become leader. When I start up the old leader again it will be a replica for my shard. However, I think that the leader will then have fewer documents than the replica and a lower version than the replica. Does it cause a problem that the leader is behind the replica?
Re: socket write error
After some searching around, I see this: http://search-lucene.com/m/ErEZUl7P5f2/%2522socket+write+error%2522subj=Long+list+of+shards+breaks+solrj+query Seems like this has happened in the past with large numbers of shards. To make it clear: the distributed search works with 20 shards. On Thu, May 2, 2013 at 1:57 PM, Dmitry Kan solrexp...@gmail.com wrote: [...]
RE: DF is not updated when a document is marked for deletion note
DF uses maxDoc, which is updated only when segments merge, so DF is almost never exact in a dynamic index. -Original message- From: Furkan KAMACI furkankam...@gmail.com Sent: Thu 02-May-2013 14:05 To: solr-user@lucene.apache.org Subject: DF is not updated when a document is marked for deletion note When I look at http://localhost:8983/solr/admin/luke I see this note: "Document Frequency (df) is not updated when a document is marked for deletion. df values include deleted documents." Is it something I should care about?
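As a side note (not from the reply above, just a sketch of the standard XML update message; the core name in the URL is a placeholder): if you need df to reflect deletions at a given point in time, you can ask Lucene to merge away segments containing deleted documents with an expunging commit:

```xml
<!-- POST this body to /solr/<core>/update to merge away segments with deletes -->
<commit expungeDeletes="true"/>
```

This is more expensive than a normal commit, since it forces merges, so it is usually done only occasionally rather than on every commit.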
Re: What Happens to Consistency if I kill a Leader and Startup it again?
Thanks for the answer. This is what I am trying to say:

time = t
Node A (Leader): version is 100
Node B (Replica): version is 90

time = t+1
Node A (Killing): version is 100 and killed
Node B (Replica): version is 90

time = t+2
Node A (Killed): version is 100 and killed
Node B (Become Leader): version is 95 (we indexed something)

time = t+3
Node A (Started as Replica): version is 100 and live
Node B (Leader): version is 95

So I think that the leader will be behind the replica. Is there anything different in such a scenario? 2013/5/2 Otis Gospodnetic otis.gospodne...@gmail.com wrote: [...]
Re: What Happens to Consistency if I kill a Leader and Startup it again?
If you're using ZooKeeper, this should not be allowed to happen (I think). On Thu, May 2, 2013 at 2:12 PM, Furkan KAMACI furkankam...@gmail.com wrote: [...]
Re: Any estimation for solr 4.3?
On May 2, 2013, at 3:36 AM, Jack Krupansky j...@basetechnology.com wrote: RC4 of 4.3 is available now. The final release of 4.3 is likely to be within days. How can I see the Changelog of what will be in it? Thanks, xoa -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: Any estimation for solr 4.3?
The road map has this release note, but I think that most of it will be moved to 4.3.1 or 4.4: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12324128 Regards -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Thursday, May 2, 2013 at 2:56 PM, Andy Lester wrote: [...]
Re: Any estimation for solr 4.3?
On May 2, 2013, at 9:03 AM, Yago Riveiro yago.rive...@gmail.com wrote: [...] So, is there a way I can see what is currently pending to go into 4.3? -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: Any estimation for solr 4.3?
Attached is the change log of Solr 4.3 RC3. Regards -- Yago Riveiro On Thursday, May 2, 2013 at 3:06 PM, Andy Lester wrote: [...]
Re: Any estimation for solr 4.3?
On May 2, 2013, at 9:11 AM, Yago Riveiro yago.rive...@gmail.com wrote: Attached is the change log of Solr 4.3 RC3. And where would I find that? I don't see anything at http://lucene.apache.org/solr/downloads.html to download. Do I need to check out the Subversion repo? Is there a page somewhere that describes how the process is set up? -- Andy Lester
Re: Any estimation for solr 4.3?
Hopefully, this is not a secret, but the RCs are built, made available for download, and announced on the dev mailing list. So the changes for RC4 (not RC3 anymore) are here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC4-rev1477023/solr/changes/Changes.html Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 2, 2013 at 10:14 AM, Andy Lester a...@petdance.com wrote: [...]
Re: Any estimation for solr 4.3?
I got the RC3 zip file from the mailing list. 4.3 has not yet been released, so you can't download it through the regular channels. Regards -- Yago Riveiro On Thursday, May 2, 2013 at 3:14 PM, Andy Lester wrote: [...]
Re: Any estimation for solr 4.3?
On May 2, 2013, at 9:20 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hopefully, this is not a secret, but the RCs are built and available for download and announced on the dev mailing list. Thanks for the link. I don't think it's a secret, but I sure don't see anything that says "This is how the dev process works." -- Andy Lester
Re: SolrJ / Solr Two Phase Commit
One thing I do know is that commits in Solr are global, so there's no way to do this with concurrency. That being said, in my experience Solr doesn't tend to accept updates that would generate errors once committed. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Thu, May 2, 2013 at 3:06 AM, mark12345 marks1900-pos...@yahoo.com.au wrote: [...]
Re: Any estimation for solr 4.3?
Hi everyone, I am working on an internal project at my company that requires Solr, but I could not manage to link it to Tika. I bought the Apache Solr 4 Cookbook, yet I couldn't figure out the solution. 1) I copied the required jar files into a lib directory 2) I added the lib directory in solrconfig.xml When I remove startup=lazy on the /update/extract requestHandler I get the following error: org.apache.solr.common.SolrException: RequestHandler init failure ... Caused by: org.apache.solr.common.SolrException: RequestHandler init failure ... Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.extraction.ExtractingRequestHandler' ... Caused by: java.lang.ClassNotFoundException: solr.extraction.ExtractingRequestHandler I would highly appreciate any help. Thank you very much. 2013/5/2 Andy Lester a...@petdance.com wrote: [...]
Re: Any estimation for solr 4.3?
On Thu, May 2, 2013 at 10:23 AM, Andy Lester a...@petdance.com wrote: I don't think it's a secret, but I sure don't see anything that says "This is how the dev process works." I suspect this is somewhere in the Apache operating charter/standard operating procedures for all projects, and we (Solr users/developers) are just four meta-levels downstream of that. But I agree, it could make for a nice mini-page on the wiki (if it is ever up). Regards, Alex.
Re: commit in solr4 takes a longer time
First, I would upgrade to 4.2.1 and remember to change luceneMatchVersion to LUCENE_42. There were a LOT of fixes between 4.0 and 4.2.1. wunder On May 2, 2013, at 12:16 AM, vicky desai wrote: [...] -- Walter Underwood wun...@wunderwood.org
Re: any plans to remove int32 limitation on the number of the documents in the index?
$100 for anyone who gets me a working Long.MAX_VALUE branch! ;-) I know that for many of the SOLR with faceting use cases, things will not scale to Long documents, but there are a number of more straightforward use cases where SOLR/Lucene will scale to Long. Like simple searches, small numbers of terms, etc. For these, getting Long in place sooner rather than later would be appreciated by some of us! :-) -glen On Thu, May 2, 2013 at 4:20 AM, Jack Krupansky j...@basetechnology.com wrote: The Integer.MAX_VALUE-1 limit is set by Lucene. As hardware capacity and performance continue to advance, I think it's only a matter of time before Lucene (and then Solr) relaxes the limit, but I don't imagine it will happen real soon. Maybe in Lucene/Solr 6.0? -- Jack Krupansky -Original Message- From: Valery Giner Sent: Wednesday, May 01, 2013 1:36 PM To: solr-user@lucene.apache.org Subject: any plans to remove int32 limitation on the number of the documents in the index? Dear Solr Developers, I've been unable to find an answer to the question in the subject line of this e-mail, except a vague one. We need to be able to index 2bln+ documents. We were doing well without sharding until the number of docs hit the limit (2bln+). The performance was satisfactory for the queries, updates and indexing of new documents. That is, except for the need to go around the int32 limit, we don't really have a need for setting up distributed solr. I wonder whether someone on the solr team could tell us when/what version of solr we could expect the limit to be removed. I hope this question may be of interest to someone else :) -- Thanks, Val -- - http://zzzoot.blogspot.com/ -
Re: commit in solr4 takes a longer time
You might want to add openSearcher=false for the hard commit, so the hard commit handles durability only and leaves searcher visibility to the soft commit:

<autoCommit>
  <maxDocs>5</maxDocs>
  <maxTime>30</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

On Thu, May 2, 2013 at 12:16 AM, vicky desai vicky.de...@germinait.com wrote: Hi, I am using 1 shard and two replicas. Document size is around 6 lakhs. My solrconfig.xml is as follows: [...]
Re: commit in solr4 takes a longer time
What happens exactly when you don't open a searcher at commit? 2013/5/2 Gopal Patwa gopalpa...@gmail.com: you might want to add openSearcher=false for hard commit [...]
Re: commit in solr4 takes a longer time
If you don't re-open the searcher, you will not see new changes. So, if you only have hard commits, you never see those changes (until restart). But if you also have soft commit enabled, that will re-open your searcher for you. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch On Thu, May 2, 2013 at 11:21 AM, Furkan KAMACI furkankam...@gmail.com wrote: What happens exactly when you don't open searcher at commit? [...]
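Pulling the advice in this thread together: a minimal sketch of an updateHandler that pairs frequent soft commits (for visibility) with less frequent hard commits that skip opening a searcher (for durability). The interval values here are illustrative, not the original poster's settings:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- soft commit: makes new documents visible to searches; cheap, no fsync -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
  <!-- hard commit: flushes the index to disk for durability;
       openSearcher=false means it does not also pay for opening
       and warming a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With a split like this, explicit commits from the indexing client (e.g. after every 10 thousand docs) are usually unnecessary.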
Re: commit in solr4 takes a longer time
Hi Vicky, I faced this issue as well, and after some playing around I found the autowarm count in the cache configuration to be the problem. I changed it from a fixed count (3072) to a percentage (10%) and commit times were stable from then on:

<filterCache class="solr.FastLRUCache" size="8192" initialSize="3072" autowarmCount="10%" />
<queryResultCache class="solr.LRUCache" size="16384" initialSize="3072" autowarmCount="10%" />
<documentCache class="solr.LRUCache" size="8192" initialSize="4096" autowarmCount="10%" />

HTH, Sandeep On 2 May 2013 16:31, Alexandre Rafalovitch arafa...@gmail.com wrote: If you don't re-open the searcher, you will not see new changes. So, if you only have hard commits, you never see those changes (until restart). But if you also have soft commit enabled, that will re-open your searcher for you. [...]
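Autowarming is work done each time a commit opens a new searcher, which is why a large fixed autowarmCount shows up directly in commit times. The same applies to any explicit warming queries configured on the newSearcher event; a sketch with a single hypothetical warming query (the query itself is an assumption, pick ones that match your real traffic):

```xml
<!-- runs every time a commit opens a new searcher; keep the list short,
     since these queries add directly to commit/warmup latency -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>
```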
solr master server
Hi, I want to set up a master/slave configuration for Solr 3.6. Is there a best practice for the RAID config and the Linux partitions for the master server? Cheers, Torsten
Re: How to deal with cache for facet search when index is always increment?
On Wed, May 1, 2013 at 7:01 PM, 李威 li...@antvision.cn wrote: For facet search, Solr creates a cache based on the whole document set. If I import a new doc into the index, the cache becomes stale and needs to be built again. For real-time search, docs are imported into the index all the time. In this case the cache nearly always needs to be rebuilt, which makes facet search very slow. Do you have any idea how to deal with such a problem? We're in a similar situation and have had better performance using facet.method=fcs. http://wiki.apache.org/solr/SimpleFacetParameters#facet.method The suggestion to use soft commits is also a good one. Best regards, Daniel
Re: string field does not yield exact match result using qf parameter
Hi Jan, my question is why the results change slightly when I tweak the pf and qf parameters. I do not think you need to implement the solution you mentioned in your reply to get exact matches: you can always have a string field and boost it in your pf parameter to get the exact-match results on top. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-tp4060096p4060492.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Not In query
Hi Jan. Thanks again for your reply. You're right. It is almost impossible for a user to exclude 200,000 documents. I'll do some tests with the NOT IN query. Thank you again. * -- * *E conhecereis a verdade, e a verdade vos libertará. (João 8:32)* *andre.maldonado*@gmail.com andre.maldon...@gmail.com (11) 9112-4227 http://www.orkut.com.br/Main#Profile?uid=2397703412199036664 http://www.facebook.com/profile.php?id=10659376883 http://twitter.com/andremaldonado http://www.delicious.com/andre.maldonado https://profiles.google.com/105605760943701739931 http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3 http://www.youtube.com/andremaldonado On Tue, Apr 30, 2013 at 6:09 PM, Jan Høydahl jan@cominvent.com wrote: Hi, How, practically, would a user end up with 200,000 documents excluded? Is there some way in your application to exclude categories of documents with one click? If so, I would index those category IDs on all docs in that category, and then do fq=-cat:123 instead of adding all the individual docids. Anyway, I'd start with the simple approach and then optimize once you (perhaps, perhaps not) bump into problems. Most likely it will work like a charm :) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 30 Apr 2013 at 16:21, André Maldonado andre.maldon...@gmail.com wrote: Thanks, Jan, for your reply. My application has thousands of users and I don't know yet how many of them will use this feature. They can exclude one document from their search results, or 200,000 documents. It's much more natural that they exclude something like 50~300 documents; more than this would be unusual. However, I don't know how caching will work, because we have a large number of users who can use this feature. Even if the query for user 1 is cached, it won't work for other users. Do you see another solution for this case?
Thanks. On Fri, Apr 26, 2013 at 6:18 PM, Jan Høydahl jan@cominvent.com wrote: I would start with the way you propose, a negative filter: q=foo bar&fq=-id:(123 729 640 112...) This will effectively hide those doc ids, and a benefit is that it is cached, so if the list of ids is long you'll only take the performance hit the first time. I don't know your application, but if it is highly likely that a single user will add excludes for several thousand ids, then you should perhaps consider other options and benchmark up front. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 26 Apr 2013 at 21:50, André Maldonado andre.maldon...@gmail.com wrote: Hi all. We have an index with 300,000 documents and a lot, a lot of fields. We're planning a module where users will choose some documents to exclude from their search results. So, these documents will be excluded for UserA and visible for UserB. We have some options to do this. The simplest way is to do a NOT IN query on document id, but we don't know the performance impact this will have. Is this an option? Is there another reasonable way to accomplish this? Thanks
Re: socket write error
Hi, First, which version of Solr are you using? I also have 60+ shards on Solr 4.2.1 and it doesn't seem to be a problem for me. - Make sure you use POST to send the query to Solr. - 'Connection reset by peer' on the client can indicate that there is something wrong on the server, e.g. the server closed the connection. -- Patanachai On 05/02/2013 05:05 AM, Dmitry Kan wrote: After some searching around, I see this: http://search-lucene.com/m/ErEZUl7P5f2/%2522socket+write+error%2522subj=Long+list+of+shards+breaks+solrj+query Seems like this has happened in the past with a large number of shards. To make it clear: the distributed search works with 20 shards. On Thu, May 2, 2013 at 1:57 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi guys! We have a Solr router and shards. I see this in the jetty log on the router: May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: I/O exception (java.net.SocketException) caught when processing request: Connection reset by peer: socket write error and then: May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: Retrying request followed by an exception about Internal Server Error. Any ideas why this happens? We run 80+ shards distributed across several servers. The router runs on its own node. Is there anything in particular I should be looking into with respect to Ubuntu socket settings? Is this a known issue for Solr's distributed search from the past? Thanks, Dmitry
Re: Random IllegalStateExceptions on Solr slave (3.6.1)
My first guess would be that your tomcat container timeouts need to be lengthened, but that's mostly a guess based on the socket timeout error message. Not sure where in Tomcat that needs to be configured though... Best Erick On Tue, Apr 30, 2013 at 12:37 PM, Arun Rangarajan arunrangara...@gmail.com wrote: We have a master-slave Solr set up and run live queries only against the slave. Full import (with optimize) happens on master every day at 2 a.m. Delta imports happen every 10 min for one entity and every hour for another entity. The following exceptions occur a few times every day in our app logs: ... org.apache.solr.client.solrj.SolrServerException: Error executing query at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) ... and this one: ... org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) ... These happen on different queries at different times, so they are not query-dependent. 
If I inspect the localhost.log file in tomcat on the solr slave server, the following exception occurs at the same time: Apr 06, 2013 7:16:33 AM org.apache.catalina.core.StandardWrapperValve invoke SEVERE: Servlet.service() for servlet default threw exception java.lang.IllegalStateException at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407) at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:722) ... 
The replication set-up is as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>

Aside from the full and delta imports, we also have external file fields which are reloaded every hour with reloadCache on both master and slave. What is causing these exceptions and how do I fix them? Thanks.
Re: EmbeddedSolrServer
because some of the underlying classes in SolrJ try to communicate with ZooKeeper to intelligently route requests to leaders. It looks like you don't have your classpath pointed at the dist/solrj-lib, at least that would be my first guess... Best Erick On Wed, May 1, 2013 at 7:51 AM, Peri Subrahmanya peri.subrahma...@htcinc.com wrote: I'm trying to use the EmbeddedSolrServer and here is my sample code:

CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");

Upon running I get the following exception: java.lang.NoClassDefFoundError: org/apache/solr/common/cloud/ZooKeeperException. I'm not sure why it's complaining about ZooKeeper. Any ideas please? Thank you, Peri Subrahmanya *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global Services to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.
Re: solr master server
None that I know of. But if you hit issues I know where you can get help! :) Otis Solr ElasticSearch Support http://sematext.com/ On May 2, 2013 12:41 PM, Torsten Albrecht tors...@soahc.eu wrote: Hi, I want to set up a master / slave configuration for solr 3.6 Is there a best practice for the Raid config and the Linux partitions for the master server? Cheers, Torsten
Exception with Replication in solr 3.6
Hi, We have one master and 2 slaves with Solr 3.6. The messages below are logged in the Solr log: ERROR: Master at: http://server:port/solr/pe/replication is not available. Index fetch failed. Exception: Connection reset ERROR: Master at: http://server:port/solr/pe/replication is not available. Index fetch failed. Exception: Read timed out What do they mean? We are not getting these messages frequently. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Exception-with-Replication-in-solr-3-6-tp4060514.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: any plans to remove int32 limitation on the number of the documents in the index?
Val, Haven't seen this mentioned in a while... I'm curious...what sort of index, queries, hardware, and latency requirements do you have? Otis Solr ElasticSearch Support http://sematext.com/ On May 1, 2013 4:36 PM, Valery Giner valgi...@research.att.com wrote: Dear Solr Developers, I've been unable to find an answer to the question in the subject line of this e-mail, except of a vague one. We need to be able to index over 2bln+ documents. We were doing well without sharding until the number of docs hit the limit ( 2bln+). The performance was satisfactory for the queries, updates and indexing of new documents. That is, except for the need to go around the int32 limit, we don't really have a need for setting up distributed solr. I wonder whether some one on the solr team could tell us when/what version of solr we could expect the limit to be removed. I hope this question may be of interest to some one else :) -- Thanks, Val
Re: EmbeddedSolrServer
Actually, I found it very hard to figure out the exact jar requirements for SolrJ. I ended up basically pointing at the expanded webapp's lib directory, which is total overkill. It would be nice to have some specific guidance on this issue. Regards, Alex. On Thu, May 2, 2013 at 3:17 PM, Erick Erickson erickerick...@gmail.com wrote: because some of the underlying classes in SolrJ try to communicate with Zookeeper to intelligently route requests to leaders. It looks like you don't have your classpath pointed at the dist/solrj-lib, at least that would be my first guess... [...]
Re: What Happens to Consistency if I kill a Leader and Startup it again?
Hi, Can you actually make this happen? Otis Solr ElasticSearch Support http://sematext.com/ On May 2, 2013 8:12 AM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for the answer. This is what I am trying to say: time = t Node A (Leader): version is 100 Node B (Replica): version is 90 time = t+1 Node A (Killing): version is 100 and killed Node B (Replica): version is 90 time = t+2 Node A (Killed): version is 100 and killed Node B (Become Leader): version is 95 (we indexed something) time = t+3 Node A (Started as Replica): version is 100 and live Node B (Leader): version is 95 so I think that the leader will be behind the replica. Is there anything different for such a scenario? 2013/5/2 Otis Gospodnetic otis.gospodne...@gmail.com The leader would not be behind the replica because the old leader would not come back and take over the leader role. It would be just a replica and it would replicate the index from whichever node is the leader. Otis Solr ElasticSearch Support http://sematext.com/ On Apr 29, 2013 5:31 PM, Furkan KAMACI furkankam...@gmail.com wrote: I am thinking about such a situation: Let's assume that I am indexing at my SolrCloud. My leader has a higher version than the replica (I have one leader and one replica for each shard). If I kill the leader, the replica will become the leader. When I start up the old leader again it will be a replica for my shard. However I think that the leader will then have fewer documents and a lower version than the replica. Does it cause a problem that the leader is behind the replica?
Re: Exception with Replication in solr 3.6
Looks like a network issue, especially if this is not happening consistently. Otis Solr ElasticSearch Support http://sematext.com/ On May 2, 2013 3:42 PM, gpssolr2020 psgoms...@gmail.com wrote: Hi, We have one master and 2 slaves with Solr 3.6. The below messages are logged in the Solr log: ERROR: Master at: http://server:port/solr/pe/replication is not available. Index fetch failed. Exception: Connection reset [...]
Re: DF is not updated when a document is marked for deletion note
Nah, not until and IF you see issues. Most users are not even aware of this. Otis Solr ElasticSearch Support http://sematext.com/ On May 2, 2013 8:05 AM, Furkan KAMACI furkankam...@gmail.com wrote: When I look at here: http://localhost:8983/solr/admin/luke I see that Note: Document Frequency (df) is not updated when a document is marked for deletion. df values include deleted documents. is it something like I should care?
Re: any plans to remove int32 limitation on the number of the documents in the index?
Otis, The documents themselves are relatively small, tens of fields, only a few of them up to a hundred bytes. Linux servers with relatively large RAM (256). Minutes on the searches are fine for our purposes, and adding a few tens of millions of records in tens of minutes is also fine. We had to do some simple tricks to keep indexing up to speed, but nothing too fancy. Moving to sharding adds a layer of complexity which we don't really need because of the above, ... and adding complexity may result in lower reliability :) Thanks, Val On 05/02/2013 03:41 PM, Otis Gospodnetic wrote: Val, Haven't seen this mentioned in a while... I'm curious...what sort of index, queries, hardware, and latency requirements do you have? [...]
Re: What Happens to Consistency if I kill a Leader and Startup it again?
Hi Otis; I see this on my admin page:

Replication (Slave)
         Version         Gen   Size
Master:  1367307652512   82    778.04 MB
Slave:   1367307658862   82    781.05 MB

and I started to wonder about it, so that's why I asked this question. 2013/5/2 Otis Gospodnetic otis.gospodne...@gmail.com: Hi, Can you actually make this happen? [...]
transientCacheSize doesn't seem to have any effect, except on startup
Hi, I've been very interested in the transient core feature of solr to manage a large number of cores. I'm especially interested in this use case, which the wiki lists at http://wiki.apache.org/solr/LotsOfCores (looks to be down now): loadOnStartup=false transient=true: This is really the use-case. There are a large number of cores in your system that are short-duration use. You want Solr to load them as necessary, but unload them when the cache gets full on an LRU basis. I'm creating 10 transient cores via the core admin like so: $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=CREATE&name=new_core2&instanceDir=collection1/&dataDir=new_core2&transient=true&loadOnStartup=false" and have transientCacheSize=2 in my solr.xml file, which I take to mean I should have at most 2 transient cores loaded at any time. The problem is that these cores are still loaded when I ask solr to list cores: $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status" From the explanation in the wiki, it looks like solr would manage loading and unloading transient cores for me without my having to worry about them, but this is not what's happening. The situation is different when I restart solr; it does the right thing by loading at most the number of cores set by transientCacheSize. When I add more cores, the old behavior happens again, where all created transient cores are loaded in solr. I'm using the development branch lucene_solr_4_3 to run my example. I can open a jira if need be.
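(For reference, the same setup can be declared statically; a minimal sketch of a 4.x-style solr.xml with a transient-core LRU cache of 2 — the core names and paths here are hypothetical, not from the thread:)

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <!-- transientCacheSize caps how many transient cores stay loaded (LRU) -->
  <cores adminPath="/admin/cores" transientCacheSize="2">
    <core name="new_core1" instanceDir="collection1" dataDir="new_core1"
          transient="true" loadOnStartup="false"/>
    <core name="new_core2" instanceDir="collection1" dataDir="new_core2"
          transient="true" loadOnStartup="false"/>
  </cores>
</solr>
```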
Re: Security for inter-solr-node requests
This feature is not yet part of Solr, but a feature under development in SOLR-4470. We encourage you to try it out and report back what worked best for you. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 2. mai 2013 kl. 13:58 skrev Furkan KAMACI furkankam...@gmail.com: Here is a part from wiki: 1) Just forward credentials from the super-request which caused the inter-solr-node sub-requests 2) Use internal credentials provided to the solr-node by the administrator at startup what do you use and is there any code example for it?
Pros and Cons of Using Deduplication of Solr at Huge Data Indexing
I use Solr 4.2.1 as SolrCloud. I crawl huge amounts of data with Nutch and index them with SolrCloud. I wonder about Solr's deduplication mechanism. What exactly does it do, and does it result in slow indexing, or is it beneficial for my situation?
RE: Pros and Cons of Using Deduplication of Solr at Huge Data Indexing
Distributed deduplication does not work right now: https://issues.apache.org/jira/browse/SOLR-3473 We've chosen not to use update processors for deduplication anymore and rely on several custom mapreduce jobs in Nutch and some custom collectors in Solr to do some on-demand online deduplication. If SOLR-3473 is fixed you can get very decent deduplication. -Original message- From: Furkan KAMACI furkankam...@gmail.com Sent: Thu 02-May-2013 22:30 To: solr-user@lucene.apache.org Subject: Pros and Cons of Using Deduplication of Solr at Huge Data Indexing I use Solr 4.2.1 as SolrCloud. I crawl huge amounts of data with Nutch and index them with SolrCloud. I wonder about Solr's deduplication mechanism. What exactly does it do, and does it result in slow indexing, or is it beneficial for my situation?
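(For context, single-node deduplication is typically wired up with a SignatureUpdateProcessorFactory chain in solrconfig.xml; a sketch along the lines of the Solr wiki example — the signature field name and source fields below are placeholders, and per SOLR-3473 above this does not dedupe correctly across SolrCloud shards:)

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <!-- overwriteDupes=true deletes older docs with the same signature -->
    <bool name="overwriteDupes">true</bool>
    <str name="fields">url,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```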
Rearranging Search Results of a Search?
I know that I can use boosting at query time for a field or a search term, in solrconfig.xml, and the query elevation component, so I can arrange the results of a search. However, after I get the top documents, how can I change the order of the results? Is Lucene's post-filter intended for that?
Re: EmbeddedSolrServer
On 5/2/2013 1:43 PM, Alexandre Rafalovitch wrote: Actually, I found it very hard to figure out the exact Jar requirements for SolrJ. I ended up basically pointing at expanded webapp's lib directory, which is a total overkill. Would be nice to have some specific guidance on this issue. I have a SolrJ app that uses HttpSolrServer. Here is the list of jars in my lib directory relevant to SolrJ. There are other jars related to the other functionality in my app that I didn't list here. I take a very minimalistic approach for what I add to my lib directory. I work out the minimum jars required to get it to compile, then I try the program out and determine which additional jars it needs one by one. commons-io-2.4.jar httpclient-4.2.4.jar httpcore-4.2.4.jar httpmime-4.2.4.jar jcl-over-slf4j-1.7.5.jar log4j-1.2.17.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar solr-solrj-4.2.1.jar You might notice that my component versions are newer than what is included in dist/solrj-lib. I have tested all of the functionality of my application, and I do not require the other jars found in dist/solrj-lib, including zookeeper. When I add functionality in the future, if I run into a class not found exception, I will add the appropriate jar. If I were using CloudSolrServer, zookeeper would be required. With EmbeddedSolrServer, more Lucene and Solr jars are required, because that starts the Solr server itself within your application. Thanks, Shawn
Customizing Solr For a Certain Language
Hi folks; I want to use Solr to index a language other than English. I will use Turkish documents to index with Solr, and I will implement some algorithms that are more suitable for Turkish than for English. Is there any wiki page that explains the steps for it? I mean, what are the main parts of a customized analyzer, i.e. a suitable stopwords.txt, a stemmer algorithm, a customized tokenizer for that language, customized token filters, etc.? Which steps should I follow?
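(As a starting point, a Turkish text field can combine the Turkish-aware analysis components that ship with Solr; a sketch — the field type name is made up, and the stopwords path assumes the lang/stopwords_tr.txt file bundled with the Solr example:)

```xml
<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Turkish-specific lowercasing handles dotted/dotless i correctly -->
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false"
            words="lang/stopwords_tr.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>
```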
Not able to see newly added copyField in the response (indexing is 80% complete)
Hello, I updated my schema to use a copyField and have triggered a reindex; 80% of the reindexing is complete. However, when I query the data, I don't see myNewCopyFieldName being returned with the documents. Is there something wrong with my schema, or do I need to wait for the indexing to complete to see the new copyField? This is my schema (actual names redacted):

<fields>
  <field name="key" type="string" indexed="true" stored="true"/>
  <field name="1" type="string" indexed="true" stored="true"/>
  <field name="2" type="string" indexed="true" stored="true"/>
  <field name="3" type="string" indexed="false" stored="true"/>
  <field name="4" type="string" indexed="true" stored="true"/>
  <field name="5" type="string" indexed="true" stored="true"/>
  <field name="6" type="custom_type" indexed="true" stored="true"/>
  <field name="7" type="text_general" indexed="true" stored="true"/>
  <field name="8" type="string" indexed="true" stored="true"/>
  <field name="9" type="text_general" indexed="true" stored="true"/>
  <field name="10" type="text_general" indexed="true" stored="true"/>
  <field name="11" type="string" indexed="true" stored="true"/>
  <field name="12" type="string" indexed="true" stored="true"/>
  <field name="13" type="string" indexed="true" stored="true"/>
  <field name="myNewCopyFieldName" type="text_general" indexed="true" stored="true" multiValued="true"/>
</fields>
<defaultSearchField>4</defaultSearchField>
<uniqueKey>key</uniqueKey>
<copyField source="1" dest="myNewCopyFieldName"/>
<copyField source="2" dest="myNewCopyFieldName"/>
<copyField source="3" dest="myNewCopyFieldName"/>
<copyField source="4" dest="myNewCopyFieldName"/>
<copyField source="6" dest="myNewCopyFieldName"/>

Where:

<fieldType name="custom_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

-- Thanks, -Utkarsh
How to decide proper cache size at load testing?
I read this on the wiki: Sometimes a smaller cache size will help avoid full garbage collections at the cost of more evictions. Load testing should be used to help determine proper cache sizes throughout the searching/indexing lifecycle. Could anybody give me an example scenario of how I can run such a test: what should I do, and how do I find a proper cache size from load testing?
Re: Solr 4.2 rollback not working
We are using Solr 4.2.1, which claims to have fixed this issue. I have some logging indicating that the rollback is not broadcast to other nodes in solr. So only one node in the cluster gets the rollback, but not the others. Thanks, Dipti phone: 408.678.1595 | cell: 408.806.1970 | email: dipti.srivast...@apollogrp.edu Solutions Engineering and Integration: https://wiki.apollogrp.edu/display/NGP/Solutions+Engineering+and+Integration Support process: https://wiki.apollogrp.edu/display/NGP/Classroom+Services+Support On 5/2/13 12:09 AM, mark12345 marks1900-pos...@yahoo.com.au wrote: What version of Solr are you using? 4.2.0 or 4.2.1? The following might be of interest to you: * https://issues.apache.org/jira/browse/SOLR-4605 * https://issues.apache.org/jira/browse/SOLR-4733 -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-rollback-not-working-tp4060393p4060401.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Not able to see newly added copyField in the response (indexing is 80% complete)
On 5/2/2013 3:13 PM, Utkarsh Sengar wrote: Hello, I updated my schema to use a copyField and have triggered a reindex, 80% of the reindexing is complete. Although when I query the data, I don't see myNewCopyFieldName being returned with the documents. Is there something wrong with my schema or I need to wait for the indexing to complete to see the new copyField? After making sure that you restarted Solr (or reloaded the core) after changing your schema, there are two things to mention: 1) Using stored=true with a copyField doesn't make any sense, because you already have the individual values stored with the source fields. I haven't done any testing, but Solr might ignore stored=true on copyField fields. 2) If I'm wrong about how Solr behaves with stored=true on a copyField, then a soft commit (4.x and later) or a hard commit with openSearcher=true would be required to see changes from indexing. Have you committed your updates yet? Thanks, Shawn
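(To illustrate Shawn's first point, a sketch of how the copyField target is more commonly declared, reusing the field name from the question — with stored="false", the field is searchable but the individual values are retrieved from the stored source fields instead:)

```xml
<!-- search-only catch-all target; values come back via the source fields -->
<field name="myNewCopyFieldName" type="text_general"
       indexed="true" stored="false" multiValued="true"/>
<copyField source="1" dest="myNewCopyFieldName"/>
<copyField source="2" dest="myNewCopyFieldName"/>
```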
Re: Customizing Solr For a Certain Language
Have you looked at the main example that comes with Solr? It contains a specific configuration for Turkish. Perhaps you could try that and narrow the question to more precise issues? I don't remember any Turkish-specific discussions, but perhaps something can be learned from searching for discussions on supporting Chinese and German languages. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 2, 2013 at 5:08 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi folks; I want to use Solr to index any other language except for English. I will use Turkish documents to index with Solr. I will implement some algorithms that is more suitable to Turkish rather than English. Is there any wiki page that explains to steps for it? I mean what are the main parts of a customized Analyzer. i.e. a suitable stopwords.txt, a stemmer algorithm, customized tokenizer for that language, customized tokenizer filter etc. etc. Which steps should I follow?
Re: What Happens to Consistency if I kill a Leader and Startup it again?
On 5/2/2013 2:19 PM, Furkan KAMACI wrote: I see that at my admin page: Replication (Slave) Version GenSize Master: 1367307652512 82 778.04 MB Slave: 1367307658862 82 781.05 MB and I started to figure about it so that's why I asked this question. As we've been trying to tell you, the sizes can (and will) be different between replicas on SolrCloud. Also, if you're not running a recent release candidate of 4.3, then the version numbers on the replication screen are misleading. See SOLR-4661 for more details. Your example of version numbers like 100, 90, and 95 wouldn't actually happen, because the version number is based on the current time in milliseconds since 1970-01-01 00:00:00 UTC. If you index after killing the leader, the new leader's version number will be higher than the offline replica. If you can find actual proof of a problem with index updates related to killing the leader, then we can take the bug report and work on fixing it. Here's how you would go about finding proof. It would be easiest to have one shard, but if you want to make sure it's OK with multiple shards, you would have to kill all the leaders. * Start with a functional collection with two replicas. * Index a document with a recognizable ID like A. * Make sure you can find document A. * Kill the leader replica, let's say it was replica1. * Make sure replica2 becomes leader. * Make sure you can find document A. * Index document B. * Start replica1, wait for it to turn green. * Make sure you can still find document B. * Kill the leader again, this time it's replica2. * Make sure you can still find document B. To my knowledge, nobody has reported a real problem with proof. I would imagine that more than one person has done testing like this to make sure that SolrCloud is reliable. Thanks, Shawn
Re: Not able to see newly added copyField in the response (indexing is 80% complete)
Thanks Shawn. Find my answers below. On Thu, May 2, 2013 at 2:34 PM, Shawn Heisey s...@elyograg.org wrote: On 5/2/2013 3:13 PM, Utkarsh Sengar wrote: Hello, I updated my schema to use a copyField and have triggered a reindex, 80% of the reindexing is complete. Although when I query the data, I don't see myNewCopyFieldName being returned with the documents. Is there something wrong with my schema or I need to wait for the indexing to complete to see the new copyField? After making sure that you restarted Solr (or reloaded the core) after changing your schema, there are two things to mention: Yes, I restarted solr and also did a reload. 1) Using stored=true with a copyField doesn't make any sense, because you already have the individual values stored with the source fields. I haven't done any testing, but Solr might ignore stored=true on copyField fields. Ah, I see, I didn't know about this. If it's not stored then it makes sense. Need to verify this though. 2) If I'm wrong about how Solr behaves with stored=true on a copyField, then a soft commit (4.x and later) or a hard commit with openSearcher=true would be required to see changes from indexing. Have you committed your updates yet? I am using Solr 4.x and soft commit is enabled, so I assume the commit happened. I see this in my solr admin:

- lastModified: less than a minute ago
- version: 453962
- numDocs: 26413743
- maxDoc: 28322675
- current:
- indexing: yes

So, lastModified = less than a minute ago means the change was committed, right? Thanks, Shawn -- Thanks, -Utkarsh
Re: string field does not yield exact match result using qf parameter
Hi, You can try to increase the pf boost for your string field, I don't think you'll have success in having it boosted with pf since it's a string? Check explain output with debugQuery=true and see whether you get a phrase boost. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 2. mai 2013 kl. 19:16 skrev kirpakaroji kirpakar...@yahoo.com: Hi Jan my question is when I tweak pf and qf parameter and the results change slightly and I do not think for exact match you need to implement the solution that you mentioned in your reply. you can always have string field and in your pf parameter you can boost that field to get the exact match results on top. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-tp4060096p4060492.html Sent from the Solr - User mailing list archive at Nabble.com.
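(For context, qf/pf are usually set as edismax defaults on the request handler; a hypothetical sketch with an exact-match string field boosted heavily in pf — the handler name, field names, and boost values are made up for illustration:)

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: fields matched term-by-term; pf: fields given a phrase boost -->
    <str name="qf">title_text^2.0 body_text</str>
    <str name="pf">title_exact^10.0</str>
  </lst>
</requestHandler>
```

debugQuery=true on a sample query will show whether the pf clause actually contributes a phrase-boost component to the score.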
Re: Delete from Solr Cloud 4.0 index..
On 5/2/2013 4:24 AM, Annette Newton wrote: Hi Shawn, Thanks so much for your response. We basically are very write intensive and write throughput is pretty essential to our product. Reads are sporadic and actually functioning really well. We write on average (at the moment) 8-12 batches of 35 documents per minute. But we really will be looking to write more in the future, so need to work out scaling of solr and how to cope with more volume. Schema (I have changed the names): http://pastebin.com/x1ry7ieW Config: http://pastebin.com/pqjTCa7L This is very clean. There's probably more you could remove/comment, but generally speaking I couldn't find any glaring issues. In particular, you have disabled autowarming, which is a major contributor to commit speed problems. The first thing I think I'd try is increasing zkClientTimeout to 30 or 60 seconds. You can use the startup commandline or solr.xml; I would probably use the latter. Here's a solr.xml fragment that uses a system property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
<cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}" hostContext="solr">

General thoughts; these changes might not help this particular issue: You've got autoCommit with openSearcher=true. This is a hard commit. If it were me, I would set that up with openSearcher=false and either do explicit soft commits from my application or set up autoSoftCommit with a shorter timeframe than autoCommit. This might simply be a scaling issue, where you'll need to spread the load wider than four shards. I know that there are financial considerations with that, and they might not be small, so let's leave that alone for now. The memory problems might be a symptom/cause of the scaling issue I just mentioned. You said you're using facets, which can be a real memory hog even with only a few of them. Have you tried facet.method=enum to see how it performs?
You'd need to switch to it exclusively, never go with the default of fc. You could put that in the defaults or invariants section of your request handler(s). Another way to reduce memory usage for facets is to use disk-based docValues on version 4.2 or later for the facet fields, but this will increase your index size, and your index is already quite large. Depending on your index contents, the increase may be small or large. Something to just mention: It looks like your solrconfig.xml has hard-coded absolute paths for dataDir and updateLog. This is fine if you'll only ever have one core/collection on each server, but it'll be a disaster if you have multiples. I could be wrong about how these get interpreted in SolrCloud -- they might actually be relative despite starting with a slash. Thanks, Shawn
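(A sketch of the openSearcher=false pattern Shawn describes, with autoSoftCommit handling visibility; the interval values below are placeholders to tune, not recommendations:)

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush to disk regularly, but don't open a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: controls when new documents become visible to searches -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```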
Re: Any estimation for solr 4.3?
On 5/2/2013 7:56 AM, Andy Lester wrote: On May 2, 2013, at 3:36 AM, Jack Krupansky j...@basetechnology.com wrote: RC4 of 4.3 is available now. The final release of 4.3 is likely to be within days. How can I see the Changelog of what will be in it? Here's the latest CHANGES.txt file straight from the source repository on the 4.3 branch: https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_3/solr/CHANGES.txt?view=markup I did not see a 4_3_0 tag in svn. Perhaps that will be created as the last step before release. Thanks, Shawn
Re: Does Near Real Time get not supported at SolrCloud?
NRT works with SolrCloud. Otis Solr ElasticSearch Support http://sematext.com/ On May 2, 2013 5:34 AM, Furkan KAMACI furkankam...@gmail.com wrote: Does Near Real Time get not supported at SolrCloud? I mean when a soft commit occurs at a leader I think that it doesn't distribute it to replicas(because it is not at storage, does indexes at RAM distributes to replicas too?) and a search query comes what happens?
Re: How to decide proper cache size at load testing?
You simply need to monitor and adjust. Both during testing and in production because search patterns change over time. Hook up alerting to it to get notified of high evictions and low cache hit rate so you don't have to actively look at stats all day. Here is the graph of Query Cache metrics for http://search-lucene.com/ for example: https://apps.sematext.com/spm-reports/s.do?k=eDcirzHG7i Otis Solr ElasticSearch Support http://sematext.com/ On May 2, 2013 5:14 PM, Furkan KAMACI furkankam...@gmail.com wrote: I read that at wiki: Sometimes a smaller cache size will help avoid full garbage collections at the cost of more evictions. Load testing should be used to help determine proper cache sizes throughout the searching/indexing lifecycle. Could anybody give me an example scenario of how can I make a test, what should I do and find a proper cache size at load testing?
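(The sizes being tuned here live in the query section of solrconfig.xml; a sketch with deliberately modest values and autowarming disabled — the numbers are starting points to adjust against the hit-rate and eviction stats, not recommendations:)

```xml
<!-- caches filter query results (fq); often the biggest memory consumer -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="0"/>
<!-- caches ordered doc-id lists for full queries -->
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="0"/>
<!-- caches stored fields for recently returned documents -->
<documentCache class="solr.LRUCache"
               size="512" initialSize="512" autowarmCount="0"/>
```

Under load testing, the cumulative hit ratios and eviction counts on the admin Plugins/Stats page show whether each size should grow or shrink.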
Re: Rearranging Search Results of a Search?
Hi, You should use search more often :) http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue Coincidentally, what you see there happens to be a good example of a Solr component that does something behind the scenes to deliver those search results even though my original query was bad. Kind of similar to what you are after. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI furkankam...@gmail.com wrote: I know that I can use boosting at query time for a field or a search term, in solrconfig.xml, and the query elevation component, so I can arrange the results of a search. However, after I get the top documents, how can I change the order of the results? Is Lucene's post-filter intended for that?
Re: SolrJ / Solr Two Phase Commit
By saying commits in Solr are global, do you mean per Solr deployment, per HttpSolrServer instance, per thread, or something else? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060584.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: EmbeddedSolrServer
I actually have a maven project with a declared solrj dependency (4.2.1); Do I need anything extra to get rid of the Zookeeper exception? I didn't see jars specific to zookeeper in the list below that I would need. Any more ideas please? Thank you, Peri Subrahmanya On May 2, 2013, at 4:48 PM, Shawn Heisey s...@elyograg.org wrote: On 5/2/2013 1:43 PM, Alexandre Rafalovitch wrote: Actually, I found it very hard to figure out the exact Jar requirements for SolrJ. I ended up basically pointing at expanded webapp's lib directory, which is a total overkill. Would be nice to have some specific guidance on this issue. I have a SolrJ app that uses HttpSolrServer. Here is the list of jars in my lib directory relevant to SolrJ. There are other jars related to the other functionality in my app that I didn't list here. I take a very minimalistic approach for what I add to my lib directory. I work out the minimum jars required to get it to compile, then I try the program out and determine which additional jars it needs one by one. commons-io-2.4.jar httpclient-4.2.4.jar httpcore-4.2.4.jar httpmime-4.2.4.jar jcl-over-slf4j-1.7.5.jar log4j-1.2.17.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar solr-solrj-4.2.1.jar You might notice that my component versions are newer than what is included in dist/solrj-lib. I have tested all of the functionality of my application, and I do not require the other jars found in dist/solrj-lib, including zookeeper. When I add functionality in the future, if I run into a class not found exception, I will add the appropriate jar. If I were using CloudSolrServer, zookeeper would be required. With EmbeddedSolrServer, more Lucene and Solr jars are required, because that starts the Solr server itself within your application. Thanks, Shawn
Re: SolrJ / Solr Two Phase Commit
Per core or collection, depending on whether we're talking about Cloud or not. Basically, commits in Solr are about controlling visibility more than anything, although now with Cloud, they have resource consumption and lifecycle ramifications as well. On May 2, 2013 10:01 PM, mark12345 marks1900-pos...@yahoo.com.au wrote: By saying commits in Solr are global, do you mean per Solr deployment, per HttpSolrServer instance, per thread, or something else? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060584.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ / Solr Two Phase Commit
Question: Just to clarify. Are you saying that if I have multiple threads using multiple instances of HttpSolrServer, each making calls to add SolrInputDocuments (for example, httpSolrServer.add(SolrInputDocument doc)), and one server calls httpSolrServer.commit(), all documents added are now committed? If that is the case it does help me understand the rollback api description in a new light. http://lucene.apache.org/solr/4_2_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback%28%29 Performs a rollback of all non-committed documents pending. Note that this is not a true rollback as in databases. Content you have previously added may have been committed due to autoCommit, buffer full, other client performing a commit etc. Michael Della Bitta-2 wrote Per core or collection, depending on whether we're talking about Cloud or not. Basically, commits in Solr are about controlling visibility more than anything, although now with Cloud, they have resource consumption and lifecycle ramifications as well. On May 2, 2013 10:01 PM, mark12345 wrote: By saying commits in Solr are global, do you mean per Solr deployment, per HttpSolrServer instance, per thread, or something else? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060589.html Sent from the Solr - User mailing list archive at Nabble.com.
The HttpSolrServer add(Collection<SolrInputDocument> docs) method is not atomic.
One thing I noticed is that while the HttpSolrServer add(SolrInputDocument doc) method is atomic (either a bean is added or an exception is thrown), the HttpSolrServer add(Collection<SolrInputDocument> docs) method is not atomic. Question: Is there a way to commit multiple documents/beans in a transaction/together, in a way that it succeeds completely or fails completely? Quick outline of what I did to highlight that a call to the HttpSolrServer add(Collection<SolrInputDocument> docs) method is not atomic: 1. Create 5 documents, comprising 4 valid documents (documents 1, 2, 4, 5) and 1 document with an issue, document 3. 2. Call HttpSolrServer add(Collection<SolrInputDocument> docs), which threw a SolrException. 3. Call HttpSolrServer commit(). 4. Discovered that 2 of the 5 documents (documents 1 and 2) were still committed. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060590.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.2 rollback not working
Sorry, I don't know enough about the system to help you directly. I was only going to suggest upgrading to 4.2.1 if you were using 4.2.0. It might be worthwhile to create a JIRA issue for what you are experiencing. https://issues.apache.org/jira/browse/SOLR Dipti Srivastava wrote: We are using Solr 4.2.1, which claims to have fixed this issue. I have some logging indicating that the rollback is not broadcast to other nodes in solr. So only one node in the cluster gets the rollback but not the others. Thanks, Dipti On 5/2/13 12:09 AM, mark12345 wrote: What version of Solr are you using? 4.2.0 or 4.2.1? The following might be of interest to you: * https://issues.apache.org/jira/browse/SOLR-4605 * https://issues.apache.org/jira/browse/SOLR-4733 -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-rollback-not-working-tp4060393p4060401.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-rollback-not-working-tp4060393p4060592.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: EmbeddedSolrServer
On 5/2/2013 8:07 PM, Peri Subrahmanya wrote: I actually have a maven project with a declared solrj dependency (4.2.1); Do I need anything extra to get rid of the Zookeeper exception? I didn't see jars specific to zookeeper in the list below that I would need. Any more ideas please? SolrJ has a dependency on zookeeper. Not all uses of solrj will require zookeeper, but maven cannot know how you intend to use it, so I don't think anything can be done about automatically pulling it in. Thanks, Shawn
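(If an application truly never uses CloudSolrServer, the transitive zookeeper dependency can be excluded in the pom; a sketch — only safe as long as no SolrCloud client classes are ever touched, otherwise the class-not-found errors come back at runtime:)

```xml
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>4.2.1</version>
  <exclusions>
    <!-- zookeeper is only needed by CloudSolrServer -->
    <exclusion>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```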
Re: socket write error
Hi, thanks. Solr 3.4. There are POST requests everywhere: between client and router, and between router and shards. Do you do faceting across all shards? Approximately how many documents do you have? On 2 May 2013 22:02, Patanachai Tangchaisin patanachai.tangchai...@wizecommerce.com wrote: Hi, First, which version of Solr are you using? I also have 60+ shards on Solr 4.2.1 and it doesn't seem to be a problem for me. - Make sure you use POST to send a query to Solr. - 'connection reset by peer' from the client can indicate that there is something wrong with the server, e.g. the server closes a connection, etc. -- Patanachai On 05/02/2013 05:05 AM, Dmitry Kan wrote: After some searching around, I see this: http://search-lucene.com/m/ErEZUl7P5f2/%2522socket+write+error%2522&subj=Long+list+of+shards+breaks+solrj+query Seems like this has happened in the past with a large number of shards. To make it clear: the distributed search works with 20 shards. On Thu, May 2, 2013 at 1:57 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi guys! We have a solr router and shards. I see this in the jetty log on the router: May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: I/O exception (java.net.SocketException) caught when processing request: Connection reset by peer: socket write error and then: May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: Retrying request followed by an exception about Internal Server Error any ideas why this happens? We run 80+ shards distributed across several servers. The router runs on its own node. Is there anything in particular I should be looking into wrt ubuntu socket settings? Is this a known issue for solr's distributed search from the past?
Thanks, Dmitry
Re: AutoSuggest+Grouping in one request
Hi, Hm, I *think* you can't do it in one go with Solr's Suggester, but I'm no expert there. I can only point you to something like our AutoComplete - http://sematext.com/products/autocomplete/index.html - which, as you can see in that screenshot, has the grouping you seem to be after. Maybe somebody else can point out whether the Solr Suggester can do the same? Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Apr 26, 2013 at 9:58 AM, Rounak Jain rouna...@gmail.com wrote: Hi everyone, Search dropdowns on popular sites like Amazon (example image: http://i.imgur.com/aQyM8WD.jpg) use autosuggested words along with grouping (Field Collapsing in Solr). While I can replicate the same functionality in Solr using two requests (first to obtain suggestions, second for the actual query using the most probable suggestion), I want to know if this can be done in one request itself. I understand that there are various ways to obtain suggestions (terms component, facets, Solr's inbuilt Suggester: http://wiki.apache.org/solr/Suggester), and I'm open to using any one of them, if it means I'll be able to get everything (groups + suggestions) in one request. Looking forward to some advice with regard to this. Thanks, Rounak
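For reference, the two-request flow Rounak describes can be sketched as two plain Solr HTTP calls: one to a Suggester handler, one grouped query built from the top suggestion. The handler path (`/suggest`), the example prefix, and the grouping field (`manu`) are assumptions for illustration, not names from the thread:

```python
from urllib.parse import urlencode

prefix = "ipo"  # what the user has typed so far

# Request 1: ask the Suggester for completions of the typed prefix.
suggest_url = "/solr/suggest?" + urlencode({"q": prefix})

# ...parse the suggest response; assume the top suggestion came back as:
top_suggestion = "ipod"

# Request 2: run the real query with field collapsing (grouping).
group_url = "/solr/select?" + urlencode({
    "q": top_suggestion,
    "group": "true",
    "group.field": "manu",  # hypothetical grouping field
    "group.limit": 3,
    "wt": "json",
})
```

Collapsing this into a single request is exactly what stock Solr (as of this thread) does not offer, hence Otis's pointer to an external autocomplete product.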
Re: SolrJ / Solr Two Phase Commit
Yes, that is correct. --wunder On May 2, 2013, at 7:46 PM, mark12345 wrote: Question: Just to clarify. Are you saying that if I have multiple threads using multiple instances of HttpSolrServer, each making calls to add SolrInputDocuments (for example, httpSolrServer.add(SolrInputDocument doc)), and one of them calls httpSolrServer.commit(), all documents added are now committed? If that is the case, it does help me understand the rollback API description in a new light. http://lucene.apache.org/solr/4_2_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback%28%29 Performs a rollback of all non-committed documents pending. Note that this is not a true rollback as in databases. Content you have previously added may have been committed due to autoCommit, buffer full, other client performing a commit etc. Michael Della Bitta wrote: Per core or collection, depending on whether we're talking about Cloud or not. Basically, commits in Solr are about controlling visibility more than anything, although now with Cloud, they have resource consumption and lifecycle ramifications as well. On May 2, 2013 10:01 PM, mark12345 wrote: By saying commits in Solr are global, do you mean per Solr deployment, per HttpSolrServer instance, per thread, or something else? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060589.html Sent from the Solr - User mailing list archive at Nabble.com. -- Walter Underwood wun...@wunderwood.org
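The rollback caveat quoted above - only still-pending documents are discarded, and autoCommit may already have committed some of them behind your back - can be illustrated with a small in-memory model. This is purely an illustration of the semantics, not Solr code, and the autoCommit threshold of 3 documents is made up:

```python
class IndexModel:
    """Toy model of Solr commit visibility: docs sit in an uncommitted
    buffer until a commit (manual or auto) makes them permanent."""

    def __init__(self, autocommit_max_docs=3):
        self.committed = []
        self.pending = []
        self.autocommit_max_docs = autocommit_max_docs

    def add(self, doc):
        self.pending.append(doc)
        # autoCommit: once the buffer fills, the server commits on its own.
        if len(self.pending) >= self.autocommit_max_docs:
            self.commit()

    def commit(self):
        # A commit is global: it covers every pending doc, no matter
        # which client or thread added it.
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        # Rollback discards only what is still pending; anything already
        # committed (including by autoCommit) stays -- not a DB rollback.
        self.pending = []


index = IndexModel(autocommit_max_docs=3)
for i in range(4):        # 4 adds: the 3rd triggers autoCommit
    index.add("doc%d" % i)
index.rollback()          # only doc3 is discarded
```

After the rollback, doc0-doc2 remain committed because the autoCommit fired first, which is exactly why the javadoc warns this is "not a true rollback as in databases".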
Re: More than one sort criteria
Hello, Peter, try sorting them using only one sort parameter, separating the fields by a comma. sort=zip+asc,street+asc This was it, thank you. Ciao Peter Schütt