Re: Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings
Chris Hostetter-3 wrote:
> If you are seeing an increase in "Overlapping onDeckSearchers" when using
> DocExpirationUpdateProcessorFactory, it's because you actually have docs
> expiring quite frequently relative to the autoDeletePeriodSeconds and
> the amount of time needed to warm each of the new searchers.
>
> If you don't want the searchers to be re-opened so frequently, just
> increase the autoDeletePeriodSeconds.

But if I increase the period, then each deletion pass will have even more docs that have expired, and shouldn't that make the amount of time needed to warm the new searcher even longer?
Re: Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings
Okay, I created a JIRA ticket (https://issues.apache.org/jira/servicedesk/agent/SOLR/issue/SOLR-9841) and will work on a patch.
Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings
I'm using Solr Cloud 6.1.0, and my client application is using SolrJ 6.1.0. Using this Solr config, I get none of the dreaded "PERFORMANCE WARNING: Overlapping onDeckSearchers=2" log messages: https://dl.dropboxusercontent.com/u/49733981/solrconfig-no_warnings.xml

However, I start getting them frequently after I add an expiration update processor to the update request processor chain, as seen in this config (at the bottom): https://dl.dropboxusercontent.com/u/49733981/solrconfig-warnings.xml

Do I have something configured wrong in the way I've tried to add the function of expiring documents? My client application sets the "expire_at" field with the date at which to remove the document being added, so I don't need anything on the Solr Cloud side to calculate the expiration date using a TTL. I've confirmed that the documents are getting removed as expected after the TTL duration.

Is it possible that the expiration processor is triggering additional commits? It seems like the warning is usually the result of commits happening too frequently. If the commit spacing is fine without the expiration processor, but not okay when I add it, it seems like maybe each update is now triggering a (soft?) commit. Although that'd actually be crazy, and I'm sure I'd see a lot more errors if that were the case... Is it triggering a commit every 30 seconds, because that's what I have autoDeletePeriodSeconds set to? Maybe I should try to offset that a bit from the 10 second auto soft commit I'm using? It seems like it'd be better (if that is the case) if the processor simply didn't have to do a commit when it expires documents, and instead let the auto commit settings handle that.

Do I still need the line: when I have the element?
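For reference, the client side of this setup is just setting a date value on each document before adding it; a minimal SolrJ 6.1-style sketch of what I mean is below (the zkHost, collection name, and id are placeholders; expire_at is the real field name from my schema):

import java.util.Date;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class ExpireAtExample {
  public static void main(String[] args) throws Exception {
    // Placeholder ZooKeeper address; SolrJ 6.1-era constructor
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
    client.setDefaultCollection("my_collection");   // placeholder collection name

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");                    // placeholder id
    // expire_at: absolute removal time computed by the client (here, one hour from now)
    doc.addField("expire_at", new Date(System.currentTimeMillis() + 3600L * 1000L));

    UpdateResponse rsp = client.add(doc);
    System.out.println("add status: " + rsp.getStatus());
    client.close();
  }
}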
Re: "on deck" searcher vs warming searcher
Hmmm, conflicting answers. Given the infamous "PERFORMANCE WARNING: Overlapping onDeckSearchers" log message, it seems like the "they're the same" answer is probably correct, because shouldn't there only be one active searcher at a time? Although it makes me curious, if there's a warning about having multiple (overlapping) warming searchers, why is there a setting (maxWarmingSearchers) that even lets you have more than one, or at least, why ever set it to anything other than 1?
"on deck" searcher vs warming searcher
Is there a difference between an "on deck" searcher and a warming searcher? From what I've read, they sound like the same thing.
Does _version_ field in schema need to be indexed and/or stored?
I know that in the sample config sets, the _version_ field is indexed and not stored. Is there any reason it needs to be indexed? I'm able to create collections and use them with it not indexed, but I wonder if it negatively impacts performance.
TTL: expungeDeletes=false when removing expired documents
I've got a DocExpirationUpdateProcessorFactory configured to periodically remove expired documents from the Solr index, which is working in that the documents no longer show up in queries once they've reached their expiration date. But the index size isn't reduced when they expire, and I'm wondering if it's because of the expungeDeletes setting in these log lines I'm seeing:

o.a.s.u.p.DocExpirationUpdateProcessorFactory Begining periodic deletion of expired docs
o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

How can I set expungeDeletes=true, and what are the drawbacks to doing so?
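If it helps, one workaround I'm considering is issuing my own explicit commit with expungeDeletes=true from SolrJ, separate from the processor's own commit, roughly like the sketch below (the zkHost and collection name are placeholders). My understanding is that expunging deletes forces segments containing deletions to be merged, which costs extra I/O, so it probably shouldn't run very often:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungeDeletesCommit {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181");   // placeholder zkHost

    UpdateRequest commit = new UpdateRequest();
    // waitFlush=true, waitSearcher=true
    commit.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    // ask this commit to merge away segments that are mostly deletes
    commit.setParam("expungeDeletes", "true");
    commit.process(client, "my_collection");                    // placeholder collection name

    client.close();
  }
}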
Re: For TTL, does expirationFieldName need to be indexed?
Thanks for the reply. Follow up: Do I need to have the field stored? While I don't need to ever look at the field's original contents, I'm guessing that the DocExpirationUpdateProcessorFactory does, so that would mean I need to have stored=true as well, correct?
For TTL, does expirationFieldName need to be indexed?
In my solrconfig.xml, I have autoDeletePeriodSeconds set to 30 and expirationFieldName set to expire_at, and in my schema the expire_at field is currently defined with indexed="true". If I change it to indexed="false", will it still work? If so, is there any benefit to having the field indexed if I'm not using it in any way except to allow the expiration processor to remove expired documents?
Re: leader/replica update error + client "expected mime type/stale state" error
Okay, I'll try to figure out how to make using the latest version feasible... my situation is that I need to be able to talk to both DSE and Solr from the same application, and 5.4.1 is the latest SolrJ version that works with DSE. But this issue appears to be stemming from the Solr side: the client logs its messages after the messages are logged in the Solr server logs. Assuming for the moment that I were using SolrJ 6.1.0, would there be any explanation for what I'm seeing?
leader/replica update error + client "expected mime type/stale state" error
I'm running Solr Cloud 6.1.0, with a Java client using SolrJ 5.4.1. Every once in awhile, during a query, I get a pair of messages logged in the client from CloudSolrClient -- an error about a request failing, then a warning saying that it's retrying after a stale state error. For this test, the collection (test_collection) has one shard, with RF=2. There are two machines, 10.112.7.2 (replica) and 10.112.7.4 (leader). The client is on 10.112.7.4. Note that the system time on 10.112.7.4 is about 1 minute, 5-6 seconds ahead of the other machine. --- Leader (10.112.7.4) Solr log: --- 19:27:16.583 ERROR [c:test_collection s:shard1 r:core_node2 x:test_collection_shard1_replica2] o.a.s.u.StreamingSolrClients error org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261) at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 19:27:16.587 WARN [c:test_collection s:shard1 r:core_node2 x:test_collection_shard1_replica2] o.a.s.u.p.DistributedUpdateProcessor Error sending update to http://10.112.7.2:8983/solr org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261) at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185) at
Create collection PeerSync "no frame of reference" warnings
Whenever I create a collection that's replicated, I get these warnings in the follower Solr log:

WARN [c:test_collection s:shard1 r:core_node1 x:test_collection_shard1_replica1] o.a.s.u.PeerSync no frame of reference to tell if we've missed updates

Are these harmless?
IOException errors in SolrJ client
I'm getting periodic errors when adding documents from a Java client app. It's always a sequence of an error message logged in CloudSolrClient, then an exception thrown from org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143): Error message: [ERROR][impl.CloudSolrClient][pool-6-thread-27][CloudSolrClient.java@904] - Request to collection test_collection failed due to (0) org.apache.http.NoHttpResponseException: 10.112.7.3:8983 failed to respond, retry? 0 Exception with stack: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://10.112.7.3:8983/solr/test_collection at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:589) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:372) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:325) at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1100) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:871) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:807) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150) ... Caused by: org.apache.http.NoHttpResponseException: 10.112.7.3:8983 failed to respond at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261) at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165) at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480) ... 22 more I'm guessing it's due to a timeout, but maybe not my client's timeout setting, but instead a timeout between two Solr Cloud servers, perhaps when the document is being sent from one to the other. This app is running on machine 10.112.7.4, and test_collection has a single shard, replicated on both 10.112.7.4 and 10.112.7.3, with .3 being the leader. Is this saying that .4 got the add request from my app, and tried to tell .3 to add the doc, but .3 didn't respond? If so, is it a timeout, and if so, can I increase the timeout value? 
Re: Solr Cloud: Higher search latency with two nodes vs one node
Thanks for the reply. The overhead you describe is what I suspected; I was just surprised that, while DSE is able to keep that overhead small enough that the overall result is faster with the extra hardware, Solr doesn't also benefit. I did try with RF=2 and shards=1, and yep, it's way fast. Really nice performance. I'm hoping I can figure out a configuration that will allow me to get this type of performance boost over DSE with a much higher load.

Follow-up question: I'm getting periodic errors when adding documents in the Java client app:

org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://:8983/solr/

I believe it's always preceded by this log message:

Request to collection failed due to (0) org.apache.http.NoHttpResponseException: :8983 failed to respond, retry? 0

I'm guessing it's due to a timeout, but maybe not my client's timeout setting; instead, perhaps a timeout between the two Solr Cloud servers, such as when a query is being sent from one to the other.

Probably unrelated to those errors because these occur at different times, but in the Solr logs I'm getting occasional warning+error messages like this, always on server 2:

WARN [c: s:shard1 r:core_node2 x:_shard1_replica1] o.a.s.c.RecoveryStrategy Stopping recovery for core=[_shard1_replica1] coreNodeName=[core_node2]

followed by, on server 1:

ERROR [c: s:shard1 r:core_node1 x:_shard1_replica2] o.a.s.u.StreamingSolrClients error org.apache.http.NoHttpResponseException: :8983 failed to respond
WARN [c: s:shard1 r:core_node1 x:_shard1_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Leader is publishing core=_shard1_replica1 coreNodeName=core_node2 state=down on behalf of un-reachable replica http://:8983/solr/_shard1_replica1/

Any idea what the cause is for these and how to avoid them?
Solr Cloud: Higher search latency with two nodes vs one node
1 0 false ${solr.ulog.dir:} ${solr.ulog.numVersionBuckets:65536} ${solr.autoSoftCommit.maxTime:3} 1024 true 20 30 false 2 edismax explicit 3 750 text The Solr queries have the following params: rows=30=search=true===:()= There are no explicit commits happening. Any help is greatly appreciated, and let me know if there's any additional info that would be helpful. -Brent
Detecting down node with SolrJ
Is there a way to tell whether or not a node at a specific address is up using a SolrJ API?
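Two approaches I'm considering, sketched below (the zkHost, node address, and core URL are placeholders): checking the live nodes in the cluster state via CloudSolrClient, or pinging the node directly with an HttpSolrClient, which assumes the core has a /admin/ping handler configured:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.cloud.ClusterState;

public class NodeUpCheck {
  public static void main(String[] args) throws Exception {
    // Option 1: ask the ZooKeeper-backed cluster state for live nodes
    CloudSolrClient cloud = new CloudSolrClient("zk1:2181");   // placeholder zkHost
    cloud.connect();
    ClusterState state = cloud.getZkStateReader().getClusterState();
    // live node names look like "10.112.7.2:8983_solr"
    boolean live = state.getLiveNodes().contains("10.112.7.2:8983_solr");
    System.out.println("live-node check: " + live);
    cloud.close();

    // Option 2: ping the node's core directly (requires a ping handler in solrconfig.xml)
    HttpSolrClient direct = new HttpSolrClient("http://10.112.7.2:8983/solr/test_collection");
    try {
      int status = direct.ping().getStatus();   // throws if the node is unreachable
      System.out.println("ping status: " + status);
    } catch (Exception e) {
      System.out.println("node appears down: " + e.getMessage());
    }
    direct.close();
  }
}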
Re: High load, frequent updates, low latency requirement use case
Emir Arnautovic wrote:
> There should be no problems with ingestion on 24 machines. Assuming 1
> replication, that is roughly 40 doc/sec/server. Make sure you bulk docs
> when ingesting.

What is bulking docs, and how do I do it? I'm guessing this is some sort of batch loading of documents? Thanks for the reply. -Brent
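If it is, I assume the SolrJ side would look roughly like the sketch below (the zkHost, collection, batch size, and field names are placeholders): collect documents into a list and send a few hundred at a time with a single add call, letting autoCommit/autoSoftCommit handle the commits.

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchedAdds {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181");   // placeholder zkHost
    client.setDefaultCollection("my_collection");               // placeholder collection

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 5000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);                 // placeholder fields
      doc.addField("body_text", "document body " + i);
      batch.add(doc);
      if (batch.size() == 500) {                      // send ~500 docs per request
        client.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      client.add(batch);
    }
    // no explicit commit here; rely on the autoCommit settings in solrconfig.xml
    client.close();
  }
}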
Re: Mailing list subscriptions
ZOMG thanks! This makes the mailing list almost tolerable, in an only-15-years-behind kind of way instead of a 25-years-behind kind of way.
Adding request parameters when using SolrCloudClient.add()
Using SolrJ, I'm trying to figure out how to include request parameters when adding a document with SolrCloudClient.add(). Here is what I was doing when using HttpSolrClient instead of SolrCloudClient:

HttpSolrClient client = new HttpSolrClient.Builder("http://hostname.com:8983/solr/corename")
    .withHttpClient(HttpClientBuilder.create().build())
    .build();
client.getInvariantParams().set("param1_name", "param1_value");
client.getInvariantParams().set("param2_name", "param2_value");

SolrInputDocument sDoc = new SolrInputDocument();
// Populate sDoc fields.
UpdateResponse response = client.add(sDoc);

SolrCloudClient doesn't appear to have any methods for adding parameters, so I'm not sure how I can do it. When doing reads, I add parameters to the SolrQuery object, but when adding a SolrInputDocument object, I don't have that option. The only thing I can think of is that maybe adding them as fields in the SolrInputDocument object would do the trick, but would that actually work the same?
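One thing I'm considering trying, sketched below, is building an UpdateRequest, setting the extra parameters on the request itself, adding the document to it, and processing it against the cloud client (the zkHost and collection name are placeholders; the parameter names are the same placeholders as above):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class AddWithParams {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181");   // placeholder zkHost

    SolrInputDocument sDoc = new SolrInputDocument();
    sDoc.addField("id", "doc-1");
    // ... populate the rest of the fields ...

    UpdateRequest req = new UpdateRequest();
    req.setParam("param1_name", "param1_value");   // placeholder request parameters
    req.setParam("param2_name", "param2_value");
    req.add(sDoc);

    UpdateResponse response = req.process(client, "my_collection");  // placeholder collection
    System.out.println("status: " + response.getStatus());
    client.close();
  }
}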
Mailing list subscriptions
Is there a way to subscribe to just responses to a question I ask on the mailing list, without getting emails for all activity on the mailing list? You'd think it'd be designed in a way that when someone submits a question, they automatically get emails for any responses.
SolrJ Solr Cloud timeout
Is there any way to set a timeout with a CloudSolrClient?
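The knobs I've found so far, sketched below and possibly incomplete (the zkHost is a placeholder): ZooKeeper connect/session timeouts on the CloudSolrClient itself, and connection/socket timeouts on the LBHttpSolrClient it delegates to.

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class CloudClientTimeouts {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181");   // placeholder zkHost

    // ZooKeeper connect/session timeouts, in milliseconds (set before first use)
    client.setZkConnectTimeout(15000);
    client.setZkClientTimeout(15000);

    // HTTP timeouts on the internal load-balancing client that CloudSolrClient delegates to
    client.getLbClient().setConnectionTimeout(5000);   // connect timeout, ms
    client.getLbClient().setSoTimeout(30000);          // socket read timeout, ms

    // ... use the client ...
    client.close();
  }
}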
High load, frequent updates, low latency requirement use case
I'm trying to set up a Solr Cloud cluster to support a system with the following characteristics: It will be writing documents at a rate of approximately 500 docs/second, and running search queries at about the same rate. The documents are fairly small, with about 10 fields, most of which range in size from a simple int to a string that holds a UUID. There's a date field, and then three text fields that typically hold in the range of 350 to 500 chars. Documents should be available for searching within 30 seconds of being added. We need an average search latency of 50 ms or faster.

We've been using DataStax Enterprise with decent results, but we're trying to determine if we can get more out of the latest version of Solr Cloud, as we originally chose DSE ~4 years ago *I believe* because its Cassandra-backed Solr provided redundancy/high availability features that weren't available with straight Solr at the time (not even sure if Solr Cloud was available then).

We have 24 fairly beefy servers (96 CPU cores, 256 GB RAM, SSDs) for the task, and I'm trying to figure out the best way to distribute the documents into collections, cores, and shards. If I can categorize a document into one of 8 "types", should I create 8 collections? Is that going to provide better performance than putting them all into one collection and then using a filter query with the type field when doing a search (sketched below)? What are the options/things to consider when deciding on the number of shards for each collection? As far as I know, I don't choose the number of Solr cores; that is just determined based on the replication factor (and shard count?).

Some of the settings I'm using in my solrconfig that seem important: ${solr.lock.type:native} ${solr.autoCommit.maxTime:3} false ${solr.autoSoftCommit.maxTime:1000} true 8

I've got the updateLog/transaction log enabled, as I think I read it's required for Solr Cloud. Are there any settings I should look at that affect performance significantly, especially outside of the solrconfig.xml for each collection (like jetty configs, logging properties, etc)? How much impact do the directives in the solrconfig have on performance? Do they only take effect if I have something configured that requires them, and therefore if I'm missing one that I need, I'd get an error if it's not defined?

Any help will be greatly appreciated. Thanks! -Brent
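To make the single-collection option concrete, the filter-query version of a search would be roughly this sketch (the zkHost, collection name, query, and doc_type field/value are placeholders, not our real schema). The attraction of this approach is that filter queries are cached independently of the main query, so a repeated type filter should be cheap:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TypeFilterQuery {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181");   // placeholder zkHost
    client.setDefaultCollection("all_docs");                    // placeholder single collection

    SolrQuery q = new SolrQuery("body_text:widget");            // placeholder query
    q.addFilterQuery("doc_type:type3");   // restrict to one of the 8 document types
    q.setRows(30);

    QueryResponse rsp = client.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
    client.close();
  }
}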
Re: SOLRJ replace document
So I found out the issue here... It was related to what you guys said regarding the Map object in my document. The problem is that I had data being serialized from DB - .NET - JSON and some of the fields in .NET was == System.DBNull.Value instead of null. This caused the JSON serializer to write out an object (ie. Map) so when these fields got deserialized into the SolrInputDocument it had the Map objects as you indicated. Thanks for the help! Much appreciated! On Sat, Oct 19, 2013 at 12:58 AM, Jack Krupansky j...@basetechnology.comwrote: By all means please do file a support request with DataStax, either as an official support ticket or as a question on StackOverflow. But, I do think the previous answer of avoiding the use of a Map object in your document is likely to be the solution. -- Jack Krupansky -Original Message- From: Brent Ryan Sent: Friday, October 18, 2013 10:21 PM To: solr-user@lucene.apache.org Subject: Re: SOLRJ replace document So I think the issue might be related to the tech stack we're using which is SOLR within DataStax enterprise which doesn't support atomic updates. But I think it must have some sort of bug around this because it doesn't appear to work correctly for this use case when using solrj ... Anyways, I've contacted support so lets see what they say. On Fri, Oct 18, 2013 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote: On 10/18/2013 3:36 PM, Brent Ryan wrote: My schema is pretty simple and has a string field called solr_id as my unique key. Once I get back to my computer I'll send some more details. If you are trying to use a Map object as the value of a field, that is probably why it is interpreting your add request as an atomic update. If this is the case, and you're doing it because you have a multivalued field, you can use a List object rather than a Map. If this doesn't sound like what's going on, can you share your code, or a simplification of the SolrJ parts of it? Thanks, Shawn
SOLRJ replace document
How do I replace a document in solr using solrj library? I keep getting this error back: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Atomic document updates are not supported unless updateLog/ is configured I don't want to do partial updates, I just want to replace it... Thanks, Brent
Re: SOLRJ replace document
I wish that was the case but calling addDoc() is what's triggering that exception. On Friday, October 18, 2013, Jack Krupansky wrote: To replace a Solr document, simply add it again using the same technique used to insert the original document. The set option for atomic update is only used when you wish to selectively update only some of the fields for a document, and that does require that the update log be enabled using updateLog. -- Jack Krupansky -Original Message- From: Brent Ryan Sent: Friday, October 18, 2013 4:59 PM To: solr-user@lucene.apache.org Subject: SOLRJ replace document How do I replace a document in solr using solrj library? I keep getting this error back: org.apache.solr.client.solrj.**impl.HttpSolrServer$**RemoteSolrException: Atomic document updates are not supported unless updateLog/ is configured I don't want to do partial updates, I just want to replace it... Thanks, Brent
Re: SOLRJ replace document
My schema is pretty simple and has a string field called solr_id as my unique key. Once I get back to my computer I'll send some more details. Brent

On Friday, October 18, 2013, Shawn Heisey wrote:

On 10/18/2013 2:59 PM, Brent Ryan wrote:
> How do I replace a document in solr using solrj library? I keep getting this error back:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Atomic document updates are not supported unless updateLog/ is configured
> I don't want to do partial updates, I just want to replace it...

Replacing a document is done by simply adding the document, in the same way as if you were adding a new one. If you have properly configured Solr, the old one will be deleted before the new one is inserted. Properly configuring Solr means that you have a uniqueKey field in your schema, and that it is a simple type like string, int, long, etc, and is not multivalued. A TextField type that is tokenized cannot be used as the uniqueKey field. Thanks, Shawn
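To make sure I follow Shawn's point, a plain replace would just be something like the sketch below: add a document that reuses the existing solr_id value and commit, with no atomic-update syntax (no Map values) involved. The core URL and the fields other than solr_id are placeholders, and it uses the SolrJ 4.x-era HttpSolrServer to match this thread:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ReplaceDocument {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycore"); // placeholder URL

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("solr_id", "abc-123");          // same uniqueKey value as the existing document
    doc.addField("title", "replacement title");  // plain values only, no Map objects
    server.add(doc);
    server.commit();
    server.shutdown();
  }
}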
Re: SOLRJ replace document
So I think the issue might be related to the tech stack we're using which is SOLR within DataStax enterprise which doesn't support atomic updates. But I think it must have some sort of bug around this because it doesn't appear to work correctly for this use case when using solrj ... Anyways, I've contacted support so lets see what they say. On Fri, Oct 18, 2013 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote: On 10/18/2013 3:36 PM, Brent Ryan wrote: My schema is pretty simple and has a string field called solr_id as my unique key. Once I get back to my computer I'll send some more details. If you are trying to use a Map object as the value of a field, that is probably why it is interpreting your add request as an atomic update. If this is the case, and you're doing it because you have a multivalued field, you can use a List object rather than a Map. If this doesn't sound like what's going on, can you share your code, or a simplification of the SolrJ parts of it? Thanks, Shawn
Re: SOLR grouped query sorting on numFound
ya, that's the problem... you can't sort by numFound and it's not feasible to do the sort on the client because the grouped result set is too large. Brent On Wed, Sep 25, 2013 at 6:09 AM, Erick Erickson erickerick...@gmail.comwrote: Hmmm, just specifying sort= is _almost_ what you want, except it sorts by the value of fields in the doc not numFound. this shouldn't be hard to do on the client though, but you'd have to return all the groups... FWIW, Erick On Tue, Sep 24, 2013 at 1:11 PM, Brent Ryan brent.r...@gmail.com wrote: We ran into 1 snag during development with SOLR and I thought I'd run it by anyone to see if they had any slick ways to solve this issue. Basically, we're performing a SOLR query with grouping and want to be able to sort by the number of documents found within each group. Our query response from SOLR looks something like this: { responseHeader:{ status:0, QTime:17, params:{ indent:true, q:*:*, group.limit:0, group.field:rfp_stub, group:true, wt:json, rows:1000}}, grouped:{ rfp_stub:{ matches:18470, groups:[{ groupValue:java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e, doclist:{*numFound*:3,start:0,docs:[] }}, { groupValue:java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce, doclist:{*numFound*:5,start:0,docs:[] }}, { groupValue:java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131, doclist:{*numFound*:6,start:0,docs:[] }}, … The *numFound* shows the number of documents within that group. Is there anyway to perform a sort on *numFound* in SOLR ? I don't believe this is supported, but wondered if anyone their has come across this and if there was any suggested workarounds given that the dataset is really too large to hold in memory on our app servers?
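For the record, the client-side sort Erick suggests would look roughly like the sketch below (the core URL is a placeholder, the group field is rfp_stub from this thread). It works for small group counts, but it requires pulling back all the groups, which is exactly the part that doesn't scale for us:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SortGroupsByNumFound {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycore"); // placeholder

    SolrQuery q = new SolrQuery("*:*");
    q.set("group", "true");
    q.set("group.field", "rfp_stub");
    q.set("group.limit", "0");     // doclists stay empty but numFound is still populated
    q.setRows(1000);

    QueryResponse rsp = server.query(q);
    List<Group> groups = new ArrayList<Group>();
    for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
      groups.addAll(cmd.getValues());
    }
    // sort descending by each group's numFound
    Collections.sort(groups, new Comparator<Group>() {
      public int compare(Group a, Group b) {
        return Long.compare(b.getResult().getNumFound(), a.getResult().getNumFound());
      }
    });
    for (Group g : groups) {
      System.out.println(g.getGroupValue() + " -> " + g.getResult().getNumFound());
    }
    server.shutdown();
  }
}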
SOLR grouped query sorting on numFound
We ran into 1 snag during development with SOLR and I thought I'd run it by anyone to see if they had any slick ways to solve this issue. Basically, we're performing a SOLR query with grouping and want to be able to sort by the number of documents found within each group. Our query response from SOLR looks something like this:

{
  responseHeader:{
    status:0,
    QTime:17,
    params:{
      indent:true,
      q:*:*,
      group.limit:0,
      group.field:rfp_stub,
      group:true,
      wt:json,
      rows:1000}},
  grouped:{
    rfp_stub:{
      matches:18470,
      groups:[{
          groupValue:java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e,
          doclist:{*numFound*:3,start:0,docs:[] }},
        {
          groupValue:java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce,
          doclist:{*numFound*:5,start:0,docs:[] }},
        {
          groupValue:java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131,
          doclist:{*numFound*:6,start:0,docs:[] }},
        …

The *numFound* shows the number of documents within that group. Is there any way to perform a sort on *numFound* in SOLR? I don't believe this is supported, but wondered if anyone there has come across this and whether there were any suggested workarounds, given that the dataset is really too large to hold in memory on our app servers?
apache bench solr keep alive not working?
Does anyone know why solr is not respecting keep-alive requests when using apache bench?

ab -v 4 -H "Connection: Keep-Alive" -H "Keep-Alive: 3000" -k -c 10 -n 100 "http://host1:8983/solr/test.solr/select?q=*%3A*&wt=xml&indent=true"

The response contains "Connection: Close" in the HTTP headers (visible in the debug output), and ab reports Keep-Alive requests: 0. Any ideas? I'm seeing the same behavior when using http client. Thanks, Brent
Re: apache bench solr keep alive not working?
thanks guys. I saw this other post with curl and verified it working. I've also used apache bench for a bunch of stuff and keep-alive works fine with things like Java netty.io servers ... strange that tomcat isn't respecting the http protocol or headers There must be a bug in this version of tomcat being used. On Tue, Sep 10, 2013 at 6:23 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : Does anyone know why solr is not respecting keep-alive requests when using : apache bench? I've seen this before from people trying to test with ab, but never fully understood it. There is some combination of using ab (which uses HTTP/1.0 and the non-RFC compliant HTTP/1.0 version of optional Keep-Alive) and jetty (which also attempts to support the non-RFC compliant HTTP/1.0 version of optional Keep-Alive) and chunked responses that come from jetty when you request dynamic resources (like solr query URLs) as opposed to static resources with a known file size. I suspect that jetty's best attempt at doing the hooky HTTP/1.0 Keep-Alive thing doesn't work with chunked responses -- or maybe ab doesn't actually keep the connection alive unless it gets a content-length. either way, see this previous thread for how you can demonstrate that keep-alive works properly using curl... https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201209.mbox/%3c5048e856.9080...@ea.com%3E -Hoss
CRLF Invalid Exception ?
Has anyone ever hit this when adding documents to SOLR? What does it mean? ERROR [http-8983-6] 2013-09-06 10:09:32,700 SolrException.java (line 108) org.apache.solr.common.SolrException: Invalid CRLF at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:175) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:663) at com.datastax.bdp.cassandra.index.solr.CassandraDispatchFilter.execute(CassandraDispatchFilter.java:176) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at com.datastax.bdp.cassandra.index.solr.CassandraDispatchFilter.doFilter(CassandraDispatchFilter.java:139) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.datastax.bdp.cassandra.audit.SolrHttpAuditLogFilter.doFilter(SolrHttpAuditLogFilter.java:194) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.datastax.bdp.cassandra.index.solr.auth.CassandraAuthorizationFilter.doFilter(CassandraAuthorizationFilter.java:95) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.datastax.bdp.cassandra.index.solr.auth.DseAuthenticationFilter.doFilter(DseAuthenticationFilter.java:102) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:722) Caused by: com.ctc.wstx.exc.WstxIOException: Invalid CRLF at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086) at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:387) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) ... 
30 more Caused by: java.io.IOException: Invalid CRLF at org.apache.coyote.http11.filters.ChunkedInputFilter.parseCRLF(ChunkedInputFilter.java:352) at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:151) at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:710) at org.apache.coyote.Request.doRead(Request.java:428) at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:304) at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:403) at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:327) at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:162) at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365) at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110) at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101) at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84) at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57) at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1046) at com.ctc.wstx.sr.StreamScanner.parseLocalName2(StreamScanner.java:1796) at com.ctc.wstx.sr.StreamScanner.parseLocalName(StreamScanner.java:1756) at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2914) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) ... 33 more
Re: CRLF Invalid Exception ?
Thanks. I realized there's an error in the ChunkedInputFilter... I'm not sure if this means there's a bug in the client library I'm using (solrj 4.3) or is a bug in the server SOLR 4.3? Or is there something in my data that's causing the issue? On Fri, Sep 6, 2013 at 1:02 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : Has anyone ever hit this when adding documents to SOLR? What does it mean? Always check for the root cause... : Caused by: java.io.IOException: Invalid CRLF : : at : org.apache.coyote.http11.filters.ChunkedInputFilter.parseCRLF(ChunkedInputFilter.java:352) ...so while Solr is trying to read XML off the InputStream from the client, an error is encountered by the ChunkedInputFilter. I suspect the client library you are using for the HTTP connection is claiming it's using chunking but isn't, or is doing something wrong with the chunking, or there is a bug in the ChunkedInputFilter. -Hoss
Re: CRLF Invalid Exception ?
For what it's worth... I just updated to solrj 4.4 (even though my server is solr 4.3) and it seems to have fixed the issue. Thanks for the help! On Fri, Sep 6, 2013 at 1:41 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : I'm not sure if this means there's a bug in the client library I'm using : (solrj 4.3) or is a bug in the server SOLR 4.3? Or is there something in : my data that's causing the issue? It's unlikly that an error in the data you pass to SolrJ methods would be causing this problem -- i'm pretty sure it's not even a problem with the raw xml data being streamed, it appears to be a problem with how that data is getting shunked across the wire. My best guess is that the most likely causes are either... * a bug in the HttpClient versio you are using on the client side * a bug in the ChunkedInputFilter you are using on the server side * a misconfiguration on the HttpClient object you are using with SolrJ (ie: claiming it's sending chunked when it's not?) -Hoss
JSON update request handler commitWithin
I'm prototyping a search product for us and I was trying to use the commitWithin parameter for posting updated JSON documents like so:

curl -v 'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' --data-binary @rfp.json -H 'Content-type:application/json'

However, the commit never seems to happen; there are still 2 docsPending (even 1 hour later). Is there a trick to getting this to work with submitting to the json update request handler?
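For comparison, I believe the SolrJ equivalent would be roughly the sketch below (the field values are placeholders; the core name is the one from the curl command). Note that commitWithin is in milliseconds, so a value of 1 asks for an almost immediate commit, while something like 10000 would mean "within 10 seconds":

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/proposal.solr");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "rfp-1");           // placeholder field values
    doc.addField("title", "sample rfp");

    // commitWithin in milliseconds: ask Solr to commit within 10 seconds
    server.add(doc, 10000);
    server.shutdown();
  }
}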
Re: JSON update request handler commitWithin
Ya, looks like this is a bug in Datastax Enterprise 3.1.2. I'm using their enterprise cluster search product which is built on SOLR 4. :( On 9/5/13 11:24 AM, Jack Krupansky j...@basetechnology.com wrote: I just tried commitWithin with the standard Solr example in Solr 4.4 and it works fine. Can you reproduce your problem using the standard Solr example in Solr 4.4? -- Jack Krupansky From: Ryan, Brent Sent: Thursday, September 05, 2013 10:39 AM To: solr-user@lucene.apache.org Subject: JSON update request handler commitWithin I'm prototyping a search product for us and I was trying to use the commitWithin parameter for posting updated JSON documents like so: curl -v 'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' --data-binary @rfp.json -H 'Content-type:application/json' However, the commit never seems to happen as you can see below there are still 2 docsPending (even 1 hour later). Is there a trick to getting this to work with submitting to the json update request handler?
Large import making solr unresponsive
This is an issue we've only been running into lately so I'm not sure what to make of it. We have 2 cores on a solr machine right now; one of them is about 10k documents, the other is about 1.5mil. None of the documents are very large, only about 30 short attributes. We also have about 10 requests/sec hitting the smaller core and less on the larger one.

Whenever we try to do a full import on the smaller one everything is fine: the response times stay the same during the whole 30 seconds it takes to run the indexer, and the cpu also stays fairly low. When we run a full import on the larger one, the response times on all cores tank from about 10ms to over 8 seconds.

We have a 4 core machine (VM) and I've noticed 1 core stays pegged the entire time, which is understandable since the DIH, as I understand it, is single threaded. Also, from what I can tell there is no disk, network, or memory pressure (8gb) either, and the other procs do virtually nothing. Also the responses from solr still come back with a 10ms qtime.

My best guess at this point is tomcat is having issues when the single proc gets pegged, but I'm at a loss on how to further diagnose this to a tomcat issue or something weird that solr is doing. Has anyone run into this before or have ideas about what might be happening?
RE: problem adding new fields in DIH
Thanks for the explanation and bug report Robert! -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 09, 2012 3:18 PM To: solr-user@lucene.apache.org Subject: Re: problem adding new fields in DIH Thanks again for reporting this Brent. I opened a JIRA issue: https://issues.apache.org/jira/browse/SOLR-3610 On Mon, Jul 9, 2012 at 3:36 PM, Brent Mills bmi...@uship.com wrote: We're having an issue when we add or change a field in the db-data-config.xml and schema.xml files in solr. Basically whenever I add something new to index I add it to the database, then the data config, then add the field to the schema to index, reload the core, and do a full import. This has worked fine until we upgraded to an iteration of 4.0 (we are currently on 4.0 alpha). Now sometimes when we go through this process solr throws errors about the field not being found. The only way to fix this is to restart tomcat and everything immediately starts working fine again. The interesting thing is that this is only a problem if the database is returning a value for that field and only in the documents that have a value. The field shows up in the schema browser in solr, it just has no data in it. If I completely remove it from the database but leave it in the schema and dataconfig files there is no issue. Also of note, this is happening on 2 different machines. Here's the trace SEVERE: Exception while solr commit. java.lang.IllegalArgumentException: no such field test at org.apache.solr.core.DefaultCodecFactory$1.getPostingsFormatForField(DefaultCodecFactory.java:49) at org.apache.lucene.codecs.lucene40.Lucene40Codec$1.getPostingsFormatForField(Lucene40Codec.java:52) at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:94) at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85) at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117) at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53) at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554) at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2547) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2683) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2663) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154) at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107) at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:304) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:256) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333) at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:399) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:380) -- lucidimagination.com
problem adding new fields in DIH
We're having an issue when we add or change a field in the db-data-config.xml and schema.xml files in solr. Basically whenever I add something new to index I add it to the database, then the data config, then add the field to the schema to index, reload the core, and do a full import. This has worked fine until we upgraded to an iteration of 4.0 (we are currently on 4.0 alpha). Now sometimes when we go through this process solr throws errors about the field not being found. The only way to fix this is to restart tomcat and everything immediately starts working fine again. The interesting thing is that this is only a problem if the database is returning a value for that field and only in the documents that have a value. The field shows up in the schema browser in solr, it just has no data in it. If I completely remove it from the database but leave it in the schema and dataconfig files there is no issue. Also of note, this is happening on 2 different machines. Here's the trace SEVERE: Exception while solr commit. java.lang.IllegalArgumentException: no such field test at org.apache.solr.core.DefaultCodecFactory$1.getPostingsFormatForField(DefaultCodecFactory.java:49) at org.apache.lucene.codecs.lucene40.Lucene40Codec$1.getPostingsFormatForField(Lucene40Codec.java:52) at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:94) at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85) at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117) at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53) at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554) at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2547) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2683) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2663) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154) at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107) at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:304) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:256) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:399) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:380)
RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?
Hi James, I just pulled down the newest nightly build of 4.0 and it solves an issue I had been having with solr ignoring the caching of the child entities. It was basically opening a new connection for each iteration even though everything was specified correctly. This was present in my previous build of 4.0 so it looks like you fixed it with one of those patches. Thanks for all your work on the DIH, the caching improvements are a big help with some of the things we will be rolling out in production soon. -Brent -Original Message- From: Dyer, James [mailto:james.d...@ingrambook.com] Sent: Monday, May 07, 2012 1:47 PM To: solr-user@lucene.apache.org Cc: Brent Mills; dye.kel...@gmail.com; keithn...@dswinc.com Subject: RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6? Dear Kellen, Brent Keith, There now are fixes available for 2 cache-related bugs that unfortunately made their way into the 3.6.0 release. These were addressed on these 2 JIRA issues, which have been committed to the 3.6 branch (as of today): - https://issues.apache.org/jira/browse/SOLR-3430 - https://issues.apache.org/jira/browse/SOLR-3360 These problem were also affecting Trunk/4.x, with both fixes being committed to Trunk under SOLR-3430. Should Solr 3.6.1 be released, these fixes will become generally available at that time. They also will be part of the 4.0 release, which the Development Community hopes will be later this year. In the mean time, I am hoping each of you can test these fixes with your installation. The best way to do this is to get a fresh SVN checkout of the 3.6.1 branch (http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/), switch to the solr directory, then run ant dist. I believe you need Ant 1.8 to build. If you are unable to build yourself, I put an *unofficial* shapshot of the DIH jar here: http://people.apache.org/~jdyer/unofficial/apache-solr-dataimporthandler-3.6.1-SNAPSHOT-r1335176.jar Please let me know if this solves your problems with DIH Caching, giving you the functionality you had with 3.5 and prior. Your feedback is greatly appreciatd. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: not interesting [mailto:dye.kel...@gmail.com] Sent: Monday, May 07, 2012 9:43 AM To: solr-user@lucene.apache.org Subject: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6? I just upgraded from Solr 3.4 to Solr 3.6; I'm using the same data-import.xml for both versions. The import functioned properly with 3.4. I'm using a nested entity to fetch authors associated with each document, and I'm using CachedSqlEntityProcessor to avoid hitting the DB an unreasonable number of times. However, when indexing, Solr indexes very slowly and appears to be fetching all authors in the DB for each document. The index should be ~500 megs; I aborted the indexing when it reached ~6gigs. If I comment out the nested author entity below, Solr will index normally. Am I missing something obvious or is this a bug? document name=documents entity name=document dataSource=production transformer=HTMLStripTransformer,TemplateTransformer,RegexTransformer query=select id, ..., from document field column=id name=id/ field column=uid name=uid template=DOC${document.id}/ !-- more fields .. 
-- entity name=author dataSource=production query=select cast(da.document_id as text) as document_id, a.id, a.name, a.signature from document_author da left outer join author a on a.id = da.author_id cacheKey=document_id cacheLookup=document.id processor=CachedSqlEntityProcessor field name=author_id column=id / field name=author column=name / field name=author_signature column=signature / /entity /entity /document Also posted at SO if you prefer to answer there: http://stackoverflow.com/questions/10482484/nested-cachedsqlentityprocessor-running-for-each-entity-row-with-solr-3-6 Kellen
Upgrading to 3.6 broke cachedsqlentityprocessor
I've read some things in jira on the new functionality that was put into caching in the DIH, but I wouldn't think it should break the old behavior. It doesn't look as though any errors are being thrown; it's just ignoring the caching part and opening a ton of connections. Also, I cannot find any documentation on the new functionality that was added, so I'm not sure what syntax is valid and what's not. Here is my entity that worked in 3.1 but no longer works in 3.6:

<entity name="Emails"
        query="SELECT * FROM Account.SolrUserSearchEmails WHERE '${dataimporter.request.clean}' != 'false' OR DateUpdated = dateadd(ss, -30, '${dataimporter.last_index_time}')"
        processor="CachedSqlEntityProcessor"
        where="UserID=Users.UserID" />
RE: Sharing dih dictionaries
You're totally correct. There's actually a link on the DIH page now which wasn't there when I had read it a long time ago. I'm really looking forward to 4.0, it's got a ton of great new features. Thanks for the links!! -Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Monday, December 05, 2011 10:45 PM To: solr-user@lucene.apache.org Subject: Re: Sharing dih dictionaries It looks like https://issues.apache.org/jira/browse/SOLR-2382 or even https://issues.apache.org/jira/browse/SOLR-2613. I guess by using SOLR-2382 you can specify your own SortedMapBackedCache subclass which is able to share your Dictionary. Regards On Tue, Dec 6, 2011 at 12:26 AM, Brent Mills bmi...@uship.com wrote: I'm not really sure how to title this but here's what I'm trying to do. I have a query that creates a rather large dictionary of codes that are shared across multiple fields of a base entity. I'm using the cachedsqlentityprocessor but I was curious if there was a way to join this multiple times to the base entity so I can avoid having to reload it for each column join. Ex: entity name=parts query=select name, code1, code2, code3 from parts field column=name name=name / entity name=shareddictionary1 query=select code, description from partcodes where=code=parts.code1 field column=description name=code1desc //entity entity name=shareddictionary2 query=select code, description from partcodes where=code=parts.code2 field column=description name=code1desc //entity entity name=shareddictionary3 query=select code, description from partcodes where=code=parts.code3 field column=description name=code1desc //entity /entity Kind of a simplified example but in this case the dictionary query has to be run 3 times to join 3 different columns. It would be nice if I could load the data set once as an entity and specify how to join it in code without requiring a separate sql query. Any ideas? -- Sincerely yours Mikhail Khludnev Developer Grid Dynamics tel. 1-415-738-8644 Skype: mkhludnev http://www.griddynamics.com mkhlud...@griddynamics.com
Sharing dih dictionaries
I'm not really sure how to title this but here's what I'm trying to do. I have a query that creates a rather large dictionary of codes that are shared across multiple fields of a base entity. I'm using the CachedSqlEntityProcessor, but I was curious if there was a way to join this multiple times to the base entity so I can avoid having to reload it for each column join. Ex:

<entity name="parts" query="select name, code1, code2, code3 from parts">
  <field column="name" name="name" />
  <entity name="shareddictionary1" query="select code, description from partcodes" where="code=parts.code1">
    <field column="description" name="code1desc" />
  </entity>
  <entity name="shareddictionary2" query="select code, description from partcodes" where="code=parts.code2">
    <field column="description" name="code1desc" />
  </entity>
  <entity name="shareddictionary3" query="select code, description from partcodes" where="code=parts.code3">
    <field column="description" name="code1desc" />
  </entity>
</entity>

Kind of a simplified example, but in this case the dictionary query has to be run 3 times to join 3 different columns. It would be nice if I could load the data set once as an entity and specify how to join it in code without requiring a separate SQL query. Any ideas?
multiple local indexes
In our application, we need to be able to search across multiple local indexes. We need this not so much for performance reasons, but because of the particular needs of our project. But the indexes, while sharing the same schema, can be very different in terms of size and distribution of documents. By that I mean that some indexes may have a lot more documents about some topic while others will have more documents about other topics. We want to be able to add documents to the individual indexes as well. I can provide more detail about our project if necessary.

Thus, the Distributed Search feature with shards in different cores seems to be an obvious solution except for the limitation of distributed idf. First, I want to make sure my understanding of the distributed idf limitation is correct: if your documents are spread across your shards evenly, then the distribution of terms across the individual shards can be assumed to be even enough not to matter. If, as in our case, the shards are not very uniform, then this limitation is magnified. Even though simplistic, do I have the basic idea?

We have hacked together something that allows us to read from multiple indexes, but it isn't really a long-term solution. It's just sort of shoe-horned in there. Here are some notes from the programmer who worked on this:

Two custom files: EgranaryIndexReaderFactory.java and EgranaryIndexReader.java

EgranaryIndexReader.java
No real work is done here. This class extends lucene.index.MultiReader and overrides the directory() and getVersion() methods inherited from IndexReader. These methods don't make sense for a MultiReader as they only return a single value. However, Solr expects Readers to have these methods. directory() was overridden to return a call to directory() on the first reader in the subreader list. The same was done for getVersion(). This hack makes any use of these methods by Solr somewhat pointless.

EgranaryIndexReaderFactory.java
Overrides the newReader(Directory indexDir, boolean readOnly) method. The expected behavior of this method is to construct a Reader from the index at indexDir. However, this method ignores indexDir and reads a list of indexDirs from the solrconfig.xml file. These indices are used to create a list of lucene.index.IndexReader instances. This list is then used to create the EgranaryIndexReader.

So the second question is: does anybody have other ideas about how we might solve this problem? Is distributed search still our best bet?

Thanks for your thoughts!
Brent
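For the distributed-search option, the shards can simply be other cores on the same Solr instance, and a request handler can carry the shard list as a default so clients don't have to pass it on every query. A minimal sketch, assuming hypothetical core names and the default port:

<!-- Sketch only: core names, host, and port are placeholders. -->
<requestHandler name="/distsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">localhost:8983/solr/core0,localhost:8983/solr/core1,localhost:8983/solr/core2</str>
  </lst>
</requestHandler>

The distributed-idf caveat raised above still applies; this only shows the plumbing.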
Re: multiple local indexes
Thanks for your comments, Jonathan. Here is some information that gives a brief overview of the eGranary Platform in order to quickly outline the need for a solution for bringing multiple indexes into one searchable collection: http://www.widernet.org/egranary/info/multipleIndexes

Thanks,
Brent

On 9/28/2010 5:40 PM, Jonathan Rochkind wrote:

Honestly, I think just putting everything in the same index is your best bet. Are you sure the particular needs of your project can't be served by one combined index? You can certainly still query on just a portion of the index when needed using fq -- you can even create a request handler (or multiple request handlers) with invariant or appends parameters to force all queries through that request handler to have a fixed fq.

From: Brent Palmer [br...@widernet.org]
Sent: Tuesday, September 28, 2010 6:04 PM
To: solr-user@lucene.apache.org
Subject: multiple local indexes

In our application, we need to be able to search across multiple local indexes. We need this not so much for performance reasons, but because of the particular needs of our project. But the indexes, while sharing the same schema, can be very different in terms of size and distribution of documents. By that I mean that some indexes may have a lot more documents about some topic while others will have more documents about other topics. We want to be able to add documents to the individual indexes as well. I can provide more detail about our project if necessary.

Thus, the Distributed Search feature with shards in different cores seems to be an obvious solution except for the limitation of distributed idf. First, I want to make sure my understanding of the distributed idf limitation is correct: if your documents are spread across your shards evenly, then the distribution of terms across the individual shards can be assumed to be even enough not to matter. If, as in our case, the shards are not very uniform, then this limitation is magnified. Even though simplistic, do I have the basic idea?

We have hacked together something that allows us to read from multiple indexes, but it isn't really a long-term solution. It's just sort of shoe-horned in there. Here are some notes from the programmer who worked on this:

Two custom files: EgranaryIndexReaderFactory.java and EgranaryIndexReader.java

EgranaryIndexReader.java
No real work is done here. This class extends lucene.index.MultiReader and overrides the directory() and getVersion() methods inherited from IndexReader. These methods don't make sense for a MultiReader as they only return a single value. However, Solr expects Readers to have these methods. directory() was overridden to return a call to directory() on the first reader in the subreader list. The same was done for getVersion(). This hack makes any use of these methods by Solr somewhat pointless.

EgranaryIndexReaderFactory.java
Overrides the newReader(Directory indexDir, boolean readOnly) method. The expected behavior of this method is to construct a Reader from the index at indexDir. However, this method ignores indexDir and reads a list of indexDirs from the solrconfig.xml file. These indices are used to create a list of lucene.index.IndexReader instances. This list is then used to create the EgranaryIndexReader.

So the second question is: does anybody have other ideas about how we might solve this problem? Is distributed search still our best bet?

Thanks for your thoughts!
Brent
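To make Jonathan's request-handler suggestion concrete, here is a minimal sketch; the "source" field that would tag each document with its originating collection is hypothetical, as is the handler name:

<!-- Sketch only: the "source" field and handler name are placeholders. -->
<requestHandler name="/medicine" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="fq">source:medicine</str>
  </lst>
</requestHandler>

Because the fq sits in an invariants block, any fq a client sends is overridden, so queries through /medicine only ever see that subset of the combined index; an appends block would instead add the filter on top of whatever the client passes.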
Re: using multisearcher
Thanks Hoss. I haven't had time to try it yet, but that is exactly the kind of help I was looking for.

Brent

Chris Hostetter wrote:
: As for the second part, I was thinking of trying to replace the standard
: SolrIndexSearcher with one that employs a MultiSearcher. But I'm not very
: familiar with the workings of Solr, especially with respect to the caching
: that goes on. I thought that maybe people who are more familiar with it might
: have some tips on how to go about it. Or perhaps there are reasons that make
: this a bad idea.

If your indexes are all local, then using a MultiReader would be simpler than trying to shoehorn MultiSearcher-type logic into SolrIndexSearcher.

https://issues.apache.org/jira/browse/SOLR-243

-Hoss

--
Brent Palmer
Widernet.org
University of Iowa
319-335-2200
Re: using multisearcher
Hi Hoss,

The indexes are local. Also, they are very different sizes, and I think that the merged results could be really wonky when using DistributedSearch because of it (although I might be totally wrong about that).

As for the second part, I was thinking of trying to replace the standard SolrIndexSearcher with one that employs a MultiSearcher. But I'm not very familiar with the workings of Solr, especially with respect to the caching that goes on. I thought that maybe people who are more familiar with it might have some tips on how to go about it. Or perhaps there are reasons that make this a bad idea.

Brent

Chris Hostetter wrote:

If you've been using a MultiSearcher to query multiple *remote* searchers, then distributed searching in Solr should be appropriate. If you're used to using MultiSearcher as a way of aggregating from multiple *local* indexes, distributed searching is probably going to seem slow compared to what you're used to (because of the overhead).

: pretty sure this isn't what we want. Also does anyone have any comments about
: trying to replace the SolrIndexSearcher with a SolrMultiSearcher; reasons why

...I'm not really sure what you mean by that.

-Hoss
using multisearcher
Hi everybody, I'm interested in using Solr to search multiple indexes at once. We currently use our own search application, which uses Lucene's MultiSearcher. Has anyone attempted to replace, or successfully replaced, SolrIndexSearcher with some kind of multisearcher? I have looked at DistributedSearch on the wiki and I'm pretty sure this isn't what we want. Also, does anyone have any comments about trying to replace the SolrIndexSearcher with a SolrMultiSearcher: reasons why we shouldn't do this, pitfalls, suggestions about how to go about it, etc.? Also, it should be noted that we would only be adding documents to one of the indexes. I can give more info about the context of this application if necessary. Thank you for any suggestions!

--
Brent Palmer
Widernet.org
University of Iowa
319-335-2200
Re: including time as a factor in relevance?
[2006-07-15 14:06] WHIRLYCOTT said:
| I need to have my search result relevance influenced by time. Older
| things in my index are less relevant than newer things. I don't want
| to do a strict sort by date. Is this supported somehow by using a
| dismax request handler? Or if you have other hints, I'm eager to
| know what they are.

I've recently used solr's FunctionQuery to do just this with great success. Take a look at FunctionQuery and ReciprocalFloatFunction. If I have some time later today, I'll put together a simple example.

cheers!
Brent
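One common way to wire this up in that era of Solr is a boost function (bf) on the dismax handler, where recip() over the reverse ordinal of a date field decays the score of older documents. A minimal sketch, assuming a hypothetical "timestamp" date field and example qf fields; the constants would need tuning for a real index:

<!-- Sketch only: field names and recip() constants are placeholders. -->
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="qf">title^2 body</str>
    <str name="bf">recip(rord(timestamp),1,1000,1000)</str>
  </lst>
</requestHandler>

Here recip(rord(timestamp),1,1000,1000) evaluates to 1000/(rord(timestamp)+1000), so the newest document (rord near 1) gets a boost close to 1, a document 10,000 positions back gets roughly 0.09, and older documents fall off smoothly instead of being hard-sorted by date.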