Re: Updating documents and commit/rollback
Shawn,

On 3/2/18 7:46 PM, Shawn Heisey wrote:
> On 3/2/2018 10:39 AM, Christopher Schultz wrote:
>> The problem is that I'm updating the index after my SQL UPDATE(s)
>> have run, but before my SQL COMMIT occurs. I have had a problem
>> where the SQL fails and rolls-back, but the solrClient is not
>> rolled-back.
>>
>> I'm a little wary of rolling-back Solr because, as I understand
>> it, the client itself doesn't carry any transactional
>> information. That is, it should be a shared-resource (within the
>> web application) and indeed, other clients could be connecting
>> from other places (like other app servers running the same
>> application). Performing either commit() or rollback() on the
>> Solr client will commit/rollback *all* writes since the last
>> commit, right?
>
> Correct. Relational databases typically keep track of transactions
> on one connection separately from transactions on another
> connection, and can roll one of them back without affecting the
> others.
>
> Solr doesn't have this capability. The reason that it doesn't have
> this capability is that Lucene doesn't have it, and the majority of
> Solr functionality is provided by Lucene.
>
> If updates are happening concurrently from multiple sources, then
> there's no way to have any kind of meaningful rollback.
>
> I see two solutions:
>
> 1) Funnel all updates through a single thread/process, which will
> not move on from one update to another until the final decision is
> made about that update. Then rolling back becomes possible,
> because there is only one source for updates. The disadvantage
> here is that this thread/process becomes a bottleneck, and
> performance may suffer greatly. Also, it can be a single point of
> failure. If the rate of updates is low, then the bottleneck may
> not be a problem.
>
> 2) Have your updating software revert the changes "manually" in
> situations where the SQL change is rolled back ... by either
> deleting the record or sending another update to change values back
> to what they were before.

Yeah, technique #2 was the only thing I could come up with that made
any sense. Serializing updates is probably more trouble than it's
worth.

In an environment where I'd probably expect to have maybe 50 - 100
"writes" daily to a Solr core, how do you recommend commits be done?
The documents are quite small (user metadata like username, first/last
and email).

Can I add/commit simultaneously? There seems to be no reason to
perform separate add/commit steps in this scenario.

-chris
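
Technique #2 from the quoted reply could look roughly like the sketch below. It is only an illustration under stated assumptions: UserIndex, InMemoryIndex, SqlWork and updateUser are hypothetical names invented for this example, not SolrJ types, and the in-memory map merely stands in for a Solr core so the code runs without a server. The idea is to remember what the index held before the write and, if the SQL transaction rolls back, put that state back (or delete the document if it was new) instead of calling rollback() on the shared client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class CompensatingIndexUpdate {

    /** Hypothetical stand-in for the search index; NOT a SolrJ type. */
    interface UserIndex {
        Optional<Map<String, String>> get(String id);
        void put(String id, Map<String, String> doc);
        void delete(String id);
    }

    /** In-memory implementation, present only so the sketch is runnable. */
    static class InMemoryIndex implements UserIndex {
        private final Map<String, Map<String, String>> docs = new HashMap<>();

        public Optional<Map<String, String>> get(String id) {
            return Optional.ofNullable(docs.get(id));
        }

        public void put(String id, Map<String, String> doc) {
            docs.put(id, new HashMap<>(doc)); // store a private copy
        }

        public void delete(String id) {
            docs.remove(id);
        }
    }

    /** Hypothetical stand-in for the SQL UPDATE(s) plus COMMIT; throws if the transaction rolls back. */
    interface SqlWork {
        void run() throws Exception;
    }

    /**
     * Writes newDoc to the index, then runs the SQL work. If the SQL work
     * fails, only this one document is reverted; writes made by other
     * clients are left alone. Returns true if the SQL work succeeded.
     */
    static boolean updateUser(UserIndex index, String id,
                              Map<String, String> newDoc, SqlWork sql) {
        Optional<Map<String, String>> previous = index.get(id); // state to restore on failure
        index.put(id, newDoc);
        try {
            sql.run();
            return true;
        } catch (Exception e) {
            if (previous.isPresent()) {
                index.put(id, previous.get()); // change values back to what they were
            } else {
                index.delete(id);              // the document was new, so remove it
            }
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.put("42", Map.of("username", "jdoe", "last_name", "Doe"));

        boolean ok = updateUser(index, "42",
                Map.of("username", "jdoe", "last_name", "Smith"),
                () -> { throw new Exception("simulated SQL rollback"); });

        System.out.println("SQL committed: " + ok + ", index now holds: " + index.get("42"));
    }
}
```

In a real application the previous values would more likely come from the database row as it was before the UPDATE than from the index itself. This sketch also ignores the case where another client changes the same document between the write and the revert. On the add/commit question, SolrJ's add(...) has overloads that take a commitWithinMs argument, which may be worth a look for a write rate as low as 50 - 100 per day; whether that suits a given setup should be checked against the SolrJ documentation.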
Re: Updating documents and commit/rollback
On 3/2/2018 10:39 AM, Christopher Schultz wrote:
> The problem is that I'm updating the index after my SQL UPDATE(s) have
> run, but before my SQL COMMIT occurs. I have had a problem where the SQL
> fails and rolls-back, but the solrClient is not rolled-back.
>
> I'm a little wary of rolling-back Solr because, as I understand it, the
> client itself doesn't carry any transactional information. That is, it
> should be a shared-resource (within the web application) and indeed,
> other clients could be connecting from other places (like other app
> servers running the same application). Performing either commit() or
> rollback() on the Solr client will commit/rollback *all* writes since
> the last commit, right?

Correct. Relational databases typically keep track of transactions on
one connection separately from transactions on another connection, and
can roll one of them back without affecting the others.

Solr doesn't have this capability. The reason that it doesn't have this
capability is that Lucene doesn't have it, and the majority of Solr
functionality is provided by Lucene.

If updates are happening concurrently from multiple sources, then
there's no way to have any kind of meaningful rollback.

I see two solutions:

1) Funnel all updates through a single thread/process, which will not
move on from one update to another until the final decision is made
about that update. Then rolling back becomes possible, because there is
only one source for updates. The disadvantage here is that this
thread/process becomes a bottleneck, and performance may suffer greatly.
Also, it can be a single point of failure. If the rate of updates is
low, then the bottleneck may not be a problem.

2) Have your updating software revert the changes "manually" in
situations where the SQL change is rolled back ... by either deleting
the record or sending another update to change values back to what they
were before.

Thanks,
Shawn
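
Solution 1 above could be sketched along the following lines. This is a rough illustration, not a recommendation of a particular design: SingleWriterFunnel, GlobalTxIndex and InMemoryIndex are hypothetical names made up for the example, and the in-memory index only imitates Solr's "commit or roll back everything pending" behaviour so the code runs without a server. A single-threaded executor is assumed to be the only writer, so each update is fully decided (committed or rolled back) before the next one starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SingleWriterFunnel implements AutoCloseable {

    /** Hypothetical stand-in for an index with only global commit/rollback; NOT a SolrJ type. */
    interface GlobalTxIndex {
        void add(String doc);
        void commit();
        void rollback();
    }

    /** In-memory implementation, present only so the sketch is runnable. */
    static class InMemoryIndex implements GlobalTxIndex {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        public void add(String doc) { pending.add(doc); }
        public void commit() { committed.addAll(pending); pending.clear(); }
        public void rollback() { pending.clear(); }
    }

    // One thread means one source of updates, so a global rollback can
    // only ever discard the update currently being processed.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final GlobalTxIndex index;

    SingleWriterFunnel(GlobalTxIndex index) {
        this.index = index;
    }

    /**
     * Queues one update. sqlWork is a hypothetical callback that performs
     * the SQL side and returns true if the SQL transaction committed;
     * an exception is treated as a rollback.
     */
    CompletableFuture<Boolean> submit(String doc, Callable<Boolean> sqlWork) {
        return CompletableFuture.supplyAsync(() -> {
            index.add(doc);
            boolean sqlCommitted;
            try {
                sqlCommitted = sqlWork.call();
            } catch (Exception e) {
                sqlCommitted = false;
            }
            if (sqlCommitted) {
                index.commit();
            } else {
                index.rollback();
            }
            return sqlCommitted;
        }, writer);
    }

    @Override
    public void close() {
        writer.shutdown();
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        try (SingleWriterFunnel funnel = new SingleWriterFunnel(index)) {
            funnel.submit("alice", () -> true).join();
            funnel.submit("bob", () -> false).join();   // simulated SQL rollback
            funnel.submit("carol", () -> true).join();
        }
        System.out.println("Committed documents: " + index.committed);
    }
}
```

The sketch only holds if every writer in the deployment goes through the same funnel. With several app servers writing to the same core, the funnel would have to be a shared service rather than a thread inside one web application, which is where the bottleneck and single-point-of-failure concerns come in.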
Updating documents and commit/rollback
Hey, folks.

I've been a long-time Lucene user (running a hilariously-old 1.9.1
version forever), but I'm only just now getting into using Solr.

My particular use-case is storing information about web-application
users so they can be found more quickly than our current RDBMS-based
search (SELECT ... FROM user WHERE username LIKE '%foo%' OR
email_address LIKE '%foo%' OR last_name LIKE '%foo%'...).

I've set up my Solr (very basic... just untar, bin/solr start), created
a core/collection (I'm running single-server for now, no cloudy
zookeeper stuff ATM), customized my schema (using the Schema API, since
hand-editing is discouraged) and loaded my data.

I can search just fine through the Solr dashboard. I've also used
solr-solrj to perform searches from within my application, replacing
the previous JDBC-based search with the Solr-based one. All is well.

Now I'm trying to figure out the best way to update users in the index
when their information (e.g. first/last names) changes. I have used
solr-solrj quite simply like this:

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", user.getId());
    doc.addField("username", user.getUsername());
    doc.addField("first_name", user.getFirstName());
    doc.addField("last_name", user.getLastName());
    ...
    solrClient.add("users", doc);
    solrClient.commit();

I'm having a problem, though, and I'd like to know what the "right"
solution is.

The problem is that I'm updating the index after my SQL UPDATE(s) have
run, but before my SQL COMMIT occurs. I have had a problem where the SQL
fails and rolls-back, but the solrClient is not rolled-back.

I'm a little wary of rolling-back Solr because, as I understand it, the
client itself doesn't carry any transactional information. That is, it
should be a shared-resource (within the web application) and indeed,
other clients could be connecting from other places (like other app
servers running the same application). Performing either commit() or
rollback() on the Solr client will commit/rollback *all* writes since
the last commit, right?

That means that there is no meaningful way that I can say to Solr
"oops, I actually need you to NOT add that document I just told you
about". Instead, I have to either commit the document I don't want
(and, I dunno, delete it later or whatever) or risk rolling-back other
writes that other clients have performed.

Do I have that right?

So... what's the best way to do this kind of thing? Can I ask Solr to
add-and-commit at the same time? If so, how? Is there a meaningful
"rollback this one addition" that I can perform? If so, how?

Thanks for a great product,
-chris