Re: Updating documents and commit/rollback

2018-03-05 Thread Christopher Schultz

Shawn,

On 3/2/18 7:46 PM, Shawn Heisey wrote:
> On 3/2/2018 10:39 AM, Christopher Schultz wrote:
>> The problem is that I'm updating the index after my SQL UPDATE(s)
>> have run, but before my SQL COMMIT occurs. I have had a problem
>> where the SQL fails and rolls-back, but the solrClient is not
>> rolled-back.
>> 
>> I'm a little wary of rolling-back Solr because, as I understand
>> it, the client itself doesn't carry any transactional
>> information. That is, it should be a shared-resource (within the
>> web application) and indeed, other clients could be connecting
>> from other places (like other app servers running the same
>> application). Performing either commit() or rollback() on the
>> Solr client will commit/rollback *all* writes since the last
>> commit, right?
> 
> Correct.  Relational databases typically keep track of transactions
> on one connection separately from transactions on another
> connection, and can roll one of them back without affecting the
> others.
> 
> Solr doesn't have this capability.  The reason that it doesn't have
> this capability is that Lucene doesn't have it, and the majority of
> Solr functionality is provided by Lucene.
> 
> If updates are happening concurrently from multiple sources, then 
> there's no way to have any kind of meaningful rollback.
> 
> I see two solutions:
> 
> 1) Funnel all updates through a single thread/process, which will
> not move on from one update to another until the final decision is
> made about that update.  Then rolling back becomes possible,
> because there is only one source for updates.  The disadvantage
> here is that this thread/process becomes a bottleneck, and
> performance may suffer greatly.  Also, it can be a single point of
> failure.  If the rate of updates is low, then the bottleneck may
> not be a problem.
> 
> 2) Have your updating software revert the changes "manually" in 
> situations where the SQL change is rolled back ... by either
> deleting the record or sending another update to change values back
> to what they were before.

Yeah, technique #2 was the only thing I could come up with that made
any sense. Serializing updates is probably more trouble than it's worth.

In an environment where I'd probably expect to have maybe 50 - 100
"writes" daily to a Solr core, how do you recommend commits be done?
The documents are quite small (user metadata like username, first/last
name, and email). Can I add/commit simultaneously? There seems to be no
reason to perform separate add/commit steps in this scenario.
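
One way to fold the add and the commit into a single request is to pass a
commit parameter to Solr's update handler. The sketch below is only an
illustration under stated assumptions: the base URL, the "users" core name,
the sample document, and the class and method names are all hypothetical.
It builds a JSON update request with either commitWithin=<ms> or commit=true
and only sends it when started with --send. SolrJ's SolrClient also offers
add(...) overloads that take a commitWithinMs argument, which should serve
the same purpose from within the application.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class AddAndCommit {

    /**
     * Builds the update URL for one core. A positive commitWithinMs asks Solr
     * to commit within that many milliseconds; otherwise an immediate commit
     * is requested as part of the same update.
     */
    static URI updateUri(String baseUrl, String core, int commitWithinMs) {
        String param = commitWithinMs > 0
                ? "commitWithin=" + commitWithinMs
                : "commit=true";
        return URI.create(baseUrl + "/" + core + "/update?" + param);
    }

    /** Wraps a JSON array of documents in a POST to the update handler. */
    static HttpRequest buildRequest(URI uri, String jsonDocs) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed values: a local single-server Solr and a core named "users".
        URI uri = updateUri("http://localhost:8983/solr", "users", 1000);
        String body = "[{\"id\":\"42\",\"username\":\"jdoe\"}]";
        HttpRequest request = buildRequest(uri, body);
        System.out.println(request.method() + " " + request.uri());

        // Only contact Solr when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
        }
    }
}
```

With only 50 - 100 small writes a day, either form is likely fine. Note that
a commit is still core-wide: it makes visible every pending write, not just
the document sent in this request.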

-chris


Re: Updating documents and commit/rollback

2018-03-02 Thread Shawn Heisey
On 3/2/2018 10:39 AM, Christopher Schultz wrote:
> The problem is that I'm updating the index after my SQL UPDATE(s) have
> run, but before my SQL COMMIT occurs. I have had a problem where the SQL
> fails and rolls-back, but the solrClient is not rolled-back.
>
> I'm a little wary of rolling-back Solr because, as I understand it, the
> client itself doesn't carry any transactional information. That is, it
> should be a shared-resource (within the web application) and indeed,
> other clients could be connecting from other places (like other app
> servers running the same application). Performing either commit() or
> rollback() on the Solr client will commit/rollback *all* writes since
> the last commit, right?

Correct.  Relational databases typically keep track of transactions on
one connection separately from transactions on another connection, and
can roll one of them back without affecting the others.

Solr doesn't have this capability.  The reason that it doesn't have this
capability is that Lucene doesn't have it, and the majority of Solr
functionality is provided by Lucene.

If updates are happening concurrently from multiple sources, then
there's no way to have any kind of meaningful rollback.

I see two solutions:

1) Funnel all updates through a single thread/process, which will not
move on from one update to another until the final decision is made
about that update.  Then rolling back becomes possible, because there is
only one source for updates.  The disadvantage here is that this
thread/process becomes a bottleneck, and performance may suffer
greatly.  Also, it can be a single point of failure.  If the rate of
updates is low, then the bottleneck may not be a problem.

2) Have your updating software revert the changes "manually" in
situations where the SQL change is rolled back ... by either deleting
the record or sending another update to change values back to what they
were before.
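
The "manual" revert in option 2 can be sketched roughly as follows. This is
a minimal illustration, not a definitive implementation: Index,
InMemoryIndex, SqlTransaction and updateBoth are hypothetical names, and the
in-memory map stands in for the Solr core so the example runs without a
server. It remembers the previous document, writes the new one, and if the
SQL work fails it either restores the old values or deletes the newly added
record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class CompensatingIndexUpdate {

    /** Hypothetical stand-in for the Solr core: document id -> field values. */
    interface Index {
        Optional<Map<String, String>> get(String id);
        void put(String id, Map<String, String> doc);
        void delete(String id);
    }

    /** In-memory Index so the sketch runs without a Solr server. */
    static class InMemoryIndex implements Index {
        private final Map<String, Map<String, String>> docs = new HashMap<>();
        public Optional<Map<String, String>> get(String id) {
            return Optional.ofNullable(docs.get(id));
        }
        public void put(String id, Map<String, String> doc) {
            docs.put(id, Map.copyOf(doc));
        }
        public void delete(String id) {
            docs.remove(id);
        }
    }

    /** Hypothetical stand-in for the SQL UPDATE(s) and COMMIT; throws on rollback. */
    interface SqlTransaction {
        void run() throws Exception;
    }

    /**
     * Writes newDoc to the index, then runs the SQL work. If the SQL work
     * fails, the index change is undone by hand. Returns true on success.
     */
    static boolean updateBoth(Index index, String id,
                              Map<String, String> newDoc, SqlTransaction sql) {
        Optional<Map<String, String>> previous = index.get(id);
        index.put(id, newDoc);
        try {
            sql.run();
            return true;
        } catch (Exception e) {
            if (previous.isPresent()) {
                index.put(id, previous.get()); // change values back
            } else {
                index.delete(id);              // record was new: delete it
            }
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.put("42", Map.of("username", "jdoe", "last_name", "Doe"));
        boolean ok = updateBoth(index, "42",
                Map.of("username", "jdoe", "last_name", "Smith"),
                () -> { throw new Exception("simulated SQL failure"); });
        // prints "committed=false, last_name=Doe"
        System.out.println("committed=" + ok
                + ", last_name=" + index.get("42").get().get("last_name"));
    }
}
```

In a real application the previous values would more likely come from the
database row than from the index, and another writer could still touch the
same document between the write and the revert, so this narrows the window
rather than providing a true transaction.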

Thanks,
Shawn



Updating documents and commit/rollback

2018-03-02 Thread Christopher Schultz
Hey, folks. I've been a long-time Lucene user (running a hilariously-old
1.9.1 version forever), but I'm only just now getting into using Solr.

My particular use-case is storing information about web-application
users so they can be found more quickly than our current RDBMS-based
search (SELECT ... FROM user WHERE username LIKE '%foo%' OR
email_address LIKE '%foo%' OR last_name LIKE '%foo%'...).

I've set up my Solr (very basic... just untar, bin/solr start), created
a core/collection (I'm running single-server for now, no cloudy
zookeeper stuff ATM), customized my schema (using the Schema API, since
hand-editing is discouraged) and loaded my data. I can search just fine
through the Solr dashboard.

I've also used solr-solrj to perform searches from within my
application, replacing the previous JDBC-based search with the
Solr-based one. All is well.

Now I'm trying to figure out the best way to update users in the index
when their information (e.g. first/last names) changes. I have used
solr-solrj quite simply like this:

// Build the document from the current user record.
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", user.getId());
doc.addField("username", user.getUsername());
doc.addField("first_name", user.getFirstName());
doc.addField("last_name", user.getLastName());
...
// Send the document to the "users" core, then commit that same core.
solrClient.add("users", doc);
solrClient.commit("users");

I'm having a problem, though, and I'd like to know what the "right"
solution is.

The problem is that I'm updating the index after my SQL UPDATE(s) have
run, but before my SQL COMMIT occurs. I have had a problem where the SQL
fails and rolls-back, but the solrClient is not rolled-back.

I'm a little wary of rolling-back Solr because, as I understand it, the
client itself doesn't carry any transactional information. That is, it
should be a shared-resource (within the web application) and indeed,
other clients could be connecting from other places (like other app
servers running the same application). Performing either commit() or
rollback() on the Solr client will commit/rollback *all* writes since
the last commit, right?

That means that there is no meaningful way that I can say to Solr "oops,
I actually need you to NOT add that document I just told you about".
Instead, I have to either commit the document I don't want (and, I
dunno, delete it later or whatever) or risk rolling-back other writes
that other clients have performed.

Do I have that right?

So... what's the best way to do this kind of thing? Can I ask Solr to
add-and-commit at the same time? If so, how? Is there a meaningful
"rollback this one addition" that I can perform? If so, how?

Thanks for a great product,
-chris


