After spending some time on this, I am planning a different approach. I am
going to have the other system notify my system of which keys it changed, so
that I can update the index. When I laid out the complexity involved in the
constraint approach, they were willing to change their system's behavior to
assist mine.
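A minimal sketch of the receiving side of that approach. It assumes the other system sends a message naming the table and the row keys it changed. `ChangeNotification`, `NotificationIndexer` and the `reindexRow` callback are hypothetical names, not part of Accumulo or ElasticSearch. The callback stands in for re-reading the row from Accumulo, converting it, and sending it to the index. That work runs in my own JVM, so it can use the application state a constraint would not have.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical message the other system sends after it writes to a table.
final class ChangeNotification {
    final String table;
    final List<String> rowKeys;

    ChangeNotification(String table, List<String> rowKeys) {
        this.table = table;
        this.rowKeys = new ArrayList<>(rowKeys);
    }
}

// Receives change notifications and re-indexes each changed row once.
final class NotificationIndexer {
    private final Set<String> tablesOfInterest;
    // Hypothetical hook: re-read the row from Accumulo, convert it, send it to the index.
    private final Consumer<String> reindexRow;

    NotificationIndexer(Set<String> tablesOfInterest, Consumer<String> reindexRow) {
        this.tablesOfInterest = new HashSet<>(tablesOfInterest);
        this.reindexRow = reindexRow;
    }

    /** Re-indexes each distinct changed row; returns how many rows were handled. */
    int onNotification(ChangeNotification n) {
        if (!tablesOfInterest.contains(n.table)) {
            return 0; // a table my system does not index
        }
        // LinkedHashSet drops duplicate keys but keeps arrival order.
        Set<String> distinct = new LinkedHashSet<>(n.rowKeys);
        for (String row : distinct) {
            reindexRow.accept(row);
        }
        return distinct.size();
    }
}
```

Deduplicating the keys means a burst of updates to the same row costs one re-read rather than one per update.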

Doing it through the constraint is just too much of a performance hit. I would
need to convert the Mutations back to POJOs, and probably into my system's JSON
format, before I could index them in ElasticSearch. That is O(M*N) decoding
work on the write path, where M is the number of mutations and N is the number
of column updates in each mutation. It is also difficult because I need
application state to decode the mutation values properly, and the constraint
doesn't have that state: it isn't running in the same JVM (probably not even on
the same machine) as the rest of the system. Getting that state would require
additional overhead, or perhaps even a REST call back to the original server.
Doing all of that inside a constraint just isn't feasible.

Thanks for all the helpful information, I now understand constraints much 
better than I did a few days ago.

Thanks again,

Jon Parise


From: Adam Fuchs [mailto:[email protected]]
Sent: Thursday, October 01, 2015 4:03 PM
To: [email protected]
Subject: Re: Watching for Changes with Write Ahead Log?

I would stay away from ThreadLocal -- the threads that run Constraints can be 
dynamically generated in a resizable thread pool, and cleaning up after them 
could be challenging. Static might work better if you can make it thread safe, 
maybe with a resource pool.
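A sketch of the "static plus resource pool" idea. One thread-safe pool is held in a static field and shared by every instance and every pool thread, so nothing is tied to a particular thread. `ClientPool`, `FakeSearchClient` and `IndexingCheck` are hypothetical. `IndexingCheck` only marks where a constraint's check() logic would borrow a client; it does not implement Accumulo's Constraint interface.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Thread-safe pool of reusable clients; at most maxSize are ever created.
final class ClientPool<T> {
    private final BlockingQueue<T> idle = new LinkedBlockingQueue<>();
    private final Semaphore permits;
    private final Supplier<T> factory;

    ClientPool(int maxSize, Supplier<T> factory) {
        this.permits = new Semaphore(maxSize);
        this.factory = factory;
    }

    // Borrows an idle client (or creates one), runs the work, then returns the client.
    <R> R withClient(Function<T, R> work) {
        permits.acquireUninterruptibly(); // blocks while maxSize clients are in use
        try {
            T client = idle.poll();
            if (client == null) {
                client = factory.get();
            }
            try {
                return work.apply(client);
            } finally {
                idle.offer(client);
            }
        } finally {
            permits.release();
        }
    }
}

// Hypothetical stand-in for a real search client; counts how many were created.
final class FakeSearchClient {
    static final AtomicInteger CREATED = new AtomicInteger();

    FakeSearchClient() {
        CREATED.incrementAndGet();
    }

    String send(String doc) {
        return "indexed:" + doc;
    }
}

// Hypothetical class standing in for a constraint: the pool is static, so every
// instance and every pool thread shares it, with nothing stored per thread.
final class IndexingCheck {
    private static final ClientPool<FakeSearchClient> POOL =
            new ClientPool<>(4, FakeSearchClient::new);

    String onChange(String doc) {
        return POOL.withClient(client -> client.send(doc));
    }
}
```

The semaphore caps how many clients exist at once, so a burst of threads waits for a free client instead of opening new connections. Idle clients live in the shared queue rather than in thread-local slots, so nothing needs cleaning up when the thread pool resizes.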

Adam


On Thu, Oct 1, 2015 at 2:39 PM, John Vines <[email protected]> wrote:

As dirty as it is, that sounds like a case for a static, or maybe thread-local,
object.

On Thu, Oct 1, 2015, 7:19 PM Parise, Jonathan <[email protected]> wrote:
I have a few follow up questions in regard to constraints.

What is the lifecycle of a constraint? In other words, are constraints somehow
tied to Accumulo’s lifecycle, or are they just instantiated each time a
mutation occurs and then disposed of?

Also, are there multiple instances of the same constraint class at any time, or
do all mutations on a table go through the exact same constraint instance?

My guess is that when a mutation comes in, a new constraint is created through
reflection. Then check() is called, the violation codes are parsed, and the
object is disposed of/finalized.

The reason I ask is that I want to update my ElasticSearch index each time I
see a mutation on the table. However, I don’t want to make a connection, send
the data, and then tear down the connection each time. That’s a lot of
unnecessary overhead, and with it happening on every mutation, performance
could be badly impacted.

Is there some way to cache something like a connection and reuse it between 
calls to the Constraint’s check() method? How would such a thing be cleaned up 
if Accumulo is shut down?
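A sketch of one answer to both questions: open a single connection lazily, reuse it on every call, and close it when the JVM shuts down. `SearchConnection` and `SharedConnection` are hypothetical stand-ins for a real client. Lazy creation uses the initialization-on-demand holder idiom, which is thread-safe because class initialization is. Cleanup uses `Runtime.getRuntime().addShutdownHook`. A shutdown hook runs on a normal JVM exit or SIGTERM, but not on `kill -9` or a crash, so this only covers an orderly tablet server shutdown.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a connection to the search cluster.
final class SearchConnection implements AutoCloseable {
    static final AtomicInteger OPENED = new AtomicInteger();
    private volatile boolean open = true;

    SearchConnection() {
        OPENED.incrementAndGet();
    }

    void send(String doc) {
        if (!open) {
            throw new IllegalStateException("connection closed");
        }
        // A real client would issue a network request for doc here.
    }

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }
}

// One connection per JVM, created on first use and closed at shutdown.
final class SharedConnection {
    private SharedConnection() {}

    // The JVM runs Holder's static initializer once, on first access.
    private static final class Holder {
        static final SearchConnection INSTANCE = open();

        private static SearchConnection open() {
            SearchConnection c = new SearchConnection();
            // Registered once, alongside the single connection it closes.
            Runtime.getRuntime().addShutdownHook(new Thread(c::close, "search-connection-close"));
            return c;
        }
    }

    static SearchConnection get() {
        return Holder.INSTANCE;
    }
}
```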


Thanks again,

Jon

From: Parise, Jonathan [mailto:[email protected]]
Sent: Wednesday, September 30, 2015 9:21 AM
To: [email protected]
Subject: RE: Watching for Changes with Write Ahead Log?

In this particular case, I need to update some of my application state when 
changes made by another system occur.

I would need to do a few things to accomplish my goal.


1) Be notified of, or see, that a table has changed.
2) Check that against the changes I know my system has made.
3) If my system is not the originator of the change, update internal state to
   reflect the change.

Examples of state I may need to update include an ElasticSearch index and an
in-memory cache.
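A sketch of steps 2 and 3. It assumes step 1 delivers each change as a row key plus the timestamp of the write, and that my system sets explicit timestamps (or some other write token) on its own writes and records them. `ChangeFilter` and its `applyForeignChange` callback are hypothetical. The callback stands in for refreshing the in-memory cache or the ElasticSearch index. A change whose (row, timestamp) pair matches a recorded own write is skipped; anything else is treated as coming from another system.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Skips changes this system made itself and applies everything else.
final class ChangeFilter {
    // Row key -> timestamp of the last write this system made to that row.
    private final Map<String, Long> ownWrites = new ConcurrentHashMap<>();
    // Hypothetical hook that refreshes the cache / search index for a foreign change.
    private final BiConsumer<String, Long> applyForeignChange;

    ChangeFilter(BiConsumer<String, Long> applyForeignChange) {
        this.applyForeignChange = applyForeignChange;
    }

    // Call whenever this system writes a row.
    void recordOwnWrite(String rowKey, long timestamp) {
        ownWrites.put(rowKey, timestamp);
    }

    // Returns true if the change came from another system and was applied.
    boolean onChange(String rowKey, long timestamp) {
        // remove(key, value) succeeds only on an exact match, so a foreign write
        // to a row we also wrote (different timestamp) is still treated as foreign.
        if (ownWrites.remove(rowKey, timestamp)) {
            return false;
        }
        applyForeignChange.accept(rowKey, timestamp);
        return true;
    }
}
```

In this sketch, an own write that is never observed stays in the map. A longer-lived version would need to expire old entries.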

I’m going to read up on constraints again and see if I can use them for this 
purpose.

Thanks!

Jon



From: Adam Fuchs [mailto:[email protected]]
Sent: Tuesday, September 29, 2015 5:46 PM
To: [email protected]
Subject: Re: Watching for Changes with Write Ahead Log?

Jon,

You might think about putting a constraint on your table. I think the API for 
constraints is flexible enough for your purpose, but I'm not exactly sure how 
you would want to manage the results / side effects of your observations.

Adam


On Tue, Sep 29, 2015 at 5:41 PM, Parise, Jonathan <[email protected]> wrote:
Hi,

I’m working on a system through which changes to Accumulo will generally come.
However, in some cases, another system may change data without my system being
aware of it.

What I would like to do is somehow listen for changes to the tables my system 
cares about. I know there is a write ahead log that I could potentially listen 
to for changes, but I don’t know how to use it. I looked around for some 
documentation about it, and I don’t see much. I get the impression that it 
isn’t really intended for this type of use case.

Does anyone have any suggestions on how to watch a table for changes and then
determine whether those changes were made by a different system?

Is there some documentation about how to use the write ahead log?


Thanks,

Jon Parise

