I'm trying to write a multithreaded crawler using Cayenne. I previously had it working with Torque.

I'm writing several kinds of information out to the database (and to Solr). Some of it is shared by multiple threads and should only be created if it doesn't already exist in the db. Outgoing links are the ones giving me trouble. Many of our pages point to the same link, so a thread should reuse the existing database row for that link if there is one and create it only if there isn't. Every later reference to the link should go through the same existence check.
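To make the requirement concrete, here is a minimal sketch of the get-or-create behaviour I'm after. It uses an in-memory map as a stand-in for the link table and does not touch Cayenne at all; the class name LinkRegistry and the method getOrCreate are hypothetical names made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the outgoing-link table; this is not Cayenne code.
final class LinkRegistry {
    // Maps a URL to the id of its (simulated) database row.
    private final ConcurrentMap<String, Long> idsByUrl = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // Returns the existing id for the URL, or creates one if none exists.
    // ConcurrentHashMap.computeIfAbsent performs the check and the insert
    // atomically, so two threads asking for the same URL get the same id.
    long getOrCreate(String url) {
        return idsByUrl.computeIfAbsent(url, u -> nextId.getAndIncrement());
    }

    // Number of distinct URLs registered so far.
    int size() {
        return idsByUrl.size();
    }
}
```

With the real database the check and the insert are separate steps in separate contexts, which is where the duplicates come from.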

If I don't commit the context frequently enough, it starts attempting to insert duplicate URLs. I fixed that, but now I'm getting messages like this:

Cannot set object as destination of relationship toResource because it is in a different ObjectContext

What's the best strategy for doing frequent updates to the database with multiple threads?

I am beginning to think I'm headed down the wrong path and should switch to something else completely to store this data, such as NoSQL.
