Hi again everyone,
I've got a setup here like this:
One server is running tomcat-4.0 (final), plus slide, plus our webapps.
Another server (elsewhere in the same room) is running oracle.
So, the webserver box has slide set up to use:
a) FileContentStore (local)
b) JDBCDescriptorsStore (talking to oracle on another machine)
Also, we use slide's SlideRealm in tomcat-4.0 for authentication.
Our webapps (all running in the same VM as slide; only one copy of slide
is ever running at a time) call slide's API directly rather than going
through webdav, because they often do quite a bit of work and the
overhead of using webdav within the same VM is too high.
One of our classes sets a property on a slide object. A couple of weeks
ago, I changed it to use slide's transactions (calling token.begin(),
token.commit(), and sometimes token.rollback(), just like the webdav
servlet does).
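For illustration, here is a minimal sketch of a begin/commit/rollback wrapper that guarantees a rollback on every failure path, including unchecked exceptions, so no transaction can be left open. It is not the actual webapp code: `TxToken` is a hypothetical stand-in for slide's token (only the three calls named above are assumed), and `inTransaction` and `RecordingToken` are illustrative helpers. It is written for a modern JDK (Java 8+ lambdas).

```java
import java.util.ArrayList;
import java.util.List;

public class TxPattern {

    /** Hypothetical stand-in for the three calls used on slide's token. */
    interface TxToken {
        void begin() throws Exception;
        void commit() throws Exception;
        void rollback() throws Exception;
    }

    /** Hypothetical unit of work, e.g. setting one property on a slide object. */
    interface Work {
        void run() throws Exception;
    }

    /**
     * Runs work inside a transaction. Commits on success; rolls back if the
     * work or the commit throws anything, including RuntimeExceptions.
     */
    static void inTransaction(TxToken token, Work work) throws Exception {
        token.begin();
        boolean committed = false;
        try {
            work.run();
            token.commit();
            committed = true;
        } finally {
            if (!committed) {
                try {
                    token.rollback();
                } catch (Exception rollbackFailure) {
                    // Report it, but let the original failure propagate.
                    rollbackFailure.printStackTrace();
                }
            }
        }
    }

    /** Fake token that records the calls made on it, for demonstration only. */
    static class RecordingToken implements TxToken {
        final List<String> calls = new ArrayList<>();
        public void begin() { calls.add("begin"); }
        public void commit() { calls.add("commit"); }
        public void rollback() { calls.add("rollback"); }
    }

    public static void main(String[] args) throws Exception {
        RecordingToken ok = new RecordingToken();
        inTransaction(ok, () -> { /* set the property here */ });
        System.out.println(ok.calls); // [begin, commit]

        RecordingToken failed = new RecordingToken();
        try {
            inTransaction(failed, () -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // The original exception still reaches the caller.
        }
        System.out.println(failed.calls); // [begin, rollback]
    }
}
```

Putting the rollback in a `finally` keyed on a `committed` flag, rather than in a `catch (Exception e)` block, also covers `Error`s and any exception type the catch clause didn't anticipate.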
At some point since then (or possibly ever since), we've occasionally
been getting exceptions thrown (and dumped to stdout/stderr). They all
look more or less like the one below, with no other related information
around them in the logs.
Enlist error(Transaction 86 in HttpProcessor[7070][6]) = -4
slidestore.reference.JDBCDescriptorsStore@73a34b Branch: HttpProcessor[7070][6]-1002069841359-86-20 Flag: 2097152
javax.transaction.xa.XAException
    at org.apache.slide.common.AbstractSimpleService.start(AbstractSimpleService.java:415)
    at slidestore.reference.JDBCDescriptorsStore.start(JDBCDescriptorsStore.java:515)
    at org.apache.slide.transaction.SlideTransaction.enlistResource(SlideTransaction.java:464)
    at org.apache.slide.store.AbstractStore.enlist(AbstractStore.java:1373)
    at org.apache.slide.store.AbstractStore.storeRevisionDescriptor(AbstractStore.java:1094)
    at org.apache.slide.store.StandardStore.storeRevisionDescriptor(StandardStore.java:606)
    at org.apache.slide.content.ContentImpl.store(ContentImpl.java:943)
(followed by the rest of a very long stack trace showing tomcat invoking
one of our servlets, which eventually calls content.store() here.)
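Assuming the -4 in the "Enlist error" line is the XAException's error code, the two numbers can be decoded against the standard constants in `javax.transaction.xa`: -4 is `XAException.XAER_NOTA` ("the XID is not valid") and flag 2097152 (0x200000) is `XAResource.TMJOIN`. Read that way, the JDBC store was asked to join a transaction branch it did not recognise. The sketch below only maps the numbers to constant names and says nothing about why slide's store got into that state; `XaDecode`, `describeError` and `describeFlags` are illustrative helpers, not part of slide.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

public class XaDecode {

    /** Returns the name of the XAException error-code constant matching code. */
    static String describeError(int code) {
        switch (code) {
            case XAException.XAER_ASYNC:   return "XAER_ASYNC";
            case XAException.XAER_RMERR:   return "XAER_RMERR";
            case XAException.XAER_NOTA:    return "XAER_NOTA";
            case XAException.XAER_INVAL:   return "XAER_INVAL";
            case XAException.XAER_PROTO:   return "XAER_PROTO";
            case XAException.XAER_RMFAIL:  return "XAER_RMFAIL";
            case XAException.XAER_DUPID:   return "XAER_DUPID";
            case XAException.XAER_OUTSIDE: return "XAER_OUTSIDE";
            default:                       return "other (" + code + ")";
        }
    }

    // XAResource flag bits and their names, kept in matching order.
    private static final int[] FLAG_VALUES = {
        XAResource.TMJOIN, XAResource.TMRESUME, XAResource.TMSUSPEND,
        XAResource.TMSUCCESS, XAResource.TMFAIL, XAResource.TMONEPHASE,
        XAResource.TMSTARTRSCAN, XAResource.TMENDRSCAN
    };
    private static final String[] FLAG_NAMES = {
        "TMJOIN", "TMRESUME", "TMSUSPEND",
        "TMSUCCESS", "TMFAIL", "TMONEPHASE",
        "TMSTARTRSCAN", "TMENDRSCAN"
    };

    /** Lists the names of the XAResource flag bits set in flags. */
    static List<String> describeFlags(int flags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < FLAG_VALUES.length; i++) {
            if ((flags & FLAG_VALUES[i]) != 0) {
                names.add(FLAG_NAMES[i]);
            }
        }
        if (names.isEmpty()) {
            names.add(flags == XAResource.TMNOFLAGS
                    ? "TMNOFLAGS"
                    : "unknown bits 0x" + Integer.toHexString(flags));
        }
        return names;
    }

    public static void main(String[] args) {
        // Values copied from the "Enlist error" line in the log above.
        System.out.println(describeError(-4));      // XAER_NOTA
        System.out.println(describeFlags(2097152)); // [TMJOIN]
    }
}
```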
Then, in the last few days, this machine (well, the webserver) has
apparently been hanging. Further investigation shows that authentication
is being attempted, but the call never returns from SlideRealm's
getPassword() method. This generally seems to happen just after a burst
of these transaction errors, so I think the two are related in some way.
If I throw our load-tester at it, this usually starts happening within a
couple of minutes.
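One way to see where getPassword() is stuck is a thread dump of the tomcat JVM (on Unix, `kill -QUIT <pid>` makes a Sun/HotSpot JVM print one to its standard output). As a sketch of doing the same from inside the VM, the class below prints the stack of every live thread that currently has a frame in a named method. `ThreadDumper`, `dumpThreadsIn` and the demo's `getPassword` (which simply blocks on a lock held by `main`) are hypothetical; they illustrate the diagnosis, not SlideRealm's actual behaviour. It needs a modern JDK (Java 8+).

```java
import java.util.Map;

public class ThreadDumper {

    /**
     * Prints the stack of every live thread that has a frame for the given
     * method name (e.g. "getPassword"). Returns how many threads matched.
     */
    static int dumpThreadsIn(String methodName) {
        int matches = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] frames = entry.getValue();
            boolean inMethod = false;
            for (StackTraceElement frame : frames) {
                if (frame.getMethodName().equals(methodName)) {
                    inMethod = true;
                    break;
                }
            }
            if (!inMethod) {
                continue;
            }
            matches++;
            Thread t = entry.getKey();
            System.out.println("\"" + t.getName() + "\" state=" + t.getState());
            for (StackTraceElement frame : frames) {
                System.out.println("    at " + frame);
            }
        }
        return matches;
    }

    private static final Object LOCK = new Object();

    /** Hypothetical stand-in for a realm method that blocks on a lock held elsewhere. */
    static String getPassword() {
        synchronized (LOCK) {
            return "secret";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread waiter;
        synchronized (LOCK) {
            waiter = new Thread(ThreadDumper::getPassword, "auth-request");
            waiter.start();
            // Wait until the request thread is actually blocked on LOCK.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            dumpThreadsIn("getPassword"); // shows "auth-request" blocked in getPassword
        }
        waiter.join(); // LOCK is released above, so the waiter can finish
    }
}
```

In a real dump, the interesting part is usually the lock the blocked thread is waiting for and which other thread holds it.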
If I remove the transaction code from our webapp, things seem to work
OK (though I haven't tested this thoroughly yet). Of course, lots of
warnings get printed because I'm modifying things while NOT in a
transaction. That's obviously not a long-term solution, though: when
things go wrong, slide will tend to end up in an inconsistent state and
break horribly.
Does anyone know what might be causing this? I've looked through the
first couple of layers of code, but got lost in slide's transaction
handling, which I'm not at all familiar with. I know some of you know it
backwards, so any ideas would be very much appreciated.
Michael