Hello all,

I ran into an interesting situation with a workflow. Our project manager built a workflow that did something 'legal' in ruote, but it ended up creating an endless rewind condition. The net result was that the rewind ran for about 6 hours and created 1.5 million audit entries.
Obviously this was not his intent. Besides telling him not to do that again, it brought me back to my old instrumentation questions. When we see ruote break, it is usually for one of the following reasons:

1.) Somebody built a bad workflow.
2.) A participant died in an unexpected way.
3.) A participant tried to do something that took a long time.
4.) Someone or something killed a worker while it was working.
5.) We don't have enough workers running.

We currently use New Relic to peek into what the workers are doing, but that does not give us enough information. It is pretty easy for us to build a watchdog that limits the number of history items a workflow can create and shuts the workflow down if someone goes crazy, but I was wondering if you had a better way.

Also, I was looking back at an old thread on fault tolerance and was wondering if you have given any thought to this:

http://groups.google.com/group/openwferu-users/browse_thread/thread/c51b94fb8bb685da/3750af5580163949?lnk=gst&q=best+practice#3750af5580163949

Specifically, letting workers 'talk' to the engine.

Thanks,
Eric Smith

--
you received this message because you are subscribed to the "ruote users" group.
to post : send email to [email protected]
to unsubscribe : send email to [email protected]
more options : http://groups.google.com/group/openwferu-users?hl=en
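For reference, the kind of watchdog mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `HistoryWatchdog`, the limit of 3, and the callback are all hypothetical, it is not part of ruote, and it assumes something calls `record` once per history entry from a single thread. What the callback does (for example, cancelling the process through the engine) is left open.

```ruby
# Hypothetical watchdog, not part of ruote: counts history entries per
# workflow instance id (wfid) and fires a callback once when an instance
# exceeds a configured limit.
class HistoryWatchdog
  # limit:   maximum number of history entries allowed per instance (assumed)
  # on_trip: block called once with (wfid, count) when the limit is exceeded
  def initialize(limit, &on_trip)
    @limit = limit
    @on_trip = on_trip
    @counts = Hash.new(0)
    @tripped = {}
  end

  # Call once for every history entry written for wfid.
  # Returns true if the instance is over the limit, false otherwise.
  def record(wfid)
    @counts[wfid] += 1
    return false if @counts[wfid] <= @limit

    unless @tripped[wfid]
      @tripped[wfid] = true
      @on_trip.call(wfid, @counts[wfid]) if @on_trip
    end
    true
  end

  # Number of history entries seen so far for wfid.
  def count(wfid)
    @counts[wfid]
  end

  # Whether the callback has already fired for wfid.
  def tripped?(wfid)
    @tripped.key?(wfid)
  end
end

# Usage: the block stands in for whatever shuts the runaway workflow down.
cancelled = []
watchdog = HistoryWatchdog.new(3) { |wfid, _count| cancelled << wfid }
5.times { watchdog.record('wf-runaway') }
2.times { watchdog.record('wf-normal') }
```

The callback fires only once per instance, so a runaway workflow does not trigger a flood of shutdown requests while it is being stopped.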
