Steve--

Of course, it's hard for me to diagnose your cluster from just this 
information, so maybe I'm missing something, but I can't see how taking a 
system in which some threads are indirectly blocking due to I/O (sockets with 
Oracle) and directly making the threads block through synchronization is going 
to help anything.

Unless, that is, you think the problem is that Oracle is thrashing and would 
have more throughput with fewer requests.  I would guess, given the engineering 
resources that Oracle has and the uses its customers have put it to, that 
Oracle can deal with hundreds of simultaneous requests.
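
If you do want to test that theory, the "N connections at a time" idea from 
your message maps onto a counting semaphore.  Here is a minimal sketch, 
assuming all the pipeline instances on a node share one JVM (a 
java.util.concurrent.Semaphore does nothing across processes).  DbGate and 
withPermit are names I made up for illustration:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper: caps how many threads in this JVM touch the DB at once.
final class DbGate {
    private final Semaphore permits;

    DbGate(int maxConcurrent) {
        // Fair mode hands out permits in arrival order, so no thread starves.
        this.permits = new Semaphore(maxConcurrent, true);
    }

    // Runs dbWork while holding one permit; the permit is always returned.
    <T> T withPermit(Supplier<T> dbWork) {
        permits.acquireUninterruptibly();
        try {
            return dbWork.get();
        } finally {
            permits.release();
        }
    }

    // Permits not currently held (useful for logging and tests).
    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        DbGate gate = new DbGate(4); // N = 4 is an arbitrary starting point
        String result = gate.withPermit(() -> "stand-in for a query or insert");
        System.out.println(result + ", permits free: " + gate.available());
    }
}
```

Run the same workload at several values of N and measure documents per second.  
If throughput doesn't go up as N goes down, thrashing isn't your problem.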

If I were you, I would first locate the bottleneck.  Is it network bandwidth?  
NIC bandwidth?  Oracle disk I/O?  Paging on the Oracle box?  Front-side bus 
bandwidth (a common limit on multi-core machines)?

Given my experience building a similar clustered system on UIMA, the first 
thing I'd look at is the bandwidth usage.  With 15 nodes (you don't say how 
many cores--let me guess 60), you probably don't need 224 threads to keep the 
CPUs busy.  I run just a few more threads than I have cores (maybe 10% more).  
The trick is to design your software so that the document is on the network 
from its storage source to its pipeline exactly once.  Then all annotators must 
be local, in the same JVM, so all data movement in the pipeline is in the same 
address space.  Then put results on the network exactly once, from pipeline to 
storage destination.  You should be able to get the pipeline to the point where 
it is spending less than 5% of its elapsed time for a document blocking on I/O.
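
As a rough sketch of that sizing rule, here is a per-node thread pool sized 
at the core count plus about 10%.  The class and method names are made up for 
illustration, and the 10% figure is only the rule of thumb above, not a tuned 
value:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper for sizing the pipeline thread pool on one node.
final class PoolSizing {

    // Cores plus a percentage of headroom, rounded up, in integer arithmetic.
    static int threadCount(int cores, int headroomPercent) {
        return cores + (cores * headroomPercent + 99) / 100;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = threadCount(cores, 10);

        // One pipeline instance per thread would be submitted to this pool.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println(cores + " cores -> " + threads + " pipeline threads");
        pool.shutdown();
    }
}
```

With my guess of 60 cores across the cluster, that comes to 66 threads in 
total rather than 224.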

I've measured the bandwidth at the TCP/IP level of gigabit networking at 57 
megabytes/sec out of a single machine/NIC.  That would be a ceiling, and I'm 
sure an Oracle instance will be well below that, due to disk I/O.  So measure 
your throughput (keeping in mind the data expansion in whatever protocol you're 
using--hopefully not SOAP!), and compare it to 57 megabytes/sec.  This will give 
you some idea of how much room for improvement you have, and whether the 
bottleneck is network I/O or something else.  A tool like Ethereal (now called 
Wireshark) may be useful here.
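
The comparison itself is simple arithmetic.  A sketch, assuming you can get a 
byte count on the wire and an elapsed time for a test run; the names are 
hypothetical, the sample numbers are invented, and I'm treating a megabyte as 
10^6 bytes:

```java
// Hypothetical helper: compares measured throughput with the 57 MB/s ceiling.
final class BandwidthCheck {
    // Single-NIC gigabit figure measured at the TCP/IP level (from above).
    static final double CEILING_MB_PER_SEC = 57.0;

    // Bytes on the wire (after protocol expansion) over wall-clock seconds.
    static double mbPerSec(long bytesOnWire, double seconds) {
        return bytesOnWire / 1_000_000.0 / seconds;
    }

    // 1.0 means the NIC is saturated; much lower points at another bottleneck.
    static double fractionOfCeiling(double mbPerSec) {
        return mbPerSec / CEILING_MB_PER_SEC;
    }

    public static void main(String[] args) {
        // Invented sample: 1.2 GB moved in 60 seconds.
        double rate = mbPerSec(1_200_000_000L, 60.0);
        System.out.printf("%.1f MB/s, %.0f%% of ceiling%n",
                rate, 100.0 * fractionOfCeiling(rate));
    }
}
```

A result well under the ceiling says the network isn't what's holding you 
back, so the next place to look is the Oracle box itself.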

If it's something else, start looking at the CPU and disk usage on the Oracle 
box.  Maybe more RAM and a bigger SGA would help.  Maybe RAID-0 disks would 
help.  It all depends on exactly what the problem is.  You probably need an 
Oracle expert.

Hope this helps,


Greg Holmberg




 -------------- Original message ----------------------
From: Steve Suppe <[EMAIL PROTECTED]>
> A slightly different (but related question):
> 
> I've been playing around with this type of computation.  We are loading 
> data into a DB.  We have a small Linux cluster (15 multi-core nodes at the 
> moment) that we have scaled up to run 224+ instances of our pipeline.  I've 
> noticed for most of our calculations, it's really Oracle that is holding us 
> back.
> 
> In some instances, a 'computation' is classified as one 'medium to long' 
> data pull from Oracle, a bunch of analysis, and then 'small to large' 
> insertions of results.  I've dabbled in placing static DB connections and 
> mutexes through the code to guarantee that the instances on a machine only 
> access the DB one at a time, but are free to run analysis simultaneously 
> otherwise.
> 
> I have also toyed with the idea of locks that allow N number of connections 
> (instead of only  the mutual exclusion one at a time) so that I can 
> increase the connections to a point, but not overload the system.
> 
> Has anyone tried anything like this?  Or is anyone else at least running a 
> similar hardware set-up?  It would be great to compare notes.
> 
> Thanks,
> Steve
> 
> At 04:09 AM 11/12/2007, Marshall Schor wrote:
> >This may not be quite precise enough.  Your Annotators will be
> >instantiated multiple times,  so that a single *instance* of an
> >annotator will not be run on multiple threads at once.  So - if you have
> >non-static fields in your annotator, they do not need to be accessed
> >with threading in mind. But if you make use of "static" fields, there is
> >only one instance of these, so access to them must be thread-safe.
> >
> >If your *application*  (not your annotator) is multi-threaded, it will
> >need to be thread-safe. You can find relevant information about this in
> >the tutorial and reference docs for UIMA (search for "thread").
> >
> >-Marshall
> >
> >Michael Baessler wrote:
> > > Benjamin Sznajder wrote:
> > >> Hi all,
> > >>
> > >> I am interested in using multi-threading in UIMA.
> > >> My aim is that the flow runs several annotators in parallel.
> > >> One of my annotators is not thread-safe. My question is, then,
> > >> Does the UIMA parallelism (setting "MultipleDeployment=true") require
> > >> that annotators on which this flag is set be thread-safe?
> > >>
> > >> Regards,
> > >> Benjamin.
> > >>
> > >>
> > > Yes, your annotators have to be thread-safe when you want to run them
> > > multi-threaded.
> > >
> > > -- Michael
> > >
> > >
