Re: Refreshing external resources periodically

2017-04-29 Thread Marshall Schor
Hi Debbie,

I think this depends on what kind of external resource you have.  Are you able
to see which Java class implements the external resource?  For instance, I'm
guessing that at some point your code calls something like:

myResource.myMethodToGetDataFromIt(...).

If you have that, you can check whether the class implementing myResource has
a method you could call to make it reload itself.  If it does, you could build
a little timer that goes off once a day and calls that API, perhaps with some
synchronization so annotators never read the resource mid-reload.
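
As a rough illustration, here's a minimal sketch of such a timer.  MyResource
and its reload() method are hypothetical stand-ins for whatever your resource
class actually provides; the read/write lock is the "some synchronization"
part, so a reload never races with an annotator reading the data:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the class implementing your external resource.
interface MyResource {
    void reload();                              // re-read the exported file
    String myMethodToGetDataFromIt(String key); // whatever lookups you do today
}

class ResourceRefresher {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final MyResource resource;

    ResourceRefresher(MyResource resource) {
        this.resource = resource;
        // goes off once a day; align the initial delay with the database export
        scheduler.scheduleAtFixedRate(this::reload, 1, 1, TimeUnit.DAYS);
    }

    private void reload() {
        lock.writeLock().lock();  // block readers while the data is replaced
        try {
            resource.reload();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Annotators read through here, under the read lock, so they never see a
    // half-reloaded resource.
    String lookup(String key) {
        lock.readLock().lock();
        try {
            return resource.myMethodToGetDataFromIt(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}

If the resource class has no reload method, another option is to build a fresh
instance each day and swap it in behind a volatile reference.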

Does this help, or have I misunderstood things? -Marshall

On 4/28/2017 4:13 AM, Debbie Zhang wrote:
> Hi UIMA users,
>
> I have a question about accessing external resources.  My external resource
> file is an export of a database table, and the data are updated daily.  Can I
> re-read the resource file daily as well and update my annotations accordingly?
> I deploy my PEAR elsewhere, and at the moment the resource files are packaged
> inside the PEAR file, so no refresh can be done.  Any suggestion is very
> welcome.  Thank you.
>
> Regards,
>
> Debbie 



Re: Limiting the memory used by an annotator ?

2017-04-29 Thread Marshall Schor
This has occasionally popped up as a user request.

Thilo makes some good practical suggestions that often work. 

If, in your case, some aspect of the data causes a combinatorial explosion in
some part of the code, and you can identify that part and have any control
over it, you might be able to insert some limiting code there.

Limiting the amount of memory: thinking more about this, if the limit were
reached, what should happen?  It seems the natural choice would be to throw a
new (subclass of) RuntimeException (runtime because it could happen almost
anywhere); the "catch" action would be to abort whatever was going on, report
the failure, and reset things (including the CAS).
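
A minimal sketch of that shape - the tick() hook is hypothetical (you'd call
it from whichever hot spot creates the annotations), and the cap of one
million is an arbitrary number to tune:

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;

// A RuntimeException subclass, so it can propagate from almost anywhere
// without being declared.
class AnnotationBudgetExceeded extends RuntimeException {
    AnnotationBudgetExceeded(String msg) { super(msg); }
}

class BudgetedPipeline {
    private static final int MAX_ANNOTATIONS = 1_000_000; // arbitrary cap
    private int created;

    // Hypothetical hook: call this each time the hot spot adds an annotation.
    void tick() {
        if (++created > MAX_ANNOTATIONS) {
            throw new AnnotationBudgetExceeded(
                    "over " + MAX_ANNOTATIONS + " annotations; aborting document");
        }
    }

    // The "catch" action: abort, report the failure, and reset the CAS.
    // UIMA may wrap a runtime exception thrown inside an annotator in an
    // AnalysisEngineProcessException, so we catch both.
    void processOneDocument(AnalysisEngine engine, CAS cas) {
        created = 0;
        try {
            engine.process(cas);
        } catch (AnnotationBudgetExceeded | AnalysisEngineProcessException e) {
            System.err.println("Skipping document: " + e.getMessage());
        } finally {
            cas.reset(); // the CAS is typically reused for the next document
        }
    }
}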

This could be done already, because an exception does happen (strictly, in
Java, the java.lang.OutOfMemoryError).  Hopefully this isn't too late - you
mentioned that things slow down as memory gets short.  (I suppose you could
also time things, and if they slow down dramatically, use that as a trigger,
too.)

So maybe this is the best approach: find a spot in your code where the
"recovery" of aborting and resetting things makes sense, and install a try /
catch point there for OutOfMemoryError (or a dramatic slow-down catcher).

A trick for out-of-memory catchers is to grab a block of memory (say, an int
array) at the start, and then have the catch handler release that block,
giving itself room enough to run and recover.  But this might not be needed:
just unwinding the stack due to the throw can also free up memory, if your
catch point is high up the stack.
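
A sketch of that combination, with the caveat that catching OutOfMemoryError
is always best-effort (the JVM may already be in a bad state), and the 16 MB
ballast size is an arbitrary guess:

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;

class OomGuard {
    // The ballast block: 4M ints = ~16 MB reserved up front so the handler
    // has room to run once it is released.
    private int[] ballast = new int[4 * 1024 * 1024];

    void processOneDocument(AnalysisEngine engine, CAS cas)
            throws AnalysisEngineProcessException {
        try {
            engine.process(cas);
        } catch (OutOfMemoryError e) {
            ballast = null;                     // release the reserved block
            System.err.println("Out of memory; aborting this document");
            cas.reset();                        // recover: drop all annotations
            ballast = new int[4 * 1024 * 1024]; // re-arm for the next document
        }
    }
}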

Hope this helps.  -Marshall

On 4/29/2017 6:53 AM, Hugues de Mazancourt wrote:
> Hello UIMA users,
>
> I’m currently putting a Ruta-based system in production and I sometimes run
> out of memory.
> This is usually caused by combinatorial explosion in Ruta rules.  These
> rules are not necessarily faulty: they are adapted to the documents I expect
> to parse.  But as this is an open system, people can upload whatever they
> want, and the parser crashes by multiplying annotations (or at least spends
> 20 minutes garbage-collecting millions of annotations).
>
> Thus, my question is: is there a way to limit the memory used by an
> annotator, or to limit the number of annotations made by an annotator, or to
> limit the number of matches made by Ruta?
> I would rather cancel the parse of a single document than suffer 20 minutes
> of downtime for the whole system.
>
> Since several UIMA-based services run in production, I imagine others have
> hit the same problem.
>
> Any hint on that topic would be very helpful.
>
> Thanks,
>
> Hugues de Mazancourt
> http://about.me/mazancourt