Re: Images in FOP 0.92beta

2006-07-17 Thread Chris Bowditch

[EMAIL PROTECTED] wrote:


Hi Jeremias,



<snip/>



   Well I figure that the thread will just be blocked in queue.remove
most of the time unless it has something to do.  I don't think there
is much overhead for a thread in the types of systems we are targeting
(i.e. not small constrained devices).  Note that this is one thread
that is used for all CleanerThread sub objects (so it's not like you
are likely to spawn lots of threads).


Don't forget that a lot of folk deploy FOP/Batik inside Web containers 
or Application Servers, where spawning new Threads is considered illegal.


<snip/>

Chris




Re: Images in FOP 0.92beta

2006-07-15 Thread thomas . deweese
Hi Jeremias,

Jeremias Maerki [EMAIL PROTECTED] wrote on 07/14/2006 04:26:57 PM:

 At first, I'd have preferred to avoid an extra thread if possible so I
 just added a local ReferenceQueue and used poll() to do house-keeping
 whenever a user agent signs off. I assume you don't have a
 not-too-frequently called method you could do on-demand house-keeping in,
 so the thread is probably ok. 

   Well I figure that the thread will just be blocked in queue.remove
most of the time unless it has something to do.  I don't think there
is much overhead for a thread in the types of systems we are targeting
(i.e. not small constrained devices).  Note that this is one thread
that is used for all CleanerThread sub objects (so it's not like you
are likely to spawn lots of threads).

   Some people put the cleaning in the management calls (so you
poll the queue when people add/remove elements from the hash).  I'm
not fond of that as it means you are borrowing a stranger's thread
to do your work (it just feels ugly).

 And given that we have Batik in memory
 anyway FOP could co-use that thread. But since I'd like to avoid
 dependencies on Batik directly if possible, can we move CleanerThread to
 XML Graphics Commons and rename it to ReferenceCleanerThread to give it
 a more speaking name? 

   I was under the impression that most of the stuff in
batik.util will find its way into graphics commons.  As for renaming
I don't think it's a big deal.

 The SoftReferenceCache is indeed a little odd, especially the method
 names. I think I'll skip that one for now.

   It is meant to be subclassed to provide a strongly typed interface
(notice all the '*Impl' methods are protected), so the subclass can
provide public versions that take strongly typed parameters.

 Some other interesting things I observed while playing around for those
 interested (ATM, I'm still doing the house-keeping without the thread
 but I might rewrite):

   SoftReferences are a very powerful tool in Java; I don't think they
get enough attention in general.

 When using weak references (as the current code does but with the fixed
 behavior) FOP takes around 35 sec on my machine to produce that 182
 image PDF. Heap usage is usually around 12MB with peaks to 26MB. The
 house-keeping after the user agent retires removes around 178 references.
 
 Switching to soft references, which are actually the recommended type for
 caches, the heap usage goes up to the 64MB maximum and pretty much stays
 there. The whole thing takes 29-30 sec average. The house-keeping after
 the user agent retires removes between 161 and 170 references. So this
 means the VM actually keeps more references around, only freeing as many
 as it needs not to run into memory problems. And it runs faster this way.
 
 I learned a few things today. :-)

   I guess that makes it a good day ;)

 On 14.07.2006 14:35:06 thomas.deweese wrote:
  Hi all,
  
  Just a small comment on HashMaps with weak values:
  
  Jeremias Maerki [EMAIL PROTECTED] wrote on 07/13/2006 04:43:07 PM:
  
   Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
   WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
   258 MB is suddenly produced without exceptions using the VM's default
   heap settings, never going beyond 26MB heap usage. *g*
  
 There is a potential problem with this approach that Batik ran into.
  Unless you go a little further those weak values accumulate in the map.
  In your case this probably isn't a big deal, but for Batik where there
  are potentially thousands (or tens of thousands, think mouse move events)
  of entries, these 'dead' entries start to add up.
  
 As a result Batik has batik.util.CleanerThread.  This class has
  inner classes that subclass the various SoftReference classes with an 
  additional method 'public void cleared()'.  This method is called by
  the CleanerThread when the object the soft reference is pointing at is
  cleared from memory (it uses the ReferenceQueue part of soft references).
  
 This gives you the hook you need to then de-register the entry from
  the hash table.  This is actually an incredibly useful 'addition' to
  the standard soft reference classes (for example I will often use
  it to check if classes I think should go to GC really do go to GC).
  
 I should also mention that Batik has a class called 'SoftReferenceCache'
  which is a thread safe implementation of exactly what you just implemented.
  The interface may seem a little odd but it is designed to ensure that
  only one party ever has to decode a resource even if multiple threads
  request it at the same time.
  
 Anyway just thought I would add my 2 cents...
 
 
 
 Jeremias Maerki
 



Re: Images in FOP 0.92beta

2006-07-14 Thread thomas . deweese
Hi all,

Just a small comment on HashMaps with weak values:

Jeremias Maerki [EMAIL PROTECTED] wrote on 07/13/2006 04:43:07 PM:

 Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
 WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
 258 MB is suddenly produced without exceptions using the VM's default
 heap settings, never going beyond 26MB heap usage. *g*

   There is a potential problem with this approach that Batik ran into.
Unless you go a little further those weak values accumulate in the map.
In your case this probably isn't a big deal, but for Batik where there
are potentially thousands (or tens of thousands, think mouse move events)
of entries, these 'dead' entries start to add up.

   As a result Batik has batik.util.CleanerThread.  This class has
inner classes that subclass the various SoftReference classes with an 
additional method 'public void cleared()'.  This method is called by
the CleanerThread when the object the soft reference is pointing at is
cleared from memory (it uses the ReferenceQueue part of soft references).

   This gives you the hook you need to then de-register the entry from
the hash table.  This is actually an incredibly useful 'addition' to
the standard soft reference classes (for example I will often use
it to check if classes I think should go to GC really do go to GC).
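
   A minimal sketch of the idea described above, not Batik's actual
CleanerThread source: the names CleanerSketch, Cleared and CacheRef are
hypothetical and only illustrate how a single thread blocked in
ReferenceQueue.remove() can call a 'cleared()' hook that de-registers
the dead map entry.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.Map;

/** Hypothetical sketch of a shared cleaner thread (not Batik's code). */
class CleanerSketch extends Thread {
    /** Callback implemented by references that want to be notified. */
    interface Cleared { void cleared(); }

    /** One queue shared by every reference that wants clean-up. */
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    /** Soft reference that removes its own map entry once it is cleared. */
    static class CacheRef extends SoftReference<Object> implements Cleared {
        private final Map<String, CacheRef> owner;
        private final String key;

        CacheRef(Map<String, CacheRef> owner, String key, Object value) {
            super(value, QUEUE);
            this.owner = owner;
            this.key = key;
        }

        @Override public void cleared() {
            // Remove only if the map still holds this very reference.
            owner.remove(key, this);
        }
    }

    CleanerSketch() { setDaemon(true); }

    /** Handles one dequeued reference; separate so it can be called directly. */
    static void handle(Reference<?> ref) {
        if (ref instanceof Cleared) {
            ((Cleared) ref).cleared();
        }
    }

    @Override public void run() {
        try {
            while (true) {
                handle(QUEUE.remove()); // blocks until the GC enqueues a reference
            }
        } catch (InterruptedException e) {
            // Interrupted: let the thread exit.
        }
    }
}
```

   In this sketch the thread would be started once (new
CleanerSketch().start()) and every CacheRef registers itself with the
shared queue in its constructor, so one thread serves all maps.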

   I should also mention that Batik has a class called 'SoftReferenceCache'
which is a thread safe implementation of exactly what you just implemented.
The interface may seem a little odd but it is designed to ensure that
only one party ever has to decode a resource even if multiple threads
request it at the same time.

   Anyway just thought I would add my 2 cents...



Re: Images in FOP 0.92beta

2006-07-14 Thread Jeremias Maerki
That was worth more than 2 cents. Thanks, Thomas. I didn't really care
too much about left-over references at first, but in a long-running
service they add up unnecessarily even if it's only a Map.Entry, a
String and a Reference instance per entry.

At first, I'd have preferred to avoid an extra thread if possible so I
just added a local ReferenceQueue and used poll() to do house-keeping
whenever a user agent signs off. I assume you don't have a
not-too-frequently called method you could do on-demand house-keeping in,
so the thread is probably ok. And given that we have Batik in memory
anyway FOP could co-use that thread. But since I'd like to avoid
dependencies on Batik directly if possible, can we move CleanerThread to
XML Graphics Commons and rename it to ReferenceCleanerThread to give it
a more speaking name? In the beginning, this means we will have two
threads doing the same thing but it is a cleaner design in the
long run (when Batik starts using Commons).
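
The thread-less variant mentioned above (a local ReferenceQueue drained
with poll()) could look roughly like the following sketch. PolledCache,
KeyedRef and houseKeeping() are hypothetical names, not FOP's actual
image cache API; the assumption is that houseKeeping() gets called at a
convenient moment such as a user agent signing off.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache that cleans up on demand, without a dedicated thread. */
class PolledCache<V> {
    /** Soft reference that remembers its key so the entry can be found again. */
    private static final class KeyedRef<T> extends SoftReference<T> {
        final String key;

        KeyedRef(String key, T value, ReferenceQueue<T> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<String, KeyedRef<V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    synchronized void put(String key, V value) {
        map.put(key, new KeyedRef<>(key, value, queue));
    }

    /** Returns the cached value, or null if absent or already collected. */
    synchronized V get(String key) {
        KeyedRef<V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    /** Non-blocking clean-up; returns the number of stale entries removed. */
    synchronized int houseKeeping() {
        int removed = 0;
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            KeyedRef<?> keyed = (KeyedRef<?>) ref;
            // Skip entries that were replaced by a newer reference in the meantime.
            if (map.remove(keyed.key, keyed)) {
                removed++;
            }
        }
        return removed;
    }

    synchronized int size() {
        return map.size();
    }
}
```

The trade-off is the one discussed in this thread: no extra thread is
needed, but dead entries stay in the map until somebody calls
houseKeeping().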

The SoftReferenceCache is indeed a little odd, especially the method
names. I think I'll skip that one for now.

Some other interesting things I observed while playing around for those
interested (ATM, I'm still doing the house-keeping without the thread
but I might rewrite):

When using weak references (as the current code does but with the fixed
behaviour) FOP takes around 35 sec on my machine to produce that 182
image PDF. Heap usage is usually around 12MB with peaks to 26MB. The
house-keeping after the user agent retires removes around 178 references.

Switching to soft references, which are actually the recommended type for
caches, the heap usage goes up to the 64MB maximum and pretty much stays
there. The whole thing takes 29-30 sec average. The house-keeping after
the user agent retires removes between 161 and 170 references. So this
means the VM actually keeps more references around, only freeing as many
as it needs not to run into memory problems. And it runs faster this way.

I learned a few things today. :-)

On 14.07.2006 14:35:06 thomas.deweese wrote:
 Hi all,
 
 Just a small comment on HashMaps with weak values:
 
 Jeremias Maerki [EMAIL PROTECTED] wrote on 07/13/2006 04:43:07 PM:
 
  Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
  WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
  258 MB is suddenly produced without exceptions using the VM's default
  heap settings, never going beyond 26MB heap usage. *g*
 
There is a potential problem with this approach that Batik ran into.
 Unless you go a little further those weak values accumulate in the map.
 In your case this probably isn't a big deal, but for Batik where there
 are potentially thousands (or tens of thousands, think mouse move events)
 of entries, these 'dead' entries start to add up.
 
As a result Batik has batik.util.CleanerThread.  This class has
 inner classes that subclass the various SoftReference classes with an 
 additional method 'public void cleared()'.  This method is called by
 the CleanerThread when the object the soft reference is pointing at is
 cleared from memory (it uses the ReferenceQueue part of soft references).
 
This gives you the hook you need to then de-register the entry from
 the hash table.  This is actually an incredibly useful 'addition' to
 the standard soft reference classes (for example I will often use
 it to check if classes I think should go to GC really do go to GC).
 
I should also mention that Batik has a class called 'SoftReferenceCache'
 which is a thread safe implementation of exactly what you just implemented.
 The interface may seem a little odd but it is designed to ensure that
 only one party ever has to decode a resource even if multiple threads
 request it at the same time.
 
Anyway just thought I would add my 2 cents...



Jeremias Maerki



Re: Images in FOP 0.92beta

2006-07-13 Thread Jeremias Maerki
Jörg,

remember this thread on fop-users? I've just found out what's wrong.

There's absolutely nothing wrong with the PDFRenderer or the PDF library
concerning reference freeing. It does so as soon as each image is
written to the PDF which always happens immediately.

But I found that org.apache.fop.fo.flow.ExternalGraphic unnecessarily
maintains a hard reference on a FopImage. Unnecessarily, because we just
need the intrinsic size there. The FopImage is never reset to null
after use. I fixed that and: d'oh, still not good.

I ended up in the image cache and in the Javadocs for WeakHashMap where
I found that little detail that the weak reference is on the key, not
the value. And the key is the URL (String) which is passed around in FOP.
Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
258 MB is suddenly produced without exceptions using the VM's default
heap settings, never going beyond 26MB heap usage. *g*
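
A minimal sketch of that change, assuming a hypothetical WeakValueCache
class rather than FOP's actual image cache code: a plain HashMap keeps
the URL keys strongly reachable, while each value is wrapped in a
WeakReference so the image itself can be collected.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache keyed by URL; values are held only weakly. */
class WeakValueCache<V> {
    // WeakHashMap would make the *keys* weak, which does not help here
    // because the URL strings stay referenced elsewhere.
    private final Map<String, WeakReference<V>> map = new HashMap<>();

    synchronized void put(String url, V image) {
        map.put(url, new WeakReference<>(image));
    }

    /** Returns the cached value, or null if absent or already collected. */
    synchronized V get(String url) {
        WeakReference<V> ref = map.get(url);
        return ref == null ? null : ref.get();
    }

    /** Number of map entries, including ones whose value was collected. */
    synchronized int size() {
        return map.size();
    }
}
```

Note that size() in this sketch still counts entries whose referent is
gone; nothing removes them from the map.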

Will test some more and then commit later.

On 21.06.2006 23:03:38 J.Pietschmann wrote:
 Jeremias Maerki wrote:
  Ouch, that could explain it. No, no changes in that area. Actually,
  images could be written to the file immediately and then released
  instead of having to wait until the next page-sequence is finished.
 
 While the image data is written as soon as possible, the XObject
 which also points to the image object is kept for the object dictionary
 which is written much later. There have been changes in the way the
 object dictionaries are written to the PDF which I didn't track.
 
  Should be easy to fix.
 
 Unfortunately, the XObject seems to query some data from the image
 object while writing the dictionary.



Jeremias Maerki



Re: Images in FOP 0.92beta

2006-07-13 Thread J.Pietschmann

Jeremias Maerki wrote:

remember this thread on fop-users? I've just found out what's wrong.


Great!


There's absolutely nothing wrong with the PDFRenderer or the PDF library
concerning reference freeing. It does so as soon as each image is
written to the PDF which always happens immediately.


Hm. I'm pretty sure in 0.20.5 a PDF object held a pointer, and the
object was using some data while writing a dictionary structure into
the PDF stream after all the real content was written.
[...]

I ended up in the image cache and in the Javadocs for WeakHashMap where
I found that little detail that the weak reference is on the key, not
the value.


Oops, my fault.

J.Pietschmann