Re: [osmosis-dev] calling DataPostbox.release() multiple times

2012-01-31 Thread Brett Henderson
On 19 January 2012 10:17, Richard Hansen rhan...@bbn.com wrote:


  release() and outputRelease() have been designed to support re-use.  I
 don't think I've tested it, but I believe it will work.  The two
 variables you mention are initialized near the start of those methods
 and only read after the Released flags have been set so their
 initial state isn't important.  However I can see that is confusing so
 I've modified their initial state to match their final state of true
 (I'll check it in shortly).


 I agree with that change.  The only other thing I would suggest is a
 comment indicating that the class is designed to support reuse.  (Or are
 all tasks supposed to be reusable?)


These days I try to support re-use for all tasks, but I wouldn't be
surprised if many of the older tasks don't work.

I've updated the comments to describe the re-use support.  Let me know if
it doesn't make sense.

Brett
___
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev


Re: [osmosis-dev] calling DataPostbox.release() multiple times

2012-01-18 Thread Brett Henderson
On 16 January 2012 07:22, Richard Hansen rhan...@bbn.com wrote:

 On 2012-01-15 07:34, Brett Henderson wrote:

 You're correct in saying that Releasable classes can be released
 multiple times, however that is not typically how they're used in the
 Osmosis pipeline.


 Does it make sense to change the Releasable API definition to say that
 release can only be called once?  Or might that break multiple-sink stages
 like merge?


I've updated the Releasable API definition to say that implementations
should support release being called multiple times but that it isn't
mandatory and cannot be relied on by clients.

Multiple sink stages like merge expose two separate sink instances to the
two upstream pipes (through MultiSink interface which allows multiple Sink
instances to be obtained) which feed into separate DataPostbox instances
internally.  Each sink instance only has its release method called once.
So a single release call limitation shouldn't break anything.



  Looking at the code a bit more I think it might only need minor changes.
  I had thought that release() and outputRelease() changed the state to
 equal that of a freshly constructed DataPostbox (to support reuse?), but
 that's not true:  inputExit and outputExit are false after construction but
 true after release().  So simply checking inputExit at the beginning of
 release() should take care of it.  See the attached (untested) patch.


release() and outputRelease() have been designed to support re-use.  I
don't think I've tested it, but I believe it will work.  The two variables
you mention are initialized near the start of those methods and only read
after the Released flags have been set so their initial state isn't
important.  However I can see that is confusing so I've modified their
initial state to match their final state of true (I'll check it in shortly).

Your patch would probably work but I'd prefer to continue to support re-use
rather than to use that variable for a second purpose.



 An alternative is to increase the burden on the users of DataPostbox. They
 could be required to either ensure that the output thread restarts or avoid
 calling DataPostbox.release() once the output thread exits. For example,
 EntityBuffer.run() could set an 'outputThreadIsRunning' boolean at the top
 and clear it at the end.  Then EntityBuffer.release() would be modified to
 only call buffer.release() if outputThreadIsRunning is true.


Race conditions between threads are tough to avoid, and I believe the above
change would introduce one.  If the output thread hasn't entered
EntityBuffer.run yet and the outputThreadIsRunning flag hasn't been set,
but the calling thread encounters an early exception and calls
EntityBuffer.release, the calling thread will think that the output thread
isn't running and there is no need to perform release functionality.  This
would then lead to the output thread blocking when it finally does start.

The way the class is constructed now, each thread will synchronise at the
initialize, complete and release methods.  If either thread skips straight
to release then the other thread detects that as an error condition and
aborts appropriately, but in all cases both threads will synchronise around
release before exiting.  The only reason to synchronise around release is
to support re-use which may never be used in practice but it should work
well.  We can't even rely on the initialize methods being called because it
is possible for a thread to fail before that is called.

I'd rather leave the class as it is designed now.  It's a tradeoff between
supporting re-use and incurring the limitation that every call to release
must be matched by a call to outputRelease in another thread.  I don't
think it's possible to have it both ways.

Brett
___
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev


Re: [osmosis-dev] calling DataPostbox.release() multiple times

2012-01-15 Thread Brett Henderson
On 15 January 2012 14:09, Richard Hansen rhan...@bbn.com wrote:

 Hi all,

 I believe I have found a bug in DataPostbox.release().  (I'm using git
 master, currently 43089ed.)

 According to the javadoc comments, Releaseable.release() can be called
 multiple times.  However, DataPostbox.release() can only be called once.
  If it is called a second time, it blocks forever waiting for the output
 thread (which no longer exists) to release.


Good find, thanks for the report.

You're correct in saying that Releasable classes can be released multiple
times, however that is not typically how they're used in the Osmosis
pipeline.

DataPostbox is a bit tricky because it is the main synchronisation point
between multiple threads.  Two threads communicating via a DataPostbox
perform handshakes during the initialize, complete, and release methods
(the process methods run mostly independently).  It is the release method
where each threads tells the other thread that they are about to exit.
DataPostbox does technically support multiple calls to release, but it
would require each thread to restart after the release calls and this isn't
supported (or even desirable) by the Osmosis pipeline.



 I noticed this because the pbf reader can call release() twice, triggering
 the lockup.  To recreate, you can run the following:

osmosis --read-pbf /dev/null --buffer --write-null

 The two calls to DataPostbox.release() have the following stack traces:

 java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(**Thread.java:1266)
at org.openstreetmap.osmosis.**core.store.DataPostbox.**
 release(DataPostbox.java:335)
at org.openstreetmap.osmosis.**core.buffer.v0_6.EntityBuffer.**
 release(EntityBuffer.java:64)
at crosby.binary.osmosis.**OsmosisBinaryParser.complete(**
 OsmosisBinaryParser.java:36)
at crosby.binary.file.**BlockInputStream.process(**
 BlockInputStream.java:37)
at crosby.binary.osmosis.**OsmosisReader.run(**
 OsmosisReader.java:45)
at java.lang.Thread.run(Thread.**java:679)
 java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(**Thread.java:1266)
at org.openstreetmap.osmosis.**core.store.DataPostbox.**
 release(DataPostbox.java:335)
at org.openstreetmap.osmosis.**core.buffer.v0_6.EntityBuffer.**
 release(EntityBuffer.java:64)
at crosby.binary.osmosis.**OsmosisReader.run(**
 OsmosisReader.java:50)
at java.lang.Thread.run(Thread.**java:679)

 Deleting line 36 of 
 pbf/src/main/java/crosby/**binary/osmosis/**OsmosisBinaryParser.java
 will cause DataPostbox.release() to be called only once, but that just
 avoids the bug.


I've removed line 36 as you've suggested because it shouldn't be there.
Release should only be called in a finally block that gets called
regardless of success or failure, the complete method only gets called on
success.  I think it was me that added the other release in OsmosisReader
recently when I added initialize, but apparently I didn't make a
corresponding change in OsmosisBinaryParser.

Please let me know if you see any further issues.



 I believe a proper fix to DataPostbox.release() will require a minor
 redesign of the DataPostbox lifetime management code.  I'm not familiar
 enough with the code to feel comfortable making such a change.  I can take
 a crack at it, but I'm hoping an expert has the time to take a look.


What changes would you suggest?  I believe it's working as I originally
intended, but that's not to say there isn't a better way :-)

Brett
___
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev


Re: [osmosis-dev] calling DataPostbox.release() multiple times

2012-01-15 Thread Richard Hansen

On 2012-01-15 07:34, Brett Henderson wrote:

You're correct in saying that Releasable classes can be released
multiple times, however that is not typically how they're used in the
Osmosis pipeline.


Does it make sense to change the Releasable API definition to say that 
release can only be called once?  Or might that break multiple-sink 
stages like merge?




DataPostbox is a bit tricky because it is the main synchronisation point
between multiple threads.  Two threads communicating via a DataPostbox
perform handshakes during the initialize, complete, and release methods
(the process methods run mostly independently).  It is the release
method where each threads tells the other thread that they are about to
exit.  DataPostbox does technically support multiple calls to release,
but it would require each thread to restart after the release calls and
this isn't supported (or even desirable) by the Osmosis pipeline.

...


What changes would you suggest?  I believe it's working as I originally
intended, but that's not to say there isn't a better way :-)


Looking at the code a bit more I think it might only need minor changes. 
 I had thought that release() and outputRelease() changed the state to 
equal that of a freshly constructed DataPostbox (to support reuse?), but 
that's not true:  inputExit and outputExit are false after construction 
but true after release().  So simply checking inputExit at the beginning 
of release() should take care of it.  See the attached (untested) patch.


An alternative is to increase the burden on the users of DataPostbox. 
They could be required to either ensure that the output thread restarts 
or avoid calling DataPostbox.release() once the output thread exits. 
For example, EntityBuffer.run() could set an 'outputThreadIsRunning' 
boolean at the top and clear it at the end.  Then EntityBuffer.release() 
would be modified to only call buffer.release() if outputThreadIsRunning 
is true.


-Richard
From 86c93d88e011ab76b79737c6185bd138f2eced76 Mon Sep 17 00:00:00 2001
From: Richard Hansen rhan...@bbn.com
Date: Sun, 15 Jan 2012 15:13:01 -0500
Subject: [PATCH] Do nothing in DataPostbox.release() if already released

The Releasable interface specification says that release() may be
called multiple times.  Before, if DataPostbox.release() was called a
second time, it would block forever waiting for the output thread
(which would no longer exist) to release.  Now subsequent calls to
release() are a no-op, preventing the lockup.
---
 .../osmosis/core/store/DataPostbox.java|5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/core/src/main/java/org/openstreetmap/osmosis/core/store/DataPostbox.java b/core/src/main/java/org/openstreetmap/osmosis/core/store/DataPostbox.java
index 372de7f..415ee40 100644
--- a/core/src/main/java/org/openstreetmap/osmosis/core/store/DataPostbox.java
+++ b/core/src/main/java/org/openstreetmap/osmosis/core/store/DataPostbox.java
@@ -335,6 +335,11 @@ public class DataPostboxT implements Initializable {
 		lock.lock();
 
 		try {
+			// support release() being called multiple
+			// times (do nothing if already released)
+			if (inputExit)
+return;
+
 			// If release is being called without having completed successfully,
 			// it is an error condition.
 			if (!inputComplete) {
-- 
1.7.8.2

___
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev