Re: [OSM-dev] Osmosis errors with applying change and API0.6

2009-05-04 Thread Matt White
Lennard wrote:
 mattwh...@iinet.net.au wrote:

 New script (fails miserably now, and the input file has been migrated)
 java -jar osmosis.jar --read-xml-0.6 file=newzealand.osm 
 --read-xml-change-0.6 file=daily-latest.osc.gz --apply-change-0.6 
 --write-xml-0.6 file=newzealand-out.osm

 The error I get is: "Task3-apply-change-0.6 does not support data 
 provided by default pipe stored at level 2 in the default pipe stack."

 Switch --rxc and --rx around, so --rxc is first.

 PS: Wouldn't it be better to cut out a new newzealand.osm from a 
 proper 0.6 planet, instead of migrating a 0.5 extract to 0.6?

 PPS: Aren't you missing a --bb (bounding box) or --bp (bounding 
 polygon) task after the --apply-change task? You're now getting all 
 additions outside of your newzealand.osm in as well, and running it 
 through a bbox task after applying the changes will get rid of them.

I do cut the bounding box out afterwards (I just wasn't cluttering the 
issue up with excess commands).

The OSMXAPI won't process requests that are country-sized, so I just 
apply the changes nightly going forward. There are probably better ways, 
but this one works OK.

Matt

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Tagtransform

2009-05-04 Thread Dave Stubbs
I'll get it working with the latest Osmosis today and upload it to SVN.


2009/5/3 Steven te Brinke s.tebri...@student.utwente.nl:
 Hello,

 Is the tagtransform plugin still maintained? It looks like the same
 problem as 3 months ago still exists: it does not work with the latest
 release of Osmosis.
 The plugin looks very useful to me, so if no one maintains it, I'll give
 refactoring it a try, but because I have no experience with coding Osmosis
 plugins, I would prefer it if someone else could do this.

 Regards,
 Steven


 Dave Stubbs wrote:

 2009/2/13  marcus.wolsc...@googlemail.com:


 On Thu, 12 Feb 2009 21:34:21 +0100, Rolf Bode-Meyer rob...@gmail.com
 wrote:


 That indeed looks promising.
 Unfortunately, one seems to have to be a programmer to use osmosis. Every
 problem is presented as a Java exception. Some of them contain at
 least a faint idea of what could be wrong. But something like this
 leaves me clueless:

 java.lang.AbstractMethodError
   at com.bretth.osmosis.core.pipeline.common.TaskManagerFactory.createTaskManager(TaskManagerFactory.java:72)

 This looks like the plugin was compiled against a different version of
 osmosis. My guess would be that it was built against the latest
 development version and you are using the last stable release of osmosis,
 or something like that. It's something Randomjunk, the developer of the
 plugin, has to fix. I have not seen any contact info on his wiki user
 page, so I hope he reads this mailing list.



 It was built against v0.29 of Osmosis which is the version I'm still
 using -- it should work with that.

 I hadn't realised the plugin interface had changed. I'll have to take
 a look some time.

 Dave

 ___
 dev mailing list
 dev@openstreetmap.org
 http://lists.openstreetmap.org/listinfo/dev


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Tagtransform

2009-05-04 Thread Dave Stubbs
OK, now updated to work with osmosis 0.30

It has tasks for 0.5 and for 0.6; the default is now 0.6.

Source is in SVN here:
http://trac.openstreetmap.org/browser/applications/utils/osmosis/plugins/tagtransform


2009/5/4 Dave Stubbs osm.l...@randomjunk.co.uk:
 I'll get it working with the latest Osmosis today and upload it to SVN.


 2009/5/3 Steven te Brinke s.tebri...@student.utwente.nl:
 Hello,

 Is the tagtransform plugin still maintained? It looks like the same
 problem as 3 months ago still exists: it does not work with the latest
 release of Osmosis.
 The plugin looks very useful to me, so if no one maintains it, I'll give
 refactoring it a try, but because I have no experience with coding Osmosis
 plugins, I would prefer it if someone else could do this.

 Regards,
 Steven


 Dave Stubbs wrote:

 2009/2/13  marcus.wolsc...@googlemail.com:


 On Thu, 12 Feb 2009 21:34:21 +0100, Rolf Bode-Meyer rob...@gmail.com
 wrote:


 That indeed looks promising.
 Unfortunately, one seems to have to be a programmer to use osmosis. Every
 problem is presented as a Java exception. Some of them contain at
 least a faint idea of what could be wrong. But something like this
 leaves me clueless:

 java.lang.AbstractMethodError
   at com.bretth.osmosis.core.pipeline.common.TaskManagerFactory.createTaskManager(TaskManagerFactory.java:72)

 This looks like the plugin was compiled against a different version of
 osmosis. My guess would be that it was built against the latest
 development version and you are using the last stable release of osmosis,
 or something like that. It's something Randomjunk, the developer of the
 plugin, has to fix. I have not seen any contact info on his wiki user
 page, so I hope he reads this mailing list.



 It was built against v0.29 of Osmosis which is the version I'm still
 using -- it should work with that.

 I hadn't realised the plugin interface had changed. I'll have to take
 a look some time.

 Dave

 ___
 dev mailing list
 dev@openstreetmap.org
 http://lists.openstreetmap.org/listinfo/dev



___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Missing nodes in 200904210852-200904210853.osc.gz (should be 200905020852-200905020853.osc.gz)

2009-05-04 Thread Brett Henderson
Hi Everybody,

It appears that there are some further warts in the osmosis diffs.  But 
this time it only impacts the minute diffs.

As per Alfons' message below, there is some missing data in the 
200905020852-200905020853.osc.gz minute diff.  The missing data appears 
to belong to changeset 1045077, which is a monster containing a large 
number of entities.  My best guess is that the changeset took a long 
time to insert (perhaps rails was choking on it for some time?) and as a 
result the final commit occurred more than 5 minutes after the initial 
data was added.  This meant that the osmosis extraction occurred before 
the data became visible.  The hourly changeset, which runs 30 minutes 
later, included the changeset, so whatever the problem was had corrected 
itself by that time.

 Correcting The Problem 
If you have a database using the minute diffs, the best option is to 
reset the timestamp back to some time before May 2nd, 8am and catch up 
using hourly or daily diffs.  From there resume processing with minute 
diffs.

 Future Avoidance 
Unfortunately with the current method of extracting diffs, there is 
always a risk this may occur.  With the current minute lag interval of 5 
minutes it is very rare, but not impossible.  I am now setting up 
another minute diff process running 30 minutes behind the API which I'll 
use to audit the minute diff process.  At least this way I'll know if 
they occur again.  If it is a regular occurrence then a better solution 
will have to be devised.  If it never happens again then I'll put it 
down to cosmic rays or a 0.6 wrinkle that has since been fixed.

Brett

a...@gmx.de wrote:
 Hi Brett,

 Brett Henderson wrote:
 Hi Alfons,

 Where did you get the minute files?  They're not available on 
 planet.openstreetmap.org any longer.  There were some problems when 
 API 0.6 was first deployed, but the problem was corrected and the 
 problem change files were re-generated.  Is it possible you have some 
 of the bad files produced during that period?  I'm not aware of any 
 problems with the files currently being produced.
 Damn, it was the wrong filename :-( (My mistake, I just looked at the 
 time 0852)

 It should be 200905020852-200905020853.osc, but at least the data 
 posted below is correct. (see timestamp 2009-05-02T08:52:22Z)



 a...@gmx.de wrote:
 Hello Brett,

 it seems to me that there are several nodes missing in

 200904210852-200904210853.osc.gz

 (taken from minute diffs) e.g. especially 388501322, 388501324 and 
 388501325.

 Looking at lines 52-57

 <node id="388501275" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0172287" lon="11.4147053"/>
 <node id="388501276" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0170374" lon="11.413804"/>
 <node id="388501277" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0167492" lon="11.4139295"/>
 <node id="388501426" version="1" timestamp="2009-05-02T08:52:33Z" uid="52495" user="seawolff" lat="54.5844367" lon="9.8205745"/>
 <node id="388501487" version="1" timestamp="2009-05-02T08:52:43Z" uid="45565" user="flinki" lat="53.9425242" lon="11.3160177"/>
 <node id="388501488" version="1" timestamp="2009-05-02T08:52:43Z" uid="45565" user="flinki" lat="53.9422031" lon="11.3166454"/>

 from 200904210852-200904210853.osc it seems that many more are 
 missing.
 And for the ways at least ways 33909155 and 33909185 are also 
 missing in that minute file.
 Do you have any clue why?


 Thanks in advance and best regards

 Alfons




___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Missing nodes in 200904210852-200904210853.osc.gz (should be 200905020852-200905020853.osc.gz)

2009-05-04 Thread Brett Henderson
In addition to the normal minute diffs, there are now slow minute 
diffs available here running 30 minutes behind the main API:
http://planet.openstreetmap.org/minute-slow/

They are not intended to be used directly, but if you have doubts about 
the contents of the standard minute diffs please check these files as 
well.  If their contents differ from the main diffs then a transaction 
has been committed too late to the database to be included in the 
osmosis changeset.  The only option is then to switch to the delayed 
changeset until the problem period has passed, then switch back to the 
main changesets.  If I'm around I'll re-generate the minute changesets 
straight away, but the chances I'll be around are fairly small because 
the busy periods are typically not in my waking hours.

I have an audit process now comparing the results of the two minute 
processes and I'll send an email around if I detect any anomalies.  I 
may make the results of this audit process public when I get time.

For interest's sake, there is also an experimental set of fast minute 
diffs running 1 minute behind the API, but please don't use them for 
production systems.  The link is below:
http://planet.openstreetmap.org/minute-fast/

If anybody has any questions or suggestions please let me know.


Brett Henderson wrote:
 Hi Everybody,

 It appears that there are some further warts in the osmosis diffs.  
 But this time it only impacts the minute diffs.

 As per Alfons' message below, there is some missing data in the 
 200905020852-200905020853.osc.gz minute diff.  The missing data 
 appears to belong to changeset 1045077 which is a monster containing a 
 large number of entities.  My best guess is that the changeset took a 
 long time to insert (perhaps rails was choking on it for some time?) 
 and as a result the final commit occurred more than 5 minutes after 
 the initial data was added.  This meant that the osmosis extraction 
 occurred before the data became visible.  The hourly changeset which 
 runs 30 minutes later included the changeset so whatever the problem 
 was had corrected itself by that time.

  Correcting The Problem 
 If you have a database using the minute diffs, the best option is to 
 reset the timestamp back to some time before May 2nd, 8am and catch up 
 using hourly or daily diffs.  From there resume processing with minute 
 diffs.

  Future Avoidance 
 Unfortunately with the current method of extracting diffs, there is 
 always a risk this may occur.  With the current minute lag interval of 
 5 minutes it is very rare, but not impossible.  I am now setting up 
 another minute diff process running 30 minutes behind the API which 
 I'll use to audit the minute diff process.  At least this way I'll 
 know if they occur again.  If it is a regular occurrence then a better 
 solution will have to be devised.  If it never happens again then I'll 
 put it down to cosmic rays or a 0.6 wrinkle that has since been fixed.

 Brett

 a...@gmx.de wrote:
 Hi Brett,

 Brett Henderson wrote:
 Hi Alfons,

 Where did you get the minute files?  They're not available on 
 planet.openstreetmap.org any longer.  There were some problems when 
 API 0.6 was first deployed, but the problem was corrected and the 
 problem change files were re-generated.  Is it possible you have 
 some of the bad files produced during that period?  I'm not aware of 
 any problems with the files currently being produced.
 Damn, it was the wrong filename :-( (My mistake, I just looked at the 
 time 0852)

 It should be 200905020852-200905020853.osc, but at least the data 
 posted below is correct. (see timestamp 2009-05-02T08:52:22Z)



 a...@gmx.de wrote:
 Hello Brett,

 it seems to me that there are several nodes missing in

 200904210852-200904210853.osc.gz

 (taken from minute diffs) e.g. especially 388501322, 388501324 and 
 388501325.

 Looking at lines 52-57

 <node id="388501275" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0172287" lon="11.4147053"/>
 <node id="388501276" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0170374" lon="11.413804"/>
 <node id="388501277" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0167492" lon="11.4139295"/>
 <node id="388501426" version="1" timestamp="2009-05-02T08:52:33Z" uid="52495" user="seawolff" lat="54.5844367" lon="9.8205745"/>
 <node id="388501487" version="1" timestamp="2009-05-02T08:52:43Z" uid="45565" user="flinki" lat="53.9425242" lon="11.3160177"/>
 <node id="388501488" version="1" timestamp="2009-05-02T08:52:43Z" uid="45565" user="flinki" lat="53.9422031" lon="11.3166454"/>

 from 200904210852-200904210853.osc it seems that many more are 
 missing.
 And for the ways at least ways 33909155 and 33909185 are also 
 missing in that minute file.
 Do you have any clue why?


 Thanks in advance and best regards

 Alfons






___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev

[OSM-dev] API 0.6 - DELETE question

2009-05-04 Thread Karl Guggisberg
Hi,

just wondering why DELETE /api/0.6/[node|way|relation]/#id isn't idempotent,
i.e.
why DELETE(primitive) where primitive.visible=false will lead to 410 Gone
instead of 200 OK?

It leads to aborted changesets, i.e. 
   PUT /api/0.6/changeset/#id (
   DELETE node   where node.visible == false on the server 
   )
which results in 410 Gone - not one of the defined error codes for
PUT /api/0.6/changeset/#id.

Regards
Karl





___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] API 0.6 - DELETE question

2009-05-04 Thread Frederik Ramm
Hi,

Karl Guggisberg wrote:
 just wondering why DELETE /api/0.6/[node|way|relation]/#id isn't idempotent,
 i.e.
 why DELETE(primitive) where primitive.visible=false will lead to 410 Gone
 instead of 200 OK?

I guess if you do not already know that the object is deleted (which I 
infer from your trying to delete it!) this means that you have an old 
version of the object. If it would not give you a 410 Gone then it would 
probably give you a 409 Conflict because of the version mismatch!

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] API 0.6 - DELETE question

2009-05-04 Thread Karl Guggisberg
 If it would not give you a 410 Gone then it would probably give you a 409
Conflict because of the version mismatch!
409 Conflict is what I would have expected from the API spec, not 410 Gone.
Adding

HTTP status code 410 (Gone)
If at least one element in the changeset has already been deleted

to the spec of DELETE /api/0.6/[node|way|relation]/#id would sync it with
the current implementation.

But still, why not treat it like a successful DELETE and reply with the
version number of the already deleted element on the server? 

-- Karl 


-Original Message-
From: Frederik Ramm [mailto:frede...@remote.org] 
Sent: Monday, 4 May 2009 19:08
To: karl.guggisb...@guggis.ch
Cc: dev@openstreetmap.org
Subject: Re: [OSM-dev] API 0.6 - DELETE question

Hi,

Karl Guggisberg wrote:
 just wondering why DELETE /api/0.6/[node|way|relation]/#id isn't 
 idempotent, i.e.
 why DELETE(primitive) where primitive.visible=false will lead to 410 
 Gone instead of 200 OK?

I guess if you do not already know that the object is deleted (which I infer
from your trying to delete it!) this means that you have an old version of
the object. If it would not give you a 410 Gone then it would probably give
you a 409 Conflict because of the version mismatch!

Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


[OSM-dev] Sqlite module for Mapnik

2009-05-04 Thread Tomas Kolda
Hi,

this is a reaction to the threads about SQLite and Mapnik. I promised that 
I would post something about my research, so here it is.

http://osm.w2n.cz/

There is an SVN repository with the sources and compiled versions of the 
tools (Windows only at this time). First I must say that this is not a 
replacement for the current SQLite support in Mapnik. I did this work half 
a year ago, which is why it only targets Mapnik 0.5.1.

It is very simple and incomplete, but it works fine. If there is some 
feedback, I will continue to finish it. Further development will use a 
schema completely different from the PostgreSQL one, because there will be 
big performance speedups if I change the rendering approach.

In its final state it should be just two programs: a convert/update tool 
and a web server with an integrated Mapnik and DB server. So a user just 
gets their part of OSM, runs the converter, and runs the server with the 
converted data. I think this will benefit beginners, that is, good artists 
(making icons and styles), but not advanced users (installing DBs and so on...).

Tomas




___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


[OSM-dev] Why lit=yes is useful (and besides: BBike for Dresden)

2009-05-04 Thread Colin Marquardt
See for example here ("Unbeleuchtete Wege vermeiden" = avoid unlit ways):
http://bbbike.elsif.de/cgi/Dresden.cgi?via=NOscope=startname=Maille-Bahnzielname=Oberhermsdorfer+Str.scope=

http://wiki.openstreetmap.org/wiki/Key:lit

One day I'll build a Mapnik style with a night view that highlights the
lit streets.


Cheers
  Colin

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


[OSM-dev] Minute Diffs Broken

2009-05-04 Thread Brett Henderson
Hi Everybody,

Unfortunately the minute diffs appear to be regularly missing data.  In 
the last 8 hours at least 3 changesets have been missed.  The ones I've 
noticed are 1076325, 1076998, 1077469.  These have been detected by 
comparing the normal minute diffs against another minute diff process 
running half an hour later.  I don't know what is causing these 
changesets to be applied to the database so slowly, whether it's just 
their size or some other factor.  I don't know if this is 
something that can be fixed, or whether the current osmosis extraction 
method is too time-sensitive and simply broken.

At some stage over the next day or so I'll try to publish the audit 
results automatically so that the problems are at least visible.

The hourly and daily diffs should be more reliable because they run with 
a 30 and 40 minute delay respectively, although theoretically there's no 
guarantee that they're correct either.

So, any suggestions on how to fix this?

I've been trying to avoid requiring any changes to the main database in 
order to keep things simple but perhaps it's unavoidable.  One way 
around the problem would be to introduce delta table(s) in the main 
database populated by triggers on the existing history tables and 
containing the ids and timestamps of changes.  Osmosis could read those 
tables and delete records as it processes them.  It's a major change though.
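
To make that concrete, a rough sketch of what the trigger and delta table 
might look like for nodes (table and column names are invented for 
illustration, not the real schema):

CREATE TABLE node_change_log (
    node_id   bigint    NOT NULL,
    version   bigint    NOT NULL,
    logged_at timestamp NOT NULL DEFAULT clock_timestamp()
);

CREATE OR REPLACE FUNCTION log_node_change() RETURNS trigger AS $$
BEGIN
    -- record the id and version of every new row in the node history table
    INSERT INTO node_change_log (node_id, version) VALUES (NEW.id, NEW.version);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER nodes_change_log
AFTER INSERT ON nodes
FOR EACH ROW EXECUTE PROCEDURE log_node_change();

Osmosis (or an intermediate process) would then read node_change_log, 
fetch the matching history rows, and delete the markers it has processed.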

This isn't an ideal forum for coming up with solutions, but I thought it 
was important to ensure people are aware of the problem.  I'll try to 
spend some time on IRC over the next few days.  Whatever the solution, I 
won't have the time (or skills) to do it on my own.

Brett


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Why lit=yes is useful (and besides: BBike for Dresden)

2009-05-04 Thread Tal
That is great!
I walk at night, and it will be much better to have this information on
a map.

Tal

2009/5/5 Colin Marquardt cmarq...@googlemail.com

 See for example here ("Unbeleuchtete Wege vermeiden" = avoid unlit ways):

 http://bbbike.elsif.de/cgi/Dresden.cgi?via=NOscope=startname=Maille-Bahnzielname=Oberhermsdorfer+Str.scope=

 http://wiki.openstreetmap.org/wiki/Key:lit

 One day I'll build a Mapnik style with a night view that highlights the
 lit streets.


 Cheers
  Colin

 ___
 dev mailing list
 dev@openstreetmap.org
 http://lists.openstreetmap.org/listinfo/dev

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Frederik Ramm
Hi,

Brett Henderson wrote:
 Unfortunately the minute diffs appear to be regularly missing data.  In 
 the last 8 hours at least 3 changesets have been missed.  The ones I've 
 noticed are 1076325, 1076998, 1077469.  These have been detected by 
 comparing the normal minute diffs against another minute diff process 
 running half an hour later. 

Can you elaborate a bit? I don't quite understand what you mean by 
changesets that have been missed. What exactly are you doing, and in 
what way do the results look wrong?

- Are you sure that we're all on the same page regarding the meaning of 
changeset columns in the database, especially that the closed_at date 
is only fixed once it is in the past - as long as closed_at is in the 
future, it can still move forward or backward in time. (I'm not even 
sure I am right on this one but I trust I'll be told by someone if not ;-)

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Why lit=yes is useful (and besides: BBike for Dresden)

2009-05-04 Thread Colin Marquardt
Whoops,

of course I didn't mean to send that here. But it's the first time I
have seen an application that would really use lit=yes if set - this
bicycle routing page (they also have a desktop application) has a
preference for avoiding unlit ways.

On 5 May 2009 at 00:25, Colin Marquardt cmarq...@googlemail.com wrote:
 See for example here ("Unbeleuchtete Wege vermeiden" = avoid unlit ways):
 http://bbbike.elsif.de/cgi/Dresden.cgi?via=NOscope=startname=Maille-Bahnzielname=Oberhermsdorfer+Str.scope=

Cheers
  Colin

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] User preference for editor

2009-05-04 Thread Richard Fairhurst

Richard Fairhurst wrote:
 How I can see it working is that Potlatch could have a user pref 
 saying "Offer alternate editor?" with a URL as previously described

...and that's now implemented in 0.11b.

Go to the Potlatch options window (when deployed), and enter the full URL of
the service you want, with zoom, long and lat in that order, each replaced
by a !. So for a fairly pointless example, you could do:

   http://www.openstreetmap.org/?zoom=!&lon=!&lat=!

Then, whenever you open Potlatch on that machine, clicking 'Launch' will go
to the URL in question.

It performs some elementary matching so that, if you input the JOSM launch
URL, your user ID and home location are e-mailed to a crack troop of
Potlatch stormtroopers, who will come round your house and reeducate you.
All part of the service. :)

cheers
Richard
-- 
View this message in context: 
http://www.nabble.com/User-preference-for-editor-tp23316513p23378784.html
Sent from the OpenStreetMap - Dev mailing list archive at Nabble.com.


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Brett Henderson
Frederik Ramm wrote:
 Hi,

 Brett Henderson wrote:
 I'm not reading any of the changeset table data so the behaviour of 
 the closed_at field doesn't affect osmosis.  The changeset table is 
 effectively useless to osmosis processing because changesets aren't 
 atomic.

 Thinking about possible solutions:

 1. When updating things in a transaction, set the timestamp to the 
 commit time of the transaction. I don't believe PostgreSQL can do it.
If we could do this it'd be great.

 2. As you said, introduce changes to the database, like dirty bits or 
 change logs or so.
It's my only option at the moment.  It has a number of advantages, such 
as being able to process immediately behind the API with no delay.  But 
it introduces a lot more complexity.  Part of the issue is that several 
downstream osmosis tasks want the data.  My preference would be to use 
the dirty log as a simple marker table and then pull all changes into 
a separate offline database for distribution amongst the various 
consuming osmosis processes.  It is also possible to have only a single 
osmosis consumer (e.g. minute diffs) and perform post-processing to merge 
them into hourly and daily diffs, but an offline database would make 
other things easier, such as full history deltas.
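
As a rough sketch of that pull-and-clear step (again with invented names, 
and glossing over the fact that the staging table would really live in 
the separate offline database):

BEGIN;

-- copy the marked node versions into a staging table
INSERT INTO staging_nodes (id, version, timestamp, visible, changeset_id, latitude, longitude)
SELECT n.id, n.version, n.timestamp, n.visible, n.changeset_id, n.latitude, n.longitude
FROM nodes n
JOIN node_change_log l ON l.node_id = n.id AND l.version = n.version;

-- remove only the markers that were actually copied, so markers added by
-- concurrent transactions after the copy are kept for the next run
DELETE FROM node_change_log l
USING staging_nodes s
WHERE l.node_id = s.id AND l.version = s.version;

COMMIT;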

If we went down this path it would need significant enhancements to be made 
to the core database, something to stream changes out of the core db 
into a changes database, and something to feed those changes into the 
existing diff files.  I think it's perfectly do-able and I can't see any 
major showstoppers, but it is not a trivial task.  I'd need a lot of help 
from others :-)

 3. Make a semantic change to the way we handle diffs: Let the diff for 
 interval X not be all changes with timestamp within X but instead 
 all changes that happened in a changeset that was closed within X. 
 Changesets not being atomic should pose no problem for this (because 
 when it's closed, it's closed). This would adversely affect downstream 
 systems in that some changes are held back until the changeset is 
 closed (whereas they are passed on immediately now), but on the other 
 hand you could afford to generate the minutely diff at 5 seconds past 
 the minute because you do not have to wait for transactions to settle 
 (the actual changeset close never happens inside a transaction).
I think this would introduce far too large a delay.  What is the maximum 
age of a changeset?  That is the delay that may occur between making an 
edit and it appearing in replica databases.  I don't think that would be 
suitable for ti...@home and mapnik for instance.  It would be simple to 
implement though.  This was my original plan until I learnt that 
changesets weren't going to be atomic.

It's worth noting that if we went with option 2 we'd have to include 
part of option 3.  If data was missed from one changeset due to a delayed 
commit it would have to be included in a subsequent changeset, which is a 
slight change from the current behaviour.  It shouldn't impact consumers 
so long as entity versions are ordered correctly in diff files.

Brett


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Greg Troxel

Frederik Ramm frede...@remote.org writes:

 3. Make a semantic change to the way we handle diffs: Let the diff for 
 interval X not be all changes with timestamp within X but instead all 
 changes that happened in a changeset that was closed within X. 
 Changesets not being atomic should pose no problem for this (because 
 when it's closed, it's closed). This would adversely affect downstream 
 systems in that some changes are held back until the changeset is closed 
 (whereas they are passed on immediately now), but on the other hand you 
 could afford to generate the minutely diff at 5 seconds past the minute 
 because you do not have to wait for transactions to settle (the actual 
 changeset close never happens inside a transaction).

So obviously we aren't running SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE, since that would kill performance and make things harder,
but it would solve this :-)

It's possible for a transaction with effective time T to have a
commit time of T', and the minute scan for A-B for T < B < T' is not
seeing the changeset, and the B-C minute scan is considering it not in
bounds.

If the real requirement for minute diffs is that the union of them is
right, then having the minute diff generator keep track of all the
changeset IDs it has seen in the last hour, and do a query that is
basically:

  select all changesets from the last 30 minutes
  exclude all changesets in the previous 60 minute diffs

then the missing changeset would show up in the next diff, which would
be the minute it was committed in, not the minute it was started in.  If
it's known there are no holes then changeset > top_changeset could make
this faster.
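
A sketch of that query, with processed_changesets standing in for a 
hypothetical bookkeeping table of changeset ids already emitted by earlier 
minute runs (and assuming the changesets table's created_at column is a 
usable proxy for "from the last 30 minutes"):

-- changesets from the last 30 minutes not yet emitted in a previous diff
SELECT c.id
FROM changesets c
WHERE c.created_at > now() - interval '30 minutes'
  AND NOT EXISTS (
      SELECT 1 FROM processed_changesets p WHERE p.changeset_id = c.id
  );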


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Frederik Ramm
Hi,

Brett Henderson wrote:
 3. Make a semantic change to the way we handle diffs: Let the diff for 
 interval X not be all changes with timestamp within X but instead 
 all changes that happened in a changeset that was closed within X. 

[...]

 I think this would introduce far too large a delay.  What is the maximum 
 age of a changeset?  That is the delay that may occur between making an 
 edit and it appearing in replica databases.  

The maximum age is one day. I would not view this as a big problem 
though. For me, a changeset is like a change commit in a version control 
system. I do not expect others to see my changes until I have committed 
them, and it would be perfectly fine for me if downstream mirrors did 
not see my changes until I close the changeset. (The difference between 
this and a VCS being obviously that direct queries to the main API would 
see my uncommitted changes while downstream systems would not yet have 
them, but hey.)

 I don't think that would be 
 suitable for ti...@home and mapnik for instance. 

I don't see why these systems should show data from unclosed 
changesets. The way I like to think of changesets, this might even be 
misleading - think of someone deleting a road because he wants to 
re-draw it from a better GPX trace. I would of course do both inside one 
changeset called "replace road XY by better version", and when that 
changeset is closed, both the deletion of the old road and the new data 
will be propagated. With interim propagation, the old road will vanish 
from the map for a while and perhaps unnecessarily upset people.

If someone has an urgent change he wants shown quickly - then just close 
your changeset and you're fine.

 It would be simple to 
 implement though.  This was my original plan until I learnt that 
 changesets weren't going to be atomic.

You would effectively introduce atomic changesets for downstream systems 
this way. Not the worst thing to happen I'd say.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Frederik Ramm
Hi,

Greg Troxel wrote:
 So obviously we aren't running SET TRANSACTION ISOLATION LEVEL
 SERIALIZABLE, since that would kill performance and make things harder,
 but it would solve this :-)

How so? The problem seems to be too much transaction isolation, not too 
little. If we were operating on a dirty read basis then Brett's diffs 
would not miss any data (but they would contain changes that were part 
of a transaction that was later rolled back).

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Brett Henderson

Greg Troxel wrote:

Frederik Ramm frede...@remote.org writes:

  
3. Make a semantic change to the way we handle diffs: Let the diff for 
interval X not be all changes with timestamp within X but instead all 
changes that happened in a changeset that was closed within X. 
Changesets not being atomic should pose no problem for this (because 
when it's closed, it's closed). This would adversely affect downstream 
systems in that some changes are held back until the changeset is closed 
(whereas they are passed on immediately now), but on the other hand you 
could afford to generate the minutely diff at 5 seconds past the minute 
because you do not have to wait for transactions to settle (the actual 
changeset close never happens inside a transaction).



So obviously we aren't running SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE, since that would kill performance and make things harder,
but it would solve this :-)

It's possible for a transaction with effective time T to have a
commit time of T', and the minute scan for A-B for T < B < T' is not
seeing the changeset, and the B-C minute scan is considering it not in
bounds.

If the real requirement for minute diffs is that the union of them is
right, then having the minute diff generator keep track of all the
changeset IDs it has seen in the last hour, and do a query that is
basically:

  select all changesets from the last 30 minutes
  exclude all changesets in the previous 60 minute diffs

then the missing changeset would show up in the next diff, which would
be the minute it was committed in, not the minute it was started in.  If
it's known there are no holes then changeset > top_changeset could make
this faster.
  
I don't think we can use changeset ids as a way of tracking processed 
changes due to the delay that introduces.  We have to track on 
individual entities.


Individual entities will not be sequential because entities can be 
modified.  This means we can't check for holes and query with 'node_id > 
top_node_id' for example.


That leaves us having to query for the maximum time a transaction could 
stay open for.  I don't know how to bound this.  Obviously 5 minutes is 
not enough.  Maybe 15 would be?  If we go with a 15 minute interval, 
combining that with the existing 5 minute delay means we have to read 10 
minutes worth of data for every minute changeset.  That's 10 times more 
data to be read from the database at a time.  It would probably work but 
it would increase the load on the main database.  The other thing we'd 
have to do is introduce a local database of some kind to track processed 
ids because osmosis gets launched from cron every minute and doesn't 
maintain any state between invocations other than the current timestamp.
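
For nodes, the widened-window query might look roughly like this, ignoring 
for the moment that the bookkeeping table of processed (id, version) pairs 
would have to live outside the main database (processed_node_versions is 
an invented name):

-- read the last 15 minutes of node history, skipping pairs already emitted
SELECT e.id, e.version, e.timestamp
FROM nodes e
WHERE e.timestamp > now() - interval '15 minutes'
  AND NOT EXISTS (
      SELECT 1 FROM processed_node_versions p
      WHERE p.node_id = e.id AND p.version = e.version
  )
ORDER BY e.id, e.version;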


It would work.  But hopefully there's a cleaner way.

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Greg Troxel

Frederik Ramm frede...@remote.org writes:

 Hi,

 Greg Troxel wrote:
 So obviously we aren't running SET TRANSACTION ISOLATION LEVEL
 SERIALIZABLE, since that would kill performance and make things harder,
 but it would solve this :-)

 How so? The problem seems to be too much transaction isolation, not
 too little.

With select by time, it would still be buggy.  But if the select was
all changesets > X where X was the highest changeset in the previous
select, it would work, because there would have to be a total ordering
of transactions (at least as far as anyone can tell).  So the select of
highest would have to be in between two others, and the changeset id is
perhaps an auto-sequence, or else read/increment/write which again would
force ordering.
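
In query form that amounts to selecting by id rather than by timestamp, 
carrying a high-water mark between runs (a sketch, assuming changeset ids 
come from a sequence):

-- ? is the highest changeset id emitted by the previous run
SELECT c.id
FROM changesets c
WHERE c.id > ?
ORDER BY c.id;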

 If we were operating on a dirty read basis then Brett's
 diffs would not miss any data (but they would contain changes that
 were part of a transaction that was later rolled back).

Sure, that would be worse :-)


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Steve Singer
On Tue, 5 May 2009, Brett Henderson wrote:


 The way osmosis identifies changed records is by querying the history table
 for entities with a timestamp within a time interval.  The time interval
 will be an hour long for hourly diffs, a minute long for minute diffs.

 For example, the node query is:
 SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
 u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
 FROM nodes e
 LEFT OUTER JOIN changesets c ON e.changeset_id = c.id LEFT OUTER JOIN
 users u ON c.user_id = u.id
 WHERE e.timestamp > ? AND e.timestamp <= ? ORDER BY e.id, e.version

 If the history table records don't exist (or aren't committed) when this
 query runs, the records won't be put into the diff file.

What you want is the timestamp that the change was committed at, not the 
timestamp it was inserted at.  However, there is no way to get this with 
PostgreSQL.

The options that come to mind are

A) Modify the rails code to insert a row in a transaction table with a 
timestamp just before issuing the 'COMMIT'.  Then hope the timespan between 
that and the commit finishing is less than your update window.
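
A minimal sketch of option A, with invented names; the point is only that 
committed_at is taken at (nearly) commit time rather than at transaction 
start:

CREATE TABLE transaction_log (
    changeset_id bigint    NOT NULL,
    committed_at timestamp NOT NULL DEFAULT clock_timestamp()
);

-- issued by the rails code as the last statement before COMMIT
INSERT INTO transaction_log (changeset_id) VALUES (?);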

B) Set up some sort of queuing system that will get committed transactions in 
the proper order.  You might want to look at 
PgQ (http://skytools.projects.postgresql.org/doc/pgq-sql.html) along with 
triggers on the node/way/relation tables.

Most of the user-level async replication options for PostgreSQL share some 
core ideas.  They tend to have triggers inserting into a journaling table, 
then use snapshots to get a consistent set of events that can be replayed.

I'm not familiar with the rails API code, but I want to make sure that the 
nodes.timestamp column you're querying isn't being populated with the 
PostgreSQL now() function but instead with some time that rails computes. 
(The now() function returns the time when your transaction started, not the 
current time; this would make the skipped data problem more common.)
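
The difference is easy to see in psql; this is standard PostgreSQL 
behaviour, nothing specific to the OSM schema:

BEGIN;
SELECT now(), clock_timestamp();  -- identical at the start of the transaction
SELECT pg_sleep(5);
SELECT now(), clock_timestamp();  -- now() unchanged, clock_timestamp() ~5s later
COMMIT;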

Steve




 ___
 dev mailing list
 dev@openstreetmap.org
 http://lists.openstreetmap.org/listinfo/dev



___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Greg Troxel

Sorry, I was assuming that a changeset and a database transaction were
the same thing.  If not, we need a sequence number on additions to the
history table, and use that for knowing what's fresh.



___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Brett Henderson
Frederik Ramm wrote:
 I don't see why these systems should show data from unclosed 
 changesets. The way I like to think of changesets, this might even be 
 misleading - think of someone deleting a road because he wants to 
 re-draw it from a better GPX trace. I would of course do both inside 
 one changeset called replace road XY by better version, and when 
 that changeset is closed, both the deletion of the old road and the 
 new data will be propagated. With interim propagation, the old road 
 will vanish from the map for a while and perhaps unnecessarily upset 
 people.

 If someone has an urgent change he wants shown quickly - then just 
 close your changeset and you're fine.

 It would be simple to implement though.  This was my original plan 
 until I learnt that changesets weren't going to be atomic.

 You would effectively introduce atomic changesets for downstream 
 systems this way. Not the worst thing to happen I'd say.
I'm fairly uncomfortable with this approach.  It could be very 
confusing.  But I'm prepared to be swayed; it is certainly simple :-)

Also, there's a potential flaw with this approach.  Let's say I create 
node 100 with version 1 in changeset 10 in Potlatch and leave my 
changeset open.  You then come along with JOSM and edit node 100, 
creating version 2 within changeset 11, and close your changeset 
immediately.  Osmosis will pick up changeset 11 after 5 minutes and 
distribute node 100 version 2.  A day later Osmosis will pick up 
changeset 10 and distribute node 100 version 1.  Downstream systems 
consuming those diffs would apply the wrong version of the node to their 
database.  One way around this might be to force all consumers to check 
the version id before applying changes, but this is not done currently, 
to the best of my knowledge; at least osmosis doesn't.
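
One way a consumer could guard against that would be to make each apply 
conditional on the incoming version being newer than what is already 
stored - a sketch against a hypothetical downstream nodes table, not 
something osmosis does today:

-- apply the incoming node only if its version is newer than the stored one
UPDATE nodes
SET version = ?, timestamp = ?, visible = ?, latitude = ?, longitude = ?
WHERE id = ?
  AND version < ?;
-- if no row was updated and the node does not exist yet, insert it instead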


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Steve Singer
On Mon, 4 May 2009, Greg Troxel wrote:


 openstreetmap-...@scd.debian.net writes:

 Would this work?

 How about the situation:

 Changeset A creates a node

 Changeset B uses the node in a way

 Changeset B closes

 (Later) Changeset A closes

 Transactions are intended to avoid this.  It may be that the changeset B
 transaction should be reading the node, in which case pgsql should
 prevent the commit of changeset B until A is closed.  Or more likely B
 could not see the node in changset A until A commits - this is the READ
 COMMITTED property, or the avoidance of dirty reads.

 Have you seen this?

We don't use transactions that span the life of changesets (with good 
reason).  We might use transactions to service a single HTTP request to the 
API (i.e. a single POST/PUT) but not for a changeset.  Changesets can last many 
hours (especially when the user goes back out mapping in the middle of 
editing).  Database transactions lasting that long (controlled by end users) 
will cause lots of pain.

Steve


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Minute Diffs Broken

2009-05-04 Thread Brett Henderson
Steve Singer wrote:
 On Tue, 5 May 2009, Brett Henderson wrote:

 That does look interesting.  I'd hope to use that outside the main 
 database though.  My thoughts were to use triggers to populate short 
 term flag tables which a single threaded process would read, use as 
 keys to select modified data into an offline database, then clear.  
 This offline database could then use a queueing system such as PgQ (I 
 haven't seen it before, will have to check it out) to send events to 
 the various consumers of the data.  I'd like to minimise access to 
 the central database if possible because 1. it will scale better, and 
 2. it adds less burden to existing DBAs.

 I agree you'd only want one process pulling data from the central 
 database and then let other clients pull from another machine.  You'd 
 have to examine how different your trigger + scanning process code 
 will be from using PgQ with 1 consumer that then stores the data in 
 another db for publishing.  You should at least look to see what 
 problems they solved.
I'll take a look.  You're right, I should avoid poorly inventing 
something that others have already done a better job of :-)  I'd hate to 
impose a bottleneck on the entire app.

 One concern with trigger-based systems is that for each real INSERT 
 you're doing a second insert into a queue or journal table, but there 
 might not be a way around that.
I suspect not.  So would the main limitation be IO?  It should be 
possible to separate the synchronisation data (whatever that is) into a 
separate tablespace on other disk(s) if it becomes a problem.  Its 
access patterns should be sequential, with regular reads the dataset 
size should remain comparatively small, and with fixed-size records it 
shouldn't fragment.  Hopefully it's not a major issue.
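
If it did become an IO problem, the change-log data could be moved onto 
its own disks with a dedicated tablespace - a sketch, reusing the 
hypothetical node_change_log name sketched earlier in the thread:

CREATE TABLESPACE changelog_space LOCATION '/var/lib/postgresql/changelog';
ALTER TABLE node_change_log SET TABLESPACE changelog_space;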

Brett


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev