Re: [OSM-dev] Osmosis errors with applying change and API0.6
Lennard wrote: mattwh...@iinet.net.au wrote: New script (fails miserably now, and the input file has been migrated): java -jar osmosis.jar --read-xml-0.6 file=newzealand.osm --read-xml-change-0.6 file=daily-latest.osc.gz --apply-change-0.6 --write-xml-0.6 file=newzealand-out.osm The error I get is: Task3-apply-change-0.6 does not support data provided by default pipe stored at level 2 in the default pipe stack. Switch --rxc and --rx around, so --rxc is first. PS: Wouldn't it be better to cut out a new newzealand.osm from a proper 0.6 planet, instead of migrating a 0.5 extract to 0.6? PPS: Aren't you missing a --bb (bounding box) or --bp (bounding polygon) task after the --apply-change task? You're now getting all additions outside of your newzealand.osm in as well, and running it through a bbox task after applying the changes will get rid of them. I do cut out the bounding box afterwards (I just wasn't cluttering the issue up with excess commands). The OSMXAPI won't process requests that are country-sized, so I just apply the changes nightly going forward. There are probably better ways, but this one works OK. Matt ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
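To see why Lennard's reordering (--read-xml-change-0.6 before --read-xml-0.6) matters, here is a toy Python model of a default pipe stack. It is not Osmosis code: the stream labels, the function and the error text are invented for illustration, and it assumes, as the fix implies, that --apply-change-0.6 takes its entity input from the top of the stack and its change input from the pipe below it.

```python
# Toy model of a default pipe stack (not Osmosis code). Each reader pushes
# its output; a task with several inputs pops them from the top of the stack.
ENTITY, CHANGE = "entity", "change"


def run_pipeline(tasks):
    """Simulate the default pipe stack and return what is left on it."""
    stack = []
    for name in tasks:
        if name == "--read-xml-0.6":
            stack.append(ENTITY)
        elif name == "--read-xml-change-0.6":
            stack.append(CHANGE)
        elif name == "--apply-change-0.6":
            # Assumption: input 0 (entities) is taken from the top, then
            # input 1 (changes) from the pipe below it.
            for index, wanted in enumerate([ENTITY, CHANGE]):
                if not stack:
                    raise ValueError(f"{name}: no pipe for input {index}")
                got = stack.pop()
                if got != wanted:
                    raise ValueError(
                        f"{name} does not support {got} data on input {index}")
            stack.append(ENTITY)  # the patched result is an entity stream
        else:
            raise ValueError(f"unknown task {name}")
    return stack


print(run_pipeline(["--read-xml-change-0.6", "--read-xml-0.6", "--apply-change-0.6"]))
# ['entity']
```

With the change reader first, the model accepts the pipeline; with Matt's original order it raises a ValueError, mirroring the reported error. The writer task is left out because it only consumes the final entity stream.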
Re: [OSM-dev] Tagtransform
I'll get it working with the latest osmosis today and upload it to svn. 2009/5/3 Steven te Brinke s.tebri...@student.utwente.nl: Hello, Is the tagtransform plugin still maintained? It looks like the same problem as 3 months ago still exists: it does not work with the latest release of Osmosis. The plugin looks very useful to me, so if no one maintains it, I'll give refactoring it a try, but because I have no experience with coding Osmosis plugins, I would prefer if someone else could do this. Regards, Steven Dave Stubbs wrote: 2009/2/13 marcus.wolsc...@googlemail.com: On Thu, 12 Feb 2009 21:34:21 +0100, Rolf Bode-Meyer rob...@gmail.com wrote: That indeed looks promising. Unfortunately one seems to have to be a programmer to use osmosis. Every problem is presented as a Java exception. Some of them contain at least a faint idea of what could be wrong. But something like this leaves me clueless: java.lang.AbstractMethodError at com.bretth.osmosis.core.pipeline.common.TaskManagerFactory.createTaskManager(TaskManagerFactory.java:72) This looks like the plugin was compiled against a different version of osmosis. My guess would be that it was built against the latest development version and you are using the last stable release of osmosis. Something like this. It's something Randomjunk, the developer of the plugin, has to fix. I have not seen any contact info on his wiki user page, so I hope he reads this mailing list. It was built against v0.29 of Osmosis, which is the version I'm still using -- it should work with that. I hadn't realised the plugin interface had changed. I'll have to take a look some time. Dave
Re: [OSM-dev] Tagtransform
OK, now updated to work with osmosis 0.30. It has tasks for 0.5 and for 0.6; the default is now 0.6. Source is in SVN here: http://trac.openstreetmap.org/browser/applications/utils/osmosis/plugins/tagtransform 2009/5/4 Dave Stubbs osm.l...@randomjunk.co.uk: I'll get it working with the latest osmosis today and upload it to svn. [...]
Re: [OSM-dev] Missing nodes in 200904210852-200904210853.osc.gz (should be 200905020852-200905020853.osc.gz)
Hi Everybody, It appears that there are some further warts in the osmosis diffs, but this time they only impact the minute diffs. As per Alfons's message below, there is some missing data in the 200905020852-200905020853.osc.gz minute diff. The missing data appears to belong to changeset 1045077, which is a monster containing a large number of entities. My best guess is that the changeset took a long time to insert (perhaps rails was choking on it for some time?) and as a result the final commit occurred more than 5 minutes after the initial data was added. This meant that the osmosis extraction occurred before the data became visible. The hourly diff, which runs 30 minutes later, included the changeset, so whatever the problem was had corrected itself by that time. Correcting The Problem: If you have a database using the minute diffs, the best option is to reset the timestamp back to some time before May 2nd, 8am and catch up using hourly or daily diffs. From there, resume processing with minute diffs. Future Avoidance: Unfortunately, with the current method of extracting diffs, there is always a risk this may occur. With the current minute lag interval of 5 minutes it is very rare, but not impossible. I am now setting up another minute diff process running 30 minutes behind the API, which I'll use to audit the minute diff process. At least this way I'll know if they occur again. If it is a regular occurrence then a better solution will have to be devised. If it never happens again then I'll put it down to cosmic rays or a 0.6 wrinkle that has since been fixed. Brett a...@gmx.de wrote: Hi Brett, Brett Henderson wrote: Hi Alfons, Where did you get the minute files? They're not available on planet.openstreetmap.org any longer. There were some problems when API 0.6 was first deployed, but the problem was corrected and the problem change files were re-generated. Is it possible you have some of the bad files produced during that period?
I'm not aware of any problems with the files currently being produced. Damn, it was the wrong filename :-( (My mistake, I just looked at the time 0852.) It should be 200905020852-200905020853.osc, but at least the data posted below is correct (see timestamp 2009-05-02T08:52:22Z). a...@gmx.de wrote: Hello Brett, it seems to me that there are several nodes missing in 200904210852-200904210853.osc.gz (taken from minute diffs), especially 388501322, 388501324 and 388501325. Looking at lines 52-57:
<node id="388501275" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0172287" lon="11.4147053"/>
<node id="388501276" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0170374" lon="11.413804"/>
<node id="388501277" version="1" timestamp="2009-05-02T08:52:22Z" uid="62236" user="Paulchen Panther" lat="49.0167492" lon="11.4139295"/>
<node id="388501426" version="1" timestamp="2009-05-02T08:52:33Z" uid="52495" user="seawolff" lat="54.5844367" lon="9.8205745"/>
<node id="388501487" version="1" timestamp="2009-05-02T08:52:43Z" uid="45565" user="flinki" lat="53.9425242" lon="11.3160177"/>
<node id="388501488" version="1" timestamp="2009-05-02T08:52:43Z" uid="45565" user="flinki" lat="53.9422031" lon="11.3166454"/>
From 200904210852-200904210853.osc it seems that many more are missing. And for the ways, at least ways 33909155 and 33909185 are also missing in that minute file. Do you have any clue why? Thanks in advance and best regards Alfons
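The failure Brett describes can be reproduced with a small model: a diff for the interval [start, end) runs at end plus a fixed lag and only sees rows whose transactions have committed by then. This is a simplified sketch, not the osmosis extraction query; the row layout, ids and commit times are hypothetical.

```python
from datetime import datetime, timedelta


def extract(changes, start, end, lag):
    """IDs a diff for [start, end) would contain if it ran at end + lag.

    Only rows whose transaction has committed by the run time are visible.
    """
    run_at = end + lag
    return [c["id"] for c in changes
            if start <= c["timestamp"] < end and c["committed"] <= run_at]


t = datetime(2009, 5, 2, 8, 52, 22)
changes = [
    # Hypothetical row stamped 08:52:22 whose transaction commits 8 minutes later.
    {"id": 1, "timestamp": t, "committed": t + timedelta(minutes=8)},
    # Hypothetical row that commits almost immediately.
    {"id": 2, "timestamp": t + timedelta(seconds=11),
     "committed": t + timedelta(seconds=12)},
]

minute = extract(changes, datetime(2009, 5, 2, 8, 52), datetime(2009, 5, 2, 8, 53),
                 lag=timedelta(minutes=5))
hourly = extract(changes, datetime(2009, 5, 2, 8, 0), datetime(2009, 5, 2, 9, 0),
                 lag=timedelta(minutes=30))
print(minute, hourly)  # [2] [1, 2]
```

The minute diff runs at 08:58, before row 1 is committed, and the next minute's diff never looks at 08:52 again; the hourly diff runs at 09:30 and picks it up, which matches what Brett observed.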
Re: [OSM-dev] Missing nodes in 200904210852-200904210853.osc.gz (should be 200905020852-200905020853.osc.gz)
In addition to the normal minute diffs, there are now slow minute diffs available here, running 30 minutes behind the main API: http://planet.openstreetmap.org/minute-slow/ They are not intended to be used directly, but if you have doubts about the contents of the standard minute diffs please check these files as well. If their contents differ from the main diffs, then a transaction has been committed to the database too late to be included in the osmosis diff. The only option then is to switch to the delayed diffs until the problem period has passed, then switch back to the main diffs. If I'm around I'll re-generate the minute diffs straight away, but the chances I'll be around are fairly small because the busy periods are typically not in my waking hours. I have an audit process now comparing the results of the two minute processes and I'll send an email around if I detect any anomalies. I may make the results of this audit process public when I get time. For interest's sake, there is also an experimental set of fast minute diffs running 1 minute behind the API, but please don't use them for production systems. The link is below: http://planet.openstreetmap.org/minute-fast/ If anybody has any questions or suggestions please let me know.
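An audit like the one Brett describes boils down to comparing two osmChange files that cover the same interval. The sketch below assumes both files have already been decompressed into strings (for example with gzip.open(path, "rt")) and compares (action, element type, id, version) tuples; it is an illustration, not the actual audit script.

```python
import xml.etree.ElementTree as ET


def entity_versions(osc_xml):
    """Set of (action, element type, id, version) tuples in an osmChange document."""
    root = ET.fromstring(osc_xml)
    found = set()
    for action in root:            # <create>, <modify>, <delete>
        for element in action:     # <node>, <way>, <relation>
            found.add((action.tag, element.tag,
                       int(element.get("id")), int(element.get("version"))))
    return found


def missing_from_main(main_osc, slow_osc):
    """Entries present in the slow diff but absent from the main diff."""
    return sorted(entity_versions(slow_osc) - entity_versions(main_osc))


main = '<osmChange version="0.6"><create><node id="1" version="1" lat="0" lon="0"/></create></osmChange>'
slow = ('<osmChange version="0.6"><create><node id="1" version="1" lat="0" lon="0"/>'
        '<way id="7" version="1"/></create></osmChange>')
print(missing_from_main(main, slow))  # [('create', 'way', 7, 1)]
```

A non-empty result means some transaction committed too late for the main minute diff, which is the signal to fall back to the delayed diffs for that period.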
[OSM-dev] API 0.6 - DELETE question
Hi, just wondering why DELETE /api/0.6/[node|way|relation]/#id isn't idempotent, i.e. why DELETE(primitive) where primitive.visible=false will lead to 410 Gone instead of 200 OK? It leads to aborted changesets, i.e. PUT /api/0.6/changeset/#id (DELETE node where node.visible == false on the server), which results in 410 Gone, not one of the defined error codes for PUT /api/0.6/changeset/#id. Regards Karl
Re: [OSM-dev] API 0.6 - DELETE question
Hi, Karl Guggisberg wrote: just wondering why DELETE /api/0.6/[node|way|relation]/#id isn't idempotent, i.e. why DELETE(primitive) where primitive.visible=false will lead to 410 Gone instead of 200 OK? I guess if you do not already know that the object is deleted (which I infer from your trying to delete it!) this means that you have an old version of the object. If it would not give you a 410 Gone then it would probably give you a 409 Conflict because of the version mismatch! Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] API 0.6 - DELETE question
If it would not give you a 410 Gone then it would probably give you a 409 Conflict because of the version mismatch! 409 Conflict is what I would have expected from the API spec, not 410 Gone. Adding "HTTP status code 410 (Gone): If at least one element in the changeset has already been deleted" to the spec of DELETE /api/0.6/[node|way|relation]/#id would bring it in sync with the current implementation. But still, why not treat it like a successful DELETE and reply with the version number of the already deleted element on the server? -- Karl -----Original Message----- From: Frederik Ramm [mailto:frede...@remote.org] Sent: Monday, 4 May 2009 19:08 To: karl.guggisb...@guggis.ch Cc: dev@openstreetmap.org Subject: Re: [OSM-dev] API 0.6 - DELETE question
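The status codes discussed in this exchange can be summarised as a small decision function. This is a hypothetical model of the behaviour described in the thread (410 for an already-deleted element, 409 for a version mismatch, otherwise success with a new version number), not the Rails API code; the order of the two checks is an assumption based on Frederik's reply. The second function models Karl's suggestion of treating a repeated delete as a success.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    version: int
    visible: bool = True


def delete_status(server: Element, client_version: int) -> tuple[int, int]:
    """(HTTP status, server version) for a DELETE, as described in the thread."""
    if not server.visible:
        return 410, server.version       # already deleted: 410 Gone
    if client_version != server.version:
        return 409, server.version       # stale client copy: 409 Conflict
    server.visible = False
    server.version += 1
    return 200, server.version           # deleted; reply with the new version


def delete_status_idempotent(server: Element, client_version: int) -> tuple[int, int]:
    """Karl's suggestion: deleting an already-deleted element counts as success."""
    if not server.visible:
        return 200, server.version
    return delete_status(server, client_version)


node = Element(version=2)
print(delete_status(node, 2))             # (200, 3)
print(delete_status(node, 2))             # (410, 3)
print(delete_status_idempotent(node, 2))  # (200, 3)
```

Note that Frederik's point survives in the model: a client retrying a delete normally holds an older version, so without the 410 branch it would hit the 409 branch instead.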
[OSM-dev] Sqlite module for Mapnik
Hi, this is a reaction to the threads about SQLite and Mapnik. I promised that I would post something about my research, so here it is: http://osm.w2n.cz/ There is an SVN repository with sources and compiled versions of the tools (Windows only at this time). First I must say that this is not a replacement for the current SQLite support in Mapnik. I did this work half a year ago, which is why it is based only on Mapnik 0.5.1. It is very simple and incomplete, but it works fine. If there is some feedback, I will continue and finish it. Further development will diverge completely from the PostgreSQL schema, because changing the rendering approach will give large performance speedups. In its final state it should be just two programs: a convert/update tool and a web server with integrated Mapnik and DB server. So a user just gets their part of OSM, runs the converter and runs the server with the converted data. I think this will benefit beginners, that is, good artists (making icons and styles) who are not advanced users (installing DBs and so on...). Tomas
[OSM-dev] Why lit=yes is useful (and otherwise: BBBike for Dresden)
See for example here (Avoid unlit paths: ) http://bbbike.elsif.de/cgi/Dresden.cgi?via=NOscope=startname=Maille-Bahnzielname=Oberhermsdorfer+Str.scope= http://wiki.openstreetmap.org/wiki/Key:lit Some day I'll build a Mapnik style with a night view that highlights the lit streets. Cheers Colin
[OSM-dev] Minute Diffs Broken
Hi Everybody, Unfortunately the minute diffs appear to be regularly missing data. In the last 8 hours at least 3 changesets have been missed. The ones I've noticed are 1076325, 1076998 and 1077469. These have been detected by comparing the normal minute diffs against another minute diff process running half an hour later. I don't know what is causing these changesets to be committed to the database so slowly, whether it's just their size or some other factor. I don't know if this is something that can be fixed, or whether the current osmosis extraction method is too time-sensitive and simply broken. At some stage over the next day or so I'll try to publish the audit results automatically so that the problems are at least visible. The hourly and daily diffs should be more reliable because they run with a 30 and 40 minute delay respectively, although theoretically there's no guarantee that they're correct either. So, any suggestions on how to fix this? I've been trying to avoid requiring any changes to the main database in order to keep things simple, but perhaps it's unavoidable. One way around the problem would be to introduce delta table(s) in the main database, populated by triggers on the existing history tables and containing the ids and timestamps of changes. Osmosis could read those tables and delete records as it processes them. It's a major change though. This isn't an ideal forum for coming up with solutions, but I thought it was important to ensure people are aware of the problem. I'll try to spend some time on IRC over the next few days. Whatever the solution, I won't have the time (or skills) to do it on my own. Brett
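The delta-table idea can be sketched with Python's built-in sqlite3 module. SQLite stands in for the PostgreSQL main database here, and the table and trigger names are hypothetical. The point it illustrates is that a change marker only becomes visible when the editing transaction commits, so a slow transaction shows up in a later batch instead of being skipped.

```python
import sqlite3

# SQLite stands in for the PostgreSQL main database; names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE nodes (id INTEGER, version INTEGER, lat REAL, lon REAL,
                    PRIMARY KEY (id, version));
CREATE TABLE node_changes (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                           node_id INTEGER, version INTEGER);
-- Every new node version leaves a marker in the delta table.
CREATE TRIGGER nodes_log AFTER INSERT ON nodes
BEGIN
    INSERT INTO node_changes (node_id, version) VALUES (NEW.id, NEW.version);
END;
""")


def consume_changes(conn):
    """Read pending markers and delete exactly the rows that were read."""
    with conn:  # one transaction; commits on success
        rows = conn.execute(
            "SELECT seq, node_id, version FROM node_changes ORDER BY seq").fetchall()
        conn.executemany("DELETE FROM node_changes WHERE seq = ?",
                         [(seq,) for seq, _, _ in rows])
    return [(node_id, version) for _, node_id, version in rows]


with db:  # an editing transaction: its markers appear only once it commits
    db.execute("INSERT INTO nodes VALUES (1, 1, 49.0, 11.4)")
    db.execute("INSERT INTO nodes VALUES (1, 2, 49.1, 11.4)")
print(consume_changes(db))  # [(1, 1), (1, 2)]
print(consume_changes(db))  # []
```

Deleting by the sequence numbers that were actually read, rather than emptying the table, avoids discarding markers that another transaction commits between the SELECT and the DELETE.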
Re: [OSM-dev] Why lit=yes is useful (and otherwise: BBBike for Dresden)
That's great! I walk at night, and it will be much better to have this information on a map. Tal 2009/5/5 Colin Marquardt cmarq...@googlemail.com [...]
Re: [OSM-dev] Minute Diffs Broken
Hi, Brett Henderson wrote: Unfortunately the minute diffs appear to be regularly missing data. In the last 8 hours at least 3 changesets have been missed. The ones I've noticed are 1076325, 1076998, 1077469. These have been detected by comparing the normal minute diffs against another minute diff process running half an hour later. Can you elaborate a bit? I don't quite understand what you mean by changesets that have been missed. What exactly are you doing, and in what way do the results look wrong? Are you sure that we're all on the same page regarding the meaning of the changeset columns in the database, especially that the closed_at date is only fixed once it is in the past? As long as closed_at is in the future, it can still move forward or backward in time. (I'm not even sure I am right on this one, but I trust I'll be told by someone if not ;-) Bye Frederik
Re: [OSM-dev] Warum lit=yes sinnvoll ist (und sonst: BBike fuer Dresden)
Whoops, of course I didn't mean to send that here. But it's the first time I have seen an application that would really use lit=yes if set: this bicycle routing page (they also have a desktop application) has a preference for avoiding unlit ways. On 5 May 2009 00:25, Colin Marquardt cmarq...@googlemail.com wrote: [...]
Re: [OSM-dev] User preference for editor
Richard Fairhurst wrote: How I can see it working is that Potlatch could have a user pref saying "Offer alternate editor?" with a URL as previously described ...and that's now implemented in 0.11b. Go to the Potlatch options window (when deployed), and enter the full URL of the service you want, with zoom, long and lat in that order, each replaced by a !. So, for a fairly pointless example, you could do: http://www.openstreetmap.org/?zoom=!&lon=!&lat=! Then, whenever you open Potlatch on that machine, clicking 'Launch' will go to the URL in question. It performs some elementary matching so that, if you input the JOSM launch URL, your user ID and home location are e-mailed to a crack troop of Potlatch stormtroopers, who will come round your house and reeducate you. All part of the service. :) cheers Richard
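The placeholder scheme Richard describes (zoom, longitude and latitude, each written as !, in that order) amounts to an ordered substitution. A minimal sketch; the function name and the check for exactly three placeholders are assumptions, not Potlatch code:

```python
def fill_editor_url(template: str, zoom: int, lon: float, lat: float) -> str:
    """Replace the '!' placeholders, in order, with zoom, longitude and latitude."""
    values = [str(zoom), str(lon), str(lat)]
    if template.count("!") != len(values):
        raise ValueError("expected exactly three '!' placeholders")
    for value in values:
        template = template.replace("!", value, 1)  # first remaining '!' only
    return template


print(fill_editor_url("http://www.openstreetmap.org/?zoom=!&lon=!&lat=!", 15, -0.1276, 51.5072))
# http://www.openstreetmap.org/?zoom=15&lon=-0.1276&lat=51.5072
```

Replacing one placeholder at a time keeps the positional meaning of each !, so the template author only has to get the order right.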
Re: [OSM-dev] Minute Diffs Broken
Frederik Ramm wrote: Hi, Brett Henderson wrote: I'm not reading any of the changeset table data, so the behaviour of the closed_at field doesn't affect osmosis. The changeset table is effectively useless to osmosis processing because changesets aren't atomic. Thinking about possible solutions: 1. When updating things in a transaction, set the timestamp to the commit time of the transaction. I don't believe PostgreSQL can do it. If we could do this it'd be great. 2. As you said, introduce changes to the database, like dirty bits or change logs or so. It's my only option at the moment. It has a number of advantages, such as being able to process immediately behind the API with no delay. But it introduces a lot more complexity. Part of the issue is that several downstream osmosis tasks want the data. My preference would be to use the dirty log as a simple marker table and then pull all changes into a separate offline database for distribution amongst the various consuming osmosis processes. It is also possible to have only a single osmosis consumer (e.g. minute diffs) and perform post-processing to merge them into hourly and daily diffs, but an offline database would make other things easier, such as full history deltas. If we went down this path, it would need significant enhancements to the core database, something to stream changes out of the core DB into a changes database, and something to feed those changes into the existing diff files. I think it's perfectly doable and I can't see any major showstoppers, but it's not a trivial task. I'd need a lot of help from others :-) 3. Make a semantic change to the way we handle diffs: Let the diff for interval X not be all changes with timestamp within X but instead all changes that happened in a changeset that was closed within X. Changesets not being atomic should pose no problem for this (because when it's closed, it's closed).
This would adversely affect downstream systems in that some changes are held back until the changeset is closed (whereas they are passed on immediately now), but on the other hand you could afford to generate the minutely diff at 5 seconds past the minute because you do not have to wait for transactions to settle (the actual changeset close never happens inside a transaction). I think this would introduce far too large a delay. What is the maximum age of a changeset? That is the delay that may occur between making an edit and it appearing in replica databases. I don't think that would be suitable for ti...@home and mapnik, for instance. It would be simple to implement though. This was my original plan until I learnt that changesets weren't going to be atomic. It's worth noting that if we went with option 2, we'd have to include part of option 3: if data was missed from one diff due to a delayed commit, it would have to be included in a subsequent diff, which is a slight change from the current behaviour. It shouldn't impact consumers so long as entity versions are ordered correctly in diff files. Brett
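Option 3 changes the selection key from the change timestamp to the close time of the changeset. A minimal in-memory sketch, assuming a changeset's close time is None while it is still open (in the real database closed_at may hold a future auto-close time, as Frederik notes); the data structures are hypothetical.

```python
from datetime import datetime


def diff_by_close_time(changes, closed_at, start, end):
    """Option 3: every change whose changeset closed within [start, end).

    changes:   list of (changeset_id, element) pairs, in edit order
    closed_at: {changeset_id: close time, or None while still open}
    """
    closed = {cs for cs, t in closed_at.items()
              if t is not None and start <= t < end}
    return [element for cs, element in changes if cs in closed]


closed_at = {10: datetime(2009, 5, 5, 9, 0, 30), 11: None}
changes = [(10, "node 1 v2"), (11, "way 5 v1"), (10, "way 7 v3")]
print(diff_by_close_time(changes, closed_at,
                         datetime(2009, 5, 5, 9, 0), datetime(2009, 5, 5, 9, 1)))
# ['node 1 v2', 'way 7 v3']
```

Changeset 11's edit is held back until it closes, which is exactly the delay Brett objects to: it can be as long as the maximum changeset age.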
Re: [OSM-dev] Minute Diffs Broken
Frederik Ramm frede...@remote.org writes: 3. Make a semantic change to the way we handle diffs: Let the diff for interval X not be all changes with timestamp within X but instead all changes that happened in a changeset that was closed within X. Changesets not being atomic should pose no problem for this (because when it's closed, it's closed). This would adversely affect downstream systems in that some changes are held back until the changeset is closed (whereas they are passed on immediately now), but on the other hand you could afford to generate the minutely diff at 5 seconds past the minute because you do not have to wait for transactions to settle (the actual changeset close never happens inside a transaction). So obviously we aren't running SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, since that would kill performance and make things harder, but it would solve this :-) It's possible for a transaction with effective time T to have a commit time of T', and for T < B < T' the A-B minute scan does not see the changeset, while the B-C minute scan considers it out of bounds. If the real requirement for minute diffs is that the union of them is right, then having the minute diff generator keep track of all the changeset IDs it has seen in the last hour, and do a query that is basically: select all changesets from the last 30 minutes, excluding all changesets in the previous 60 minute diffs; then the missing changeset would show up in the next diff, which would be the minute it was committed in, not the minute it was started in. If it's known there are no holes, then changeset > top_changeset could make this faster.
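Greg's proposal can be modelled as a generator that remembers which changesets it has already emitted. The sketch below is a hypothetical in-memory version keyed on changeset id, using his 30-minute look-back and 60-minute memory; a real generator launched from cron would need to persist this state between runs.

```python
from datetime import datetime, timedelta


class MinuteDiffer:
    """Hypothetical 'look back, skip what was already emitted' generator."""

    def __init__(self, lookback=timedelta(minutes=30), memory=timedelta(minutes=60)):
        self.lookback = lookback
        self.memory = memory
        self.emitted = {}  # changeset id -> time it was emitted

    def run(self, now, visible):
        """visible: {changeset_id: timestamp} for changesets committed by `now`."""
        # Forget emissions older than the memory period.
        self.emitted = {cs: t for cs, t in self.emitted.items() if now - t < self.memory}
        new = sorted(cs for cs, ts in visible.items()
                     if now - ts <= self.lookback and cs not in self.emitted)
        for cs in new:
            self.emitted[cs] = now
        return new


t0 = datetime(2009, 5, 5, 9, 0)
d = MinuteDiffer()
print(d.run(t0, {100: t0 - timedelta(minutes=1)}))  # [100]
# Changeset 101, stamped 5 minutes before t0, only becomes visible a minute later.
print(d.run(t0 + timedelta(minutes=1),
            {100: t0 - timedelta(minutes=1), 101: t0 - timedelta(minutes=5)}))  # [101]
```

The late changeset lands in the diff for the minute in which it became visible, which is the trade-off Greg describes.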
Re: [OSM-dev] Minute Diffs Broken
Hi, Brett Henderson wrote: 3. Make a semantic change to the way we handle diffs: Let the diff for interval X not be all changes with timestamp within X but instead all changes that happened in a changeset that was closed within X. [...] I think this would introduce far too large a delay. What is the maximum age of a changeset? That is the delay that may occur between making an edit and it appearing in replica databases. The maximum age is one day. I would not view this as a big problem though. For me, a changeset is like a commit in a version control system. I do not expect others to see my changes until I have committed them, and it would be perfectly fine for me if downstream mirrors did not see my changes until I close the changeset. (The difference between this and a VCS being obviously that direct queries to the main API would see my uncommitted changes while downstream systems would not yet have them, but hey.) I don't think that would be suitable for ti...@home and mapnik for instance. I don't see why these systems should show data from unclosed changesets. The way I like to think of changesets, this might even be misleading: think of someone deleting a road because he wants to re-draw it from a better GPX trace. I would of course do both inside one changeset called "replace road XY by better version", and when that changeset is closed, both the deletion of the old road and the new data will be propagated. With interim propagation, the old road will vanish from the map for a while and perhaps unnecessarily upset people. If someone has an urgent change he wants shown quickly, then he can just close the changeset and he's fine. It would be simple to implement though. This was my original plan until I learnt that changesets weren't going to be atomic. You would effectively introduce atomic changesets for downstream systems this way. Not the worst thing to happen, I'd say.
Bye Frederik
Re: [OSM-dev] Minute Diffs Broken
Hi, Greg Troxel wrote: So obviously we aren't running SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, since that would kill performance and make things harder, but it would solve this :-) How so? The problem seems to be too much transaction isolation, not too little. If we were operating on a dirty read basis then Brett's diffs would not miss any data (but they would contain changes that were part of a transaction that was later rolled back). Bye Frederik
Re: [OSM-dev] Minute Diffs Broken
Greg Troxel wrote: Frederik Ramm frede...@remote.org writes: 3. Make a semantic change to the way we handle diffs: Let the diff for interval X not be all changes with timestamp within X but instead all changes that happened in a changeset that was closed within X. Changesets not being atomic should pose no problem for this (because when it's closed, it's closed). This would adversely affect downstream systems in that some changes are held back until the changeset is closed (whereas they are passed on immediately now), but on the other hand you could afford to generate the minutely diff at 5 seconds past the minute because you do not have to wait for transactions to settle (the actual changeset close never happens inside a transaction). So obviously we aren't running SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, since that would kill performance and make things harder, but it would solve this :-) It's possible for a transaction with effective time T to have a commit time of T', and the minute scan for A-B for T B T' is not seeing the changeset, and the B-C minute scan is considering it not in bounds. If the real requirement for minute diffs is that the union of them is right, then having the minute diff generator keep track of all the changeset IDs it has seen in the last hour, and do a query that is basically: select all changesets from the last 30 minutes exclude all changesets in the previous 60 minute diffs then the missing changeset would show up in the next diff, which would be the minute it was committed in, not the minute it was started in. If it's known there are no holes then changeset top_changeset could make this faster. I don't think we can use changeset ids as a way of tracking processed changes due to the delay that introduces. We have to track on individual entities. Individual entities will not be sequential because entities can be modified. This means we can't check for holes and query with 'node_id top_node_id' for example. 
That leaves us having to query for the maximum time a transaction could stay open for. I don't know how to bound this. Obviously 5 minutes is not enough. Maybe 15 would be? If we go with a 15 minute interval, combining that with the existing 5 minute delay means we have to read 10 minutes' worth of data for every minute changeset. That's 10 times more data to be read from the database at a time. It would probably work, but it would increase the load on the main database. The other thing we'd have to do is introduce a local database of some kind to track processed ids, because osmosis gets launched from cron every minute and doesn't maintain any state between invocations other than the current timestamp. It would work. But hopefully there's a cleaner way.
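A rough sketch of the local state described above, assuming Python's built-in sqlite3 module as the "local database of some kind" and 15 minutes as the bound on transaction length (both assumptions; the table and function names are hypothetical). Each cron invocation reads the overlapping window, keeps only the (type, id, version) keys no earlier run has recorded, and forgets keys old enough that no future window can return them.

```python
import sqlite3

# Assumed upper bound on how long an API transaction can stay open.
WINDOW_SECONDS = 15 * 60


def open_state(path):
    """Local state database that survives between cron invocations."""
    state = sqlite3.connect(path)
    state.execute("""CREATE TABLE IF NOT EXISTS processed (
                         entity_type TEXT, entity_id INTEGER, version INTEGER,
                         timestamp REAL,
                         PRIMARY KEY (entity_type, entity_id, version))""")
    return state


def filter_unprocessed(state, rows, now):
    """rows are (entity_type, entity_id, version, timestamp) tuples read from the
    overlapping window (now - WINDOW_SECONDS, now]. Return only rows that no
    earlier invocation has emitted, and remember them for next time."""
    fresh = []
    with state:  # one transaction: committed on success, rolled back on error
        for row in rows:
            cur = state.execute(
                "INSERT OR IGNORE INTO processed VALUES (?, ?, ?, ?)", row)
            if cur.rowcount == 1:  # 0 means the key was already recorded
                fresh.append(row)
        # Keys older than the window can never be read again, so forget them.
        state.execute("DELETE FROM processed WHERE timestamp <= ?",
                      (now - WINDOW_SECONDS,))
    return fresh


state = open_state(":memory:")  # a real deployment would use a file path
rows = [("node", 100, 1, 1000.0), ("way", 7, 3, 1010.0)]
print(len(filter_unprocessed(state, rows, now=1060.0)))  # prints: 2
print(filter_unprocessed(state, rows, now=1120.0))       # prints: []
```

Pruning by timestamp keeps the state table roughly the size of one window of edits instead of growing forever.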
Re: [OSM-dev] Minute Diffs Broken
Frederik Ramm frede...@remote.org writes: Hi, Greg Troxel wrote: So obviously we aren't running SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, since that would kill performance and make things harder, but it would solve this :-) How so? The problem seems to be too much transaction isolation, not too little. With select by time, it would still be buggy. But if the select was all changesets > X, where X was the highest changeset in the previous select, it would work, because there would have to be a total ordering of transactions (at least as far as anyone can tell). So the select of highest would have to be in between two others, and the changeset id is perhaps an auto-sequence, or else read/increment/write, which again would force ordering. If we were operating on a dirty read basis then Brett's diffs would not miss any data (but they would contain changes that were part of a transaction that was later rolled back). Sure, that would be worse :-)
Re: [OSM-dev] Minute Diffs Broken
On Tue, 5 May 2009, Brett Henderson wrote: The way osmosis identifies changed records is by querying the history table for entities with a timestamp within a time interval. The time interval will be an hour long for hourly diffs, a minute long for minute diffs. For example, the node query is:

    SELECT e.id, e.version, e.timestamp, e.visible, u.data_public, u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
    FROM nodes e
    LEFT OUTER JOIN changesets c ON e.changeset_id = c.id
    LEFT OUTER JOIN users u ON c.user_id = u.id
    WHERE e.timestamp > ? AND e.timestamp <= ?
    ORDER BY e.id, e.version

If the history table records don't exist (or aren't committed) when this query runs, the records won't be put into the diff file. What you want is the timestamp that the change was committed at, not the timestamp it was inserted at. However, there is no way to get this with PostgreSQL. The options that come to mind are:
A) Modify the Rails code to insert a row in a transaction table with a timestamp just before issuing the 'COMMIT'. Then hope the timespan between that and the commit finishing is less than your update window.
B) Set up some sort of queuing system that will get committed transactions in a proper order. You might want to look at PgQ (http://skytools.projects.postgresql.org/doc/pgq-sql.html) along with triggers on the node/way/relation tables.
Most of the user-level async replication options for PostgreSQL share some core ideas. They tend to have triggers inserting into a journaling table, then use snapshots to get a consistent set of events that can be replayed. I'm not familiar with the Rails API code, but I want to make sure that the nodes.timestamp column you're querying isn't being populated with the PostgreSQL now() function but instead with some time that Rails computes.
(The now() function returns the time when your transaction started, not the current time; this would make the skipped-data problem more common.) Steve
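The window semantics of the node query can be tried out against a toy table with Python's sqlite3 module. This sketch assumes the comparison operators lost in the archive are `>` and `<=` (a half-open interval), keeps only the `id`, `version` and `timestamp` columns, and stores timestamps as ISO-8601 text so that string comparison orders them. It only illustrates how adjacent windows split rows that sit exactly on a boundary; it does nothing about the commit-time problem described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER, version INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, 1, "2009-05-05 10:00:00"),  # exactly on a minute boundary
    (2, 1, "2009-05-05 10:00:30"),
    (1, 2, "2009-05-05 10:01:00"),  # exactly on the next boundary
])
conn.commit()


def changed_nodes(start, end):
    # Half-open interval (start, end]: a row on a boundary lands in exactly one diff.
    return conn.execute(
        "SELECT id, version FROM nodes "
        "WHERE timestamp > ? AND timestamp <= ? ORDER BY id, version",
        (start, end)).fetchall()


print(changed_nodes("2009-05-05 09:59:00", "2009-05-05 10:00:00"))  # prints: [(1, 1)]
print(changed_nodes("2009-05-05 10:00:00", "2009-05-05 10:01:00"))  # prints: [(1, 2), (2, 1)]
```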
Re: [OSM-dev] Minute Diffs Broken
Sorry, I was assuming that a changeset and a database transaction were the same thing. If not, we need a sequence number on additions to the history table, and use that for knowing what's fresh.
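A minimal sketch of the sequence-number idea using sqlite3, with a hypothetical `seq` column on a simplified `node_history` table. The consumer keeps only a high-water mark and asks for rows above it. It assumes rows become visible in `seq` order; a plain PostgreSQL sequence does not guarantee that on its own, because values are handed out when rows are inserted, not when their transactions commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE node_history (
                    seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- hypothetical sequence column
                    id INTEGER,
                    version INTEGER)""")


def record_change(node_id, version):
    with conn:  # commit each change as its own transaction
        conn.execute("INSERT INTO node_history (id, version) VALUES (?, ?)",
                     (node_id, version))


def read_since(last_seq):
    """Return (changes added after last_seq, new high-water mark)."""
    rows = conn.execute(
        "SELECT seq, id, version FROM node_history WHERE seq > ? ORDER BY seq",
        (last_seq,)).fetchall()
    new_last = rows[-1][0] if rows else last_seq
    return [(node_id, version) for _, node_id, version in rows], new_last


record_change(100, 1)
record_change(100, 2)
changes, mark = read_since(0)
print(changes, mark)     # prints: [(100, 1), (100, 2)] 2
record_change(200, 1)
print(read_since(mark))  # prints: ([(200, 1)], 3)
```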
Re: [OSM-dev] Minute Diffs Broken
Frederik Ramm wrote: I don't see why these systems should show data from un-closed changesets. The way I like to think of changesets, this might even be misleading - think of someone deleting a road because he wants to re-draw it from a better GPX trace. I would of course do both inside one changeset called "replace road XY by better version", and when that changeset is closed, both the deletion of the old road and the new data will be propagated. With interim propagation, the old road will vanish from the map for a while and perhaps unnecessarily upset people. If someone has an urgent change he wants shown quickly - then just close your changeset and you're fine. It would be simple to implement though. This was my original plan until I learnt that changesets weren't going to be atomic. You would effectively introduce atomic changesets for downstream systems this way. Not the worst thing to happen, I'd say. I'm fairly uncomfortable with this approach. It could be very confusing. But I'm prepared to be swayed; it is certainly simple :-) Also, there's a potential flaw with this approach. Let's say I create node 100 with version 1 in changeset 10 in Potlatch and leave my changeset open. You then come along with JOSM and edit node 100, creating version 2 within changeset 11, and close your changeset immediately. Osmosis will pick up changeset 11 after 5 minutes and distribute node 100 version 2. A day later Osmosis will pick up changeset 10 and distribute node 100 version 1. Downstream systems consuming those diffs would apply the wrong version of the node to their database. One way around this might be to force all consumers to check the version id before applying changes, but to the best of my knowledge this is not done currently; at least, osmosis doesn't.
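The consumer-side version check mentioned above could look like the following Python sketch (hypothetical function; a dict stands in for the downstream database). A change is applied only if its version is newer than the one already stored, so node 100 version 1 arriving a day after version 2 is ignored.

```python
def apply_change(db, entity_id, version, data):
    """Apply a change only if it is newer than what the consumer already holds.
    db maps entity_id -> (version, data). Returns True if the change was applied."""
    current = db.get(entity_id)
    if current is not None and current[0] >= version:
        return False  # stale or duplicate change, e.g. v1 arriving after v2
    db[entity_id] = (version, data)
    return True


db = {}
print(apply_change(db, 100, 2, "from changeset 11"))  # prints: True
print(apply_change(db, 100, 1, "from changeset 10"))  # prints: False
print(db[100])  # prints: (2, 'from changeset 11')
```

For this to be safe with deletions, a consumer would also need to keep a versioned tombstone for deleted entities; otherwise a late, older version could resurrect them.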
Re: [OSM-dev] Minute Diffs Broken
On Mon, 4 May 2009, Greg Troxel wrote: openstreetmap-...@scd.debian.net writes: Would this work? How about the situation:
  Changeset A creates a node
  Changeset B uses the node in a way
  Changeset B closes
  (Later) Changeset A closes
Transactions are intended to avoid this. It may be that the changeset B transaction should be reading the node, in which case pgsql should prevent the commit of changeset B until A is closed. Or more likely B could not see the node in changeset A until A commits - this is the READ COMMITTED property, or the avoidance of dirty reads. Have you seen this? We don't use transactions that span the life of changesets (with good reason). We might use transactions to service a single HTTP request to the API (i.e. a single POST/PUT) but not for a changeset. Changesets can last many hours (especially when the user goes back out mapping in the middle of editing). Database transactions lasting that long (controlled by end users) will cause lots of pain. Steve
Re: [OSM-dev] Minute Diffs Broken
Steve Singer wrote: On Tue, 5 May 2009, Brett Henderson wrote: That does look interesting. I'd hope to use that outside the main database though. My thoughts were to use triggers to populate short-term flag tables which a single-threaded process would read, use as keys to select modified data into an offline database, then clear. This offline database could then use a queueing system such as PgQ (I haven't seen it before, will have to check it out) to send events to the various consumers of the data. I'd like to minimise access to the central database if possible because 1. it will scale better, and 2. it adds less burden to existing DBAs. I agree you'd only want one process pulling data from the central database and then let other clients pull from another machine. You'd have to examine how different your trigger + scanning process code will be from using PgQ with 1 consumer that then stores the data in another db for publishing. You should at least look to see what problems they solved. I'll take a look. You're right, I should avoid poorly inventing something that others have already done a better job of :-) I'd hate to impose a bottleneck on the entire app. One concern with trigger-based systems is that for each real INSERT you're doing a second insert into a queue or journal table, but there might not be a way around that. I suspect not. So would the main limitation be IO? It should be possible to separate the synchronisation data (whatever that is) into a separate tablespace on other disk(s) if it becomes a problem. Its access patterns should be sequential; with regular reads, the dataset size should remain comparatively small, and with fixed-size records it shouldn't fragment. Hopefully it's not a major issue. Brett
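Brett's flag-table plan (triggers record only the keys of modified rows; a single-threaded process reads those keys, copies the full rows into an offline database, then clears the flags) can be sketched with sqlite3 standing in for both the central PostgreSQL database and the offline one. That substitution, and every table, trigger and function name below, is an assumption for illustration only.

```python
import sqlite3

main = sqlite3.connect(":memory:")  # stands in for the central database
main.executescript("""
CREATE TABLE nodes (id INTEGER, version INTEGER, lat REAL, lon REAL,
                    PRIMARY KEY (id, version));
-- Short-term flag table: only the keys of modified rows.
CREATE TABLE node_flags (id INTEGER, version INTEGER);
CREATE TRIGGER nodes_flag AFTER INSERT ON nodes
BEGIN
    INSERT INTO node_flags (id, version) VALUES (NEW.id, NEW.version);
END;
""")

offline = sqlite3.connect(":memory:")  # stands in for the offline database
offline.execute("""CREATE TABLE nodes (id INTEGER, version INTEGER, lat REAL,
                   lon REAL, PRIMARY KEY (id, version))""")


def drain_flags():
    """One single-threaded pass: read flagged keys, copy full rows offline,
    then delete exactly the flags that were read. Returns rows copied."""
    with main:
        keys = main.execute("SELECT id, version FROM node_flags").fetchall()
        rows = [main.execute("SELECT id, version, lat, lon FROM nodes "
                             "WHERE id = ? AND version = ?", key).fetchone()
                for key in keys]
        with offline:
            # INSERT OR IGNORE makes a re-copy after a crash harmless.
            offline.executemany("INSERT OR IGNORE INTO nodes VALUES (?, ?, ?, ?)", rows)
        main.executemany("DELETE FROM node_flags WHERE id = ? AND version = ?", keys)
    return len(rows)


with main:
    main.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)",
                     [(1, 1, -41.29, 174.78), (2, 1, -36.85, 174.76)])
print(drain_flags())  # prints: 2
print(drain_flags())  # prints: 0
```

Deleting only the keys that were read, rather than emptying the whole flag table, avoids losing flags written between the read and the clear.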