Re: Order in JOSM files

2017-08-08 Thread Jochen Topf
On Tue, Aug 08, 2017 at 11:32:45AM +0200, Dirk Stöcker wrote:
> On Tue, 8 Aug 2017, Jochen Topf wrote:
> 
> > I would urge you to keep the order because that makes many things much
> > more efficient. For instance checking whether a file contains an object
> > twice is trivial when there is a known order but very expensive without.
> > My code for assembling multipolygons for instance needs to make sure
> > that IDs aren't in a file twice and it uses this very efficient way
> > instead of creating much more complex data structures which would need
> > more RAM and make everything slower.
> 
> In general we will not make any changes only to do changes. So the chance
> that element order stays fixed as it is now is high. I'm not aware that it
> changed in the past.
> 
> BUT: You asked if you can rely on this. And the answer to this is that this
> cannot be guaranteed. :-)
> 
> But you simply can ignore this and hope that JOSM continues to produce nice
> files and chances are high your hope will be fulfilled.

Okay, good enough. :-) At least now that you are aware of this issue, it
will not change just by accident.

I thought some more about this, and what it comes down to is this: An OSM
file can either be totally unordered, so the generator doesn't provide
any "guarantees" about its ordering. Or it can be ordered in some way. How
exactly it is ordered probably doesn't matter that much; what matters is
that there is some kind of consistency others can rely on. Although
nobody ever guaranteed it, OSM files are almost always sorted with nodes
first, then ways, then relations, and each object type sorted by ID, so
this is what people rely on and this is the order that "sort" commands
(like the ones from osmium or osmosis) will create. I want to extend this:
If you are using negative IDs in your OSM file, you should order them with
negative IDs first, then positive IDs, both ordered by the absolute values
of those IDs. That's the format I will optimize my software for and that's
the format "osmium sort" will create in the future. Then everything fits
together with the most important generator of files with negative IDs,
JOSM.
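The proposed ordering (type first, then negative IDs before positive IDs, each group by absolute value) can be expressed as a single sort key. This is a minimal Python sketch, assuming objects are reduced to a hypothetical `OsmRef(type, id)` pair; it is not code taken from osmium or JOSM.

```python
from typing import NamedTuple

# Conventional type order: nodes, then ways, then relations.
TYPE_RANK = {"node": 0, "way": 1, "relation": 2}


class OsmRef(NamedTuple):
    """Hypothetical stand-in for an OSM object: only its type and ID."""
    type: str
    id: int


def josm_order_key(obj: OsmRef) -> tuple[int, int, int]:
    """Sort key: type, then negative IDs before non-negative ones,
    then the absolute value of the ID within each group."""
    sign_group = 0 if obj.id < 0 else 1
    return (TYPE_RANK[obj.type], sign_group, abs(obj.id))


objects = [
    OsmRef("way", 7), OsmRef("node", 5), OsmRef("node", -3),
    OsmRef("node", -1), OsmRef("relation", -2), OsmRef("node", 2),
]
for obj in sorted(objects, key=josm_order_key):
    print(obj.type, obj.id)
```

Because the key depends only on type and ID, objects with the same type and ID always end up next to each other after sorting.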

Jochen
-- 
Jochen Topf  joc...@remote.org  https://www.jochentopf.com/  +49-351-31778688



Re: Order in JOSM files

2017-08-08 Thread Dirk Stöcker

On Tue, 8 Aug 2017, Jochen Topf wrote:

> I would urge you to keep the order because that makes many things much
> more efficient. For instance checking whether a file contains an object
> twice is trivial when there is a known order but very expensive without.
> My code for assembling multipolygons for instance needs to make sure
> that IDs aren't in a file twice and it uses this very efficient way
> instead of creating much more complex data structures which would need
> more RAM and make everything slower.

In general we will not make any changes only to do changes. So the chance
that element order stays fixed as it is now is high. I'm not aware that it
changed in the past.

BUT: You asked if you can rely on this. And the answer to this is that
this cannot be guaranteed. :-)

But you simply can ignore this and hope that JOSM continues to produce
nice files and chances are high your hope will be fulfilled.

Ciao
--
http://www.dstoecker.eu/ (PGP key available)



Re: Order in JOSM files

2017-08-08 Thread Vincent Privat
Hello,
I rely on the ordering too.

This is a prerequisite to provide readable patches to boundaries.osm, see:
https://josm.openstreetmap.de/ticket/14833
https://josm.openstreetmap.de/ticket/15036

The other prerequisite is to keep stable IDs. I have a working patch in
#14833 but have not submitted it yet, as I must test it extensively before
changing this crucial part of JOSM.

2017-08-08 11:07 GMT+02:00 Jochen Topf :

> On Tue, Aug 08, 2017 at 10:51:36AM +0200, Simon Poole wrote:
> > And another data point: the implementation in Vespucci does not sort by
> > id (not only in theory, the output is really not ordered, which doesn't
> > cause issues with JOSM).
> >
> > Or put differently: if that becomes a requirement, it would be a good
> > idea to version the format (which naturally wouldn't actually solve
> > your issue).
>
> For me the files created by JOSM are important, not what JOSM reads.
> Having JOSM create something that is stricter than what it can read is
> perfectly backwards compatible. But I'd encourage you to also use the
> same kind of ordering in Vespucci, see my other mail for reasons.
>
> Not sure versioning would help us here. What would help is some kind of
> flag in the header of the file that tells us if the file is ordered and
> in what way. Then any software can directly see whether it can work with
> the file and, possibly, which algorithm to use.
>
> Jochen
>
>


Re: Order in JOSM files

2017-08-08 Thread Jochen Topf
On Tue, Aug 08, 2017 at 10:51:36AM +0200, Simon Poole wrote:
> And another data point: the implementation in Vespucci does not sort by
> id (not only in theory, the output is really not ordered, which doesn't
> cause issues with JOSM).
> 
> Or put differently: if that becomes a requirement, it would be a good
> idea to version the format (which naturally wouldn't actually solve
> your issue).

For me the files created by JOSM are important, not what JOSM reads.
Having JOSM create something that is stricter than what it can read is
perfectly backwards compatible. But I'd encourage you to also use the
same kind of ordering in Vespucci, see my other mail for reasons.

Not sure versioning would help us here. What would help is some kind of
flag in the header of the file that tells us if the file is ordered and
in what way. Then any software can directly see whether it can work with
the file and, possibly, which algorithm to use.
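Without such a header flag, software could classify a file itself before choosing an algorithm. The Python sketch below is hypothetical (the function names are illustrative, and it does not read any real file header): it makes one pass over (type, ID) pairs and reports whether the proposed type-then-ID order holds, whether only the nodes, ways, relations order holds, or neither.

```python
from typing import Iterable, Optional

# Conventional type order: nodes, then ways, then relations.
TYPE_RANK = {"node": 0, "way": 1, "relation": 2}


def order_key(obj_type: str, obj_id: int) -> tuple[int, int, int]:
    """Key for the proposed order: type, negative IDs before
    non-negative ones, then absolute value."""
    return (TYPE_RANK[obj_type], 0 if obj_id < 0 else 1, abs(obj_id))


def classify_order(objects: Iterable[tuple[str, int]]) -> str:
    """Classify a stream of (type, id) pairs in a single pass:
    "type_then_id" if the proposed order holds everywhere,
    "type_only" if only nodes < ways < relations holds,
    "unordered" otherwise. Repeated IDs do not count as disorder."""
    prev: Optional[tuple[int, int, int]] = None
    fully_ordered = True
    for obj_type, obj_id in objects:
        key = order_key(obj_type, obj_id)
        if prev is not None:
            if key[0] < prev[0]:
                return "unordered"  # e.g. a node after a way
            if key < prev:
                fully_ordered = False  # right type order, wrong ID order
        prev = key
    return "type_then_id" if fully_ordered else "type_only"


print(classify_order([("node", -1), ("node", -3), ("node", 2), ("way", 7)]))  # type_then_id
print(classify_order([("node", 2), ("node", -1), ("way", 7)]))  # type_only
print(classify_order([("way", 7), ("node", 1)]))  # unordered
```

The check needs only the previous key, so it can run on a stream without holding the file in memory.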

Jochen



Re: Order in JOSM files

2017-08-08 Thread Jochen Topf
On Tue, Aug 08, 2017 at 10:43:22AM +0200, Dirk Stöcker wrote:
> On Tue, 8 Aug 2017, Jochen Topf wrote:
> 
> > When JOSM saves OSM files it uses a particular order: First nodes, then
> > ways, then relations as usual. For each object type it first writes out
> > objects with negative IDs (i.e. objects that are not uploaded yet), then
> > objects with positive IDs, both are ordered by absolute value.
> >
> > Is this something I can rely on or is this just something that happened
> > accidentally with my version of JOSM when I tried this?
> >
> > The reason I am asking: I sometimes get requests for Osmium features
> > from people who want to do something with files saved from JOSM, like
> > renumber them to have only small positive IDs, or convert them into
> > other formats. Osmium can read JOSM files and handle negative IDs, so
> > these things mostly work, but in some cases having a known order helps
> > (or is even necessary for correct functioning). I am currently working
> > on some things there but if JOSM did not keep to this order in the
> > future they would break again.
>
> I would not rely on the order of the individual elements. There are ideas to
> rework the data storage to prevent changing IDs for the new objects (which
> allows better diffs). That may have other side effects as well. I would expect
> the only thing you can rely on is the nodes, ways, relations order.

I would urge you to keep the order because that makes many things much
more efficient. For instance checking whether a file contains an object
twice is trivial when there is a known order but very expensive without.
My code for assembling multipolygons for instance needs to make sure
that IDs aren't in a file twice and it uses this very efficient way
instead of creating much more complex data structures which would need
more RAM and make everything slower.
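To make the efficiency argument concrete, here is a minimal Python sketch (hypothetical helper names, not Osmium code) contrasting the two approaches for the IDs of one object type. On sorted input a duplicate always sits right next to its twin, so remembering only the previous ID suffices; on unordered input every ID seen so far has to be kept, which is the extra RAM mentioned above.

```python
from typing import Iterable, Iterator, Optional


def duplicates_sorted(ids: Iterable[int]) -> Iterator[int]:
    """Yield repeated IDs from input in which equal IDs are adjacent
    (e.g. sorted input). Only the previous ID is kept in memory."""
    prev: Optional[int] = None
    for obj_id in ids:
        if obj_id == prev:
            yield obj_id
        prev = obj_id


def duplicates_unsorted(ids: Iterable[int]) -> Iterator[int]:
    """Fallback for unordered input: remember every ID seen so far,
    so memory use grows with the number of objects."""
    seen: set[int] = set()
    for obj_id in ids:
        if obj_id in seen:
            yield obj_id
        else:
            seen.add(obj_id)


print(list(duplicates_sorted([-1, -3, 2, 5, 5, 9])))  # [5]
print(list(duplicates_unsorted([5, -1, 9, 5, 2])))  # [5]
```

The check would run per object type, since node 5 and way 5 are different objects.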

OSM planet files and the usual extracts as provided by Geofabrik and
others always have objects ordered by ID so that's what a lot of
programs rely on anyway. This is not going to change. JOSM files are
special because of the negative IDs used. Having a consistent order for
the negative IDs, too, would make it easier for users here, because they
can just use such a file and don't have to sort it first. Some devs use
JOSM to generate tests for their software, for instance, and being able
to directly use the JOSM files makes things easier for them, too.

Jochen



Re: Order in JOSM files

2017-08-08 Thread Simon Poole
And another data point: the implementation in Vespucci does not sort by
id (not only in theory, the output is really not ordered, which doesn't
cause issues with JOSM).

Or put differently: if that becomes a requirement, it would be a good
idea to version the format (which naturally wouldn't actually solve
your issue).

Simon


Am 08.08.2017 um 10:24 schrieb Jochen Topf:
> Hi!
>
> When JOSM saves OSM files it uses a particular order: First nodes, then
> ways, then relations as usual. For each object type it first writes out
> objects with negative IDs (i.e. objects that are not uploaded yet), then
> objects with positive IDs, both are ordered by absolute value.
>
> Is this something I can rely on or is this just something that happened
> accidentally with my version of JOSM when I tried this?
>
> The reason I am asking: I sometimes get requests for Osmium features
> from people who want to do something with files saved from JOSM, like
> renumber them to have only small positive IDs, or convert them into
> other formats. Osmium can read JOSM files and handle negative IDs, so
> these things mostly work, but in some cases having a known order helps
> (or is even necessary for correct functioning). I am currently working
> on some things there but if JOSM did not keep to this order in the
> future they would break again.
>
> Jochen
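The original mail names renumbering JOSM files to small positive IDs as a case where a known order helps. Below is a hypothetical single-pass sketch in Python (the `Obj` type and `renumber` function are illustrative, not how Osmium implements this). Because nodes come before ways, every node reference in a way can be translated with the mapping built while reading the nodes. Relation members are left out to keep it short; relations may reference relations that appear later, which a single pass cannot resolve.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical minimal OSM object: type, ID and (for ways) node refs."""
    type: str  # "node", "way" or "relation"
    id: int
    node_refs: list[int] = field(default_factory=list)


def renumber(objects: list[Obj]) -> list[Obj]:
    """Renumber IDs to 1, 2, 3, ... per object type in a single pass.

    Assumes nodes come before ways, so every node a way references has
    already been assigned a new ID. Relation members are not handled.
    """
    mapping: dict[str, dict[int, int]] = {"node": {}, "way": {}, "relation": {}}
    result = []
    for obj in objects:
        ids = mapping[obj.type]
        # First occurrence gets the next free ID; a repeated ID reuses it.
        new_id = ids.setdefault(obj.id, len(ids) + 1)
        # A KeyError here means a way referenced a node not seen yet,
        # i.e. the input was not in nodes-then-ways order.
        new_refs = [mapping["node"][ref] for ref in obj.node_refs]
        result.append(Obj(obj.type, new_id, new_refs))
    return result


josm_objects = [
    Obj("node", -1), Obj("node", -2), Obj("node", 4),
    Obj("way", -1, [-2, -1, 4]),
]
for obj in renumber(josm_objects):
    print(obj)
```

On unordered input (a way before its nodes) this sketch fails with a `KeyError`; handling that case would need a second pass or a buffer of pending ways.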






Re: Order in JOSM files

2017-08-08 Thread Dirk Stöcker

On Tue, 8 Aug 2017, Jochen Topf wrote:

> When JOSM saves OSM files it uses a particular order: First nodes, then
> ways, then relations as usual. For each object type it first writes out
> objects with negative IDs (i.e. objects that are not uploaded yet), then
> objects with positive IDs, both are ordered by absolute value.
>
> Is this something I can rely on or is this just something that happened
> accidentally with my version of JOSM when I tried this?
>
> The reason I am asking: I sometimes get requests for Osmium features
> from people who want to do something with files saved from JOSM, like
> renumber them to have only small positive IDs, or convert them into
> other formats. Osmium can read JOSM files and handle negative IDs, so
> these things mostly work, but in some cases having a known order helps
> (or is even necessary for correct functioning). I am currently working
> on some things there but if JOSM did not keep to this order in the
> future they would break again.

I would not rely on the order of the individual elements. There are ideas
to rework the data storage to prevent changing IDs for the new objects
(which allows better diffs). That may have other side effects as well. I
would expect the only thing you can rely on is the nodes, ways, relations
order.

Ciao