Re: [OSM-dev] Compression types in PBF Format

2010-12-03 Thread Erik Johansson
On Wed, Dec 1, 2010 at 4:42 PM, Anthony o...@inbox.org wrote:
 On Wed, Dec 1, 2010 at 10:35 AM, Peter Körner osm-li...@mazdermind.de wrote:
 On 30.11.2010 23:44, Anthony wrote:

 On Tue, Nov 30, 2010 at 5:19 PM, Matt Amos zerebub...@gmail.com wrote:
 because XML is a nearly human-readable, easy to explain and inspect
 format.

 Except when you don't include any line feeds :).

 What can be solved with a perl one-liner (well, everything can be solved
 with a perl one-liner ^^)

 What's the perl one-liner?  I wound up writing a C program.  (Now that
 I think about it, I guess I'd just have to set $/ to '', right?)

This has been asked on IRC as well, interesting that so many people
find the need to insert line feeds into a file.

Perhaps it's better to include them?


-- 
/emj

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Compression types in PBF Format

2010-12-03 Thread Anthony
On Fri, Dec 3, 2010 at 4:57 AM, Erik Johansson e...@kth.se wrote:
 On Wed, Dec 1, 2010 at 4:42 PM, Anthony o...@inbox.org wrote:
 What's the perl one-liner?  I wound up writing a C program.  (Now that
 I think about it, I guess I'd just have to set $/ to '', right?)

 This has been asked on IRC as well, interesting that so many people
 find the need to insert line feeds into a file.

 Perhaps it's better to include them?

Definitely.  What's the point of using an incredibly bloated format
which is difficult to parse by machines, and then omitting line feeds
giving you a space savings of what, a tenth of a percent of the
compressed file?

AFAIK it's only the full history files which exhibit this nasty trait.



Re: [OSM-dev] Compression types in PBF Format

2010-12-03 Thread Anthony
 On Wed, Dec 1, 2010 at 4:42 PM, Anthony o...@inbox.org wrote:
 What's the perl one-liner?

perl -0076 -pe '$\="\n"' filename

seems to work.
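
For readers without Perl at hand, the same idea (start a new line after every '>') can be sketched in Python. This is only an illustration and not something that was run in this thread: the helper name add_line_feeds is made up, and it assumes that '>' only ever closes a tag.

```python
def add_line_feeds(xml_text: str) -> str:
    """Insert a line feed after every '>' so each tag ends its own line.

    Hypothetical helper, not part of any OSM tool. Assumes '>' only ever
    closes a tag; a literal '>' inside an attribute value would also get
    a line feed appended.
    """
    return xml_text.replace(">", ">\n")


if __name__ == "__main__":
    # Tiny made-up document, just to show the effect.
    print(add_line_feeds('<osm><node id="1" lat="0" lon="0"/></osm>'), end="")
```

For a multi-gigabyte history file one would read and write in chunks instead of loading the whole text, but the transformation itself stays the same.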



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Anthony
On Tue, Nov 30, 2010 at 11:03 PM, Anthony o...@inbox.org wrote:
 On Tue, Nov 30, 2010 at 8:29 PM, Stefan de Konink ste...@konink.de wrote:
 If any of gzip/bzip2/lzma in the general give better compression ratio's
 (20% smaller), then this compression scheme should become the default
 format.

 Depends on the performance.  If all you want is max compression
 without regard to performance, you're almost surely better off using
 raw and then compressing the entire file with LZMA (e.g. 7zip or xz).

LZMA vs. zlib actually makes less of a difference than I thought it would:

-rw-r--r-- 1 a a 103M 2010-12-01 08:07 florida.osm.bz2
-rw-r--r-- 1 a a 129M 2010-12-01 08:32 florida.osm.gz
-rw-r--r-- 1 a a  74M 2010-12-01 08:19 florida.osm.pbf
-rw-r--r-- 1 a a 169M 2010-12-01 08:15 florida.osm.rawpbf
-rw-r--r-- 1 a a  62M 2010-12-01 08:15 florida.osm.rawpbf.xz
-rw-r--r-- 1 a a  86M 2010-11-25 11:29 florida.osm.xz

I suspect it would make *much more difference* when it comes to the
full history .osm, though.  Does PBF support full history files?  Does
Osmosis?
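
To put the listing above into relative terms, here is a small Python sketch of the arithmetic. The sizes are copied from the listing (rounded to whole megabytes, as ls -h printed them); the function name percent_smaller is made up for this example.

```python
# Sizes in MB, copied from the directory listing above.
sizes_mb = {
    "florida.osm.bz2": 103,
    "florida.osm.gz": 129,
    "florida.osm.pbf": 74,
    "florida.osm.rawpbf": 169,
    "florida.osm.rawpbf.xz": 62,
    "florida.osm.xz": 86,
}


def percent_smaller(baseline_mb, candidate_mb):
    """Return how much smaller the candidate is than the baseline, in percent."""
    return 100.0 * (1.0 - candidate_mb / baseline_mb)


if __name__ == "__main__":
    baseline = sizes_mb["florida.osm.pbf"]
    # Smallest file first; negative values mean "larger than the .pbf".
    for name, size in sorted(sizes_mb.items(), key=lambda item: item[1]):
        print(f"{name:24s} {size:4d}M  {percent_smaller(baseline, size):+6.1f}% vs .pbf")
```

With these rounded figures, the xz-compressed raw PBF comes out roughly 16% smaller than the zlib-based .pbf.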



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Stefan de Konink

On Wed, 1 Dec 2010, Anthony wrote:

 On Tue, Nov 30, 2010 at 11:03 PM, Anthony o...@inbox.org wrote:
 On Tue, Nov 30, 2010 at 8:29 PM, Stefan de Konink ste...@konink.de wrote:
 If any of gzip/bzip2/lzma in the general give better compression ratio's
 (20% smaller), then this compression scheme should become the default
 format.

 Depends on the performance.  If all you want is max compression
 without regard to performance, you're almost surely better off using
 raw and then compressing the entire file with LZMA (e.g. 7zip or xz).

 LZMA vs. zlib actually makes less of a difference than I thought it would:

 -rw-r--r-- 1 a a 103M 2010-12-01 08:07 florida.osm.bz2
 -rw-r--r-- 1 a a 129M 2010-12-01 08:32 florida.osm.gz
 -rw-r--r-- 1 a a  74M 2010-12-01 08:19 florida.osm.pbf
 -rw-r--r-- 1 a a 169M 2010-12-01 08:15 florida.osm.rawpbf
 -rw-r--r-- 1 a a  62M 2010-12-01 08:15 florida.osm.rawpbf.xz
 -rw-r--r-- 1 a a  86M 2010-11-25 11:29 florida.osm.xz

 I suspect it would make *much more difference* when it comes to the
 full history .osm, though.  Does PBF support full history files?  Does
 Osmosis?

Did you benchmark what pbf + lzma did or did you embed lzma in osmosis?


Stefan


Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Anthony
On Wed, Dec 1, 2010 at 9:28 AM, Stefan de Konink ste...@konink.de wrote:
 On Wed, 1 Dec 2010, Anthony wrote:
 LZMA vs. zlib actually makes less of a difference than I thought it would:

 -rw-r--r-- 1 a a 103M 2010-12-01 08:07 florida.osm.bz2
 -rw-r--r-- 1 a a 129M 2010-12-01 08:32 florida.osm.gz
 -rw-r--r-- 1 a a  74M 2010-12-01 08:19 florida.osm.pbf
 -rw-r--r-- 1 a a 169M 2010-12-01 08:15 florida.osm.rawpbf
 -rw-r--r-- 1 a a  62M 2010-12-01 08:15 florida.osm.rawpbf.xz
 -rw-r--r-- 1 a a  86M 2010-11-25 11:29 florida.osm.xz

 I suspect it would make *much more difference* when it comes to the
 full history .osm, though.  Does PBF support full history files?  Does
 Osmosis?

 Did you benchmark what pbf + lzma did or did you embed lzma in osmosis?

xz uses lzma.  I made an uncompressed pbf file (florida.osm.rawpbf)
and then compressed it with xz (florida.osm.rawpbf.xz).  This isn't
the same as making a pbf file which uses lzma, but it should be a good
approximation of the compression achievable by embedding lzma in the
pbf.



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Stefan de Konink

On Wed, 1 Dec 2010, Anthony wrote:

 Did you benchmark what pbf + lzma did or did you embed lzma in osmosis?

 xz uses lzma.  I made an uncompressed pbf file (florida.osm.rawpbf)
 and then compressed it with xz (florida.osm.rawpbf.xz).  This isn't
 the same as making a pbf file which uses lzma, but it should be a good
 approximation of the compression achievable by embedding lzma in the
 pbf.

Yeah, but your lead basically shows we are talking about more than 10%...



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Peter Körner

On 30.11.2010 23:44, Anthony wrote:

 On Tue, Nov 30, 2010 at 5:19 PM, Matt Amos zerebub...@gmail.com wrote:
 On Tue, Nov 30, 2010 at 8:41 PM, Stefan de Konink ste...@konink.de wrote:
 And if we can
 change the API every week, I wonder why we are still at XML then.

 because XML is a nearly human-readable, easy to explain and inspect
 format.

 Except when you don't include any line feeds :).

Which can be solved with a perl one-liner (well, everything can be solved
with a perl one-liner ^^)


Peter



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Anthony
On Wed, Dec 1, 2010 at 10:24 AM, Stefan de Konink ste...@konink.de wrote:
 On Wed, 1 Dec 2010, Anthony wrote:

 Did you benchmark what pbf + lzma did or did you embed lzma in osmosis?

 xz uses lzma.  I made an uncompressed pbf file (florida.osm.rawpbf)
 and then compressed it with xz (florida.osm.rawpbf.xz).  This isn't
 the same as making a pbf file which uses lzma, but it should be a good
 approximation of the compression achievable by embedding lzma in the
 pbf.

 Yeah, but your lead basically shows we are talking about more than 10%...

Yeah, probably, but at the expense of more complicated code, greater
memory usage, etc.

I'm interested now in seeing how the full history compression goes,
though.  If it can achieve 70, 80, 90% on top of zlib, then it might
be worth embedding the compression as opposed to just using it for
transfer over the Internet.



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Stefan de Konink

On Wed, 1 Dec 2010, Anthony wrote:

 Yeah, but your lead basically shows we are talking about more than 10%...

 Yeah, probably, but at the expense of more complicated code, greater
 memory usage, etc.

The whole process is IO-bound... memory is used anyway to overcome the IO
issues...

 I'm interested now in seeing how the full history compression goes,
 though.  If it can achieve 70, 80, 90% on top of zlib, then it might
 be worth embedding the compression as opposed to just using it for
 transfer over the Internet.

The dictionary is compressed per block, so it greatly depends on whether
the trick works.


Stefan



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Anthony
On Wed, Dec 1, 2010 at 10:47 AM, Stefan de Konink ste...@konink.de wrote:
 On Wed, 1 Dec 2010, Anthony wrote:

 Yeah, but your lead basically shows we are talking about more than 10%...

 Yeah, probably, but at the expense of more complicated code, greater
 memory usage, etc.

 The hole process is IO-bound... memory is used anyway to overcome the IO
 issues...

Not in an embedded system, which is where a small difference like 10%
is going to matter.

 I'm interested now in seeing how the full history compression goes,
 though.  If it can achieve 70, 80, 90% on top of zlib, then it might
 be worth embedding the compression as opposed to just using it for
 transfer over the Internet.

 The dictionary is compressed per block, so it greatly depends if the trick
 works.

32 megs is a lot better than 900K, though.  900K is how much zlib uses, right?



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Stefan de Konink

On Wed, 1 Dec 2010, Anthony wrote:

 Not in an embedded system, which is where a small difference like 10%
 is going to matter.

Please elaborate? Either the memory is used for a block cache or for the
program.

 I'm interested now in seeing how the full history compression goes,
 though.  If it can achieve 70, 80, 90% on top of zlib, then it might
 be worth embedding the compression as opposed to just using it for
 transfer over the Internet.

 The dictionary is compressed per block, so it greatly depends if the trick
 works.

 32 megs is a lot better than 900K, though.  900K is how much zlib uses, right?

I don't get your point here, what do you mean? Do you mean that the memory
requirements for zlib are lower? Don't forget that the extracted
piece is kept in memory plus the deserialised version, which is basically
much bigger, right?



Stefan


Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Stefan de Konink

On 01-12-10 17:30, Anthony wrote:
 Anyway, I'm probably completely wrong about this.  Sorry.

I guess the fastest way to verify all this is for someone to add the LZMA
and BZ2 libraries to Java and check in Osmosis. Your numbers give me the
impression that it is worth pursuing different compression strategies.


Stefan


Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Scott Crosby
On Wed, Dec 1, 2010 at 9:50 AM, Anthony o...@inbox.org wrote:
 On Wed, Dec 1, 2010 at 10:47 AM, Stefan de Konink ste...@konink.de wrote:
 On Wed, 1 Dec 2010, Anthony wrote:

 Yeah, but your lead basically shows we are talking about more than 10%...

 Yeah, probably, but at the expense of more complicated code, greater
 memory usage, etc.

 The hole process is IO-bound... memory is used anyway to overcome the IO
 issues...

 Not in an embedded system, which is where a small difference like 10%
 is going to matter.

CPUs are going to get faster, for free. Developer time, especially
OSM developer time, is severely limited. The community is better served
by having them do new stuff than by coding an overcomplicated format.


 I'm interested now in seeing how the full history compression goes,
 though.  If it can achieve 70, 80, 90% on top of zlib, then it might
 be worth embedding the compression as opposed to just using it for
 transfer over the Internet.

 The dictionary is compressed per block, so it greatly depends if the trick
 works.

 32 megs is a lot better than 900K, though.  900K is how much zlib uses, right?


Each fileblock is independently decodable, which means that I have to
reset the dictionary for each fileblock. There are around 100k
fileblocks in the planet, and 13 GB uncompressed, which means that the
average fileblock has 130 KB of data. gzip has a 32 KB window, smaller
than the number of bytes in the fileblock. bzip2 has a window that is
900 KB, and LZMA's is megabytes, but LZMA's multi-megabyte window
doesn't matter, because the compressor is restarted for each
fileblock, every few hundred kilobytes.

The 15% gain you measured between .rawpbf.xz and .pbf  really lets
lzma cheat too much, because it can exploit a window tens of times
larger than it would if integrated.
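
The window argument can be illustrated with a small Python sketch using only the standard library. Everything here is an assumption made for illustration: the 64 KiB block size is not the real PBF fileblock size, and the data is synthetic (one random block repeated), chosen so that the only redundancy lies further back than a 32 KB window can see. Real OSM data will behave far less dramatically.

```python
import lzma
import random
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size, not the real PBF value


def make_sample(blocks=8, seed=0):
    """Build synthetic data: one random 64 KiB block repeated several times."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))
    return block * blocks


def whole_stream_size(data, compress):
    """Compress everything in one go, so matches may span block boundaries."""
    return len(compress(data))


def per_block_size(data, compress):
    """Compress each block independently, restarting the compressor each time."""
    return sum(
        len(compress(data[start:start + BLOCK_SIZE]))
        for start in range(0, len(data), BLOCK_SIZE)
    )


if __name__ == "__main__":
    sample = make_sample()
    for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
        print(name,
              "whole stream:", whole_stream_size(sample, compress),
              "per block:", per_block_size(sample, compress))
```

On this contrived input, LZMA over the whole stream finds the repeats while zlib cannot, and LZMA restarted per block loses that advantage entirely, which is the effect described above in its most extreme form.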

Could you run your test on a whole planet, or a hack-integration of
LZMA into osmosis?

Scott



Re: [OSM-dev] Compression types in PBF Format

2010-12-01 Thread Anthony
On Wed, Dec 1, 2010 at 12:40 PM, Scott Crosby scro...@cs.rice.edu wrote:
 The 15% gain you measured between .rawpbf.xz and .pbf  really lets
 lzma cheat too much, because it can exploit a window tens of times
 larger than it would if integrated.

I'm not sure how much that mattered.  xz -3, which I believe uses a 1
megabyte window, still compresses to 63M.

 Could you run your test on a whole planet,

Not today.  I'm due to receive my new hard drives today, which I
bought last Friday, so my filesystem is in complete disarray.  I'm not
even sure which drive/partition I have my whole planet file on, at the
moment.  I'm sure whatever partition it is, it isn't currently
mounted.

 or a hack-integration of LZMA into osmosis?

That I can definitively say I'm not going to do.  I'd sooner reverse
engineer the pbf format in C than mess around with the code of
osmosis.  It took me long enough just to figure out how to install the
right jre to get osmosis to run.



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Stefan de Konink

On Tue, 30 Nov 2010, Jochen Topf wrote:

 The PBF format supports three compression types: zlib, lzma, and bzip2. Do
 we have to support all of them? What is the currently existing software
 using?

 IMHO it would make more sense to just define one and stick with it. Easier
 to implement for everybody, less reliance on external libs.

Why this mentality? It is trivial to implement a decompression
algorithm and some work better than others. Sounds like complaining about
stuff you don't have to care about.



Stefan



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Tim Teulings

Hello!

 Why this mentality? It is trivial to implement a decompression
 algorithm and some work better than others. Sounds like complaining
 about stuff you don't have to care about.

I would not implement decompression myself, I have better things to
do. Thus I would use a library for this. A library however is a
dependency, that must be built, installed and delivered (not all the
world runs Linux with smart packaging systems) and licences have to be
checked. It makes sense for the developer to try to reduce such
dependencies by agreeing on one standard compression format.


--
Regards...
   Tim





Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Stefan de Konink

On Tue, 30 Nov 2010, Tim Teulings wrote:

 Hello!

 Why this mentality? It is trivial to implement a decompression algorithm
 and some work better than others. Sounds like complaining about stuff you
 don't have to care about.

 I would not implement decompression myself, I have better things to do. Thus
 I would use a library for this. A library however is a dependency, that must be
 built, installed and delivered (not all the world runs Linux with smart
 packaging systems) and licences have to be checked. It makes sense for the
 developer to try to reduce such dependencies by agreeing on one standard
 compression format.

Since all program interfaces are virtually equal to gzip's, and this gives
perfect extensibility, the choice to support the 3 most used compression
schemes is perfectly sound.


Stop whining about code that either you do not write, or didn't care about
before. There was this great moment when the bitstream was defined, and
absolutely nobody cared until people started to write pbf code.



Stefan



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Scott Crosby
On Tue, Nov 30, 2010 at 2:21 AM, Jochen Topf joc...@remote.org wrote:
 The PBF format supports three compression types: zlib, lzma, and bzip2. Do
 we have to support all of them? What is the currently existing software
 using?

Good question. I think that the bzip2 compression option is useless.
Too slow, especially on the decompression side. I'm not sure what to
do about LZMA. It offers higher compression ratios at little loss in
decompression speed. The catch is that while everything supports
deflate, LZMA decompressor support is a lot less widespread. It might
be a valuable future option, but it is also untested.

To my knowledge, Osmosis has the only implementation of a PBF writer,
and it only uses uncompressed and zlib. Has anyone else implemented a
writer?

If nobody else has their own writer, then would anyone object to me
unilaterally removing (not deprecating) bzip2 entirely, and
disabling/marking lzma as a proposed future extension?   If tests in
the future show that LZMA offers significant size decreases, it can be
enabled and support can be added.

 IMHO it would make more sense to just define one and stick with it. Easier
 to implement for everybody, less reliance on external libs.

Agreed. Insofar as a reader has to support every permutation of the
format, reducing those permutations is important. I think LZMA has
something to offer, but it needs to be tested first.
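
To make the "every permutation" cost concrete, here is a rough Python sketch of what a reader has to carry for each compression type it accepts. It is not taken from Osmosis or pbf2osm: the labels "raw", "zlib", "lzma" and "bzip2" and the function decode_block are made up for this example, and the exact byte layout PBF expects for LZMA data is not assumed here (the standard-library lzma module simply auto-detects its own container formats).

```python
import bz2
import lzma
import zlib

# Hypothetical mapping from a compression label to a decompress function.
# The labels are invented for this sketch; they are not PBF field names.
DECOMPRESSORS = {
    "raw": lambda payload: payload,
    "zlib": zlib.decompress,
    "lzma": lzma.decompress,
    "bzip2": bz2.decompress,
}


def decode_block(kind, payload):
    """Return the uncompressed bytes of one block, or raise for unknown kinds."""
    try:
        decompress = DECOMPRESSORS[kind]
    except KeyError:
        raise ValueError(f"unsupported compression type: {kind!r}") from None
    return decompress(payload)


if __name__ == "__main__":
    data = b"example block contents " * 100
    print(decode_block("zlib", zlib.compress(data)) == data)
```

In Python each extra entry is one line because the libraries ship with the interpreter; in C, or on a platform without packaged libraries, each entry is an extra dependency to build, ship and license-check, which is the concern raised in this thread.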

Scott



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Frederik Ramm

Stefan,

Stefan de Konink wrote:
 This is the place for the 'too little, too late'. We are beyond the
 point of 'what' the bitstream should look like: you ought to handle what
 is defined now.

This is not how we work in OSM. We don't have standards. We can change
stuff at any time, and indeed I would not hesitate for a second to
change something in the PBF format if it turns out to repair a design
problem or bring great benefit. (If it were my call, which it isn't.)

I really don't like your attitude. It's great that you took the time to
write pbf2osm but it seems you expect to be revered for it. You give the
impression of someone for whom coding something is only a means to climb
onto a platform from where he can heap spite onto others. (I remember
your derogatory comments about C++ while you wrote pbf2osm, and putting
comments like "osmosis devs failed to read the specs" in one's code is
not exactly a sign of maturity either.)

 Then you probably also noticed that it is still a (huge) open question
 to write a regression testsuite for all parsers and generators. And
 since the general opinion is now that nobody wants to move until there
 is a second implementation of osm2pbf (instead of actually switching),
 everyone is waiting. This greatly annoys me and probably not only me but
 also the guy that actually took great effort to define the protocol and
 review code of others and answer questions.

What exactly is your problem? PBF is alive and kicking. I'm using both
Osmosis PBF support and your implementation of pbf2osm on a daily basis,
and many downstream users of Geofabrik do the same.

 I find it totally disrespectful that *you* are now doubting his qualities
 but didn't step forward when feedback was asked.

Excuse me, but discussing potential problems of a design is not a show
of lack of respect - unless presented in a form like the aforementioned
"osmosis devs failed to read the specs".


Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Jochen Topf
On Tue, Nov 30, 2010 at 10:53:16AM -0600, Scott Crosby wrote:
 On Tue, Nov 30, 2010 at 2:21 AM, Jochen Topf joc...@remote.org wrote:
  The PBF format supports three compression types: zlib, lzma, and bzip2. Do
  we have to support all of them? What is the currently existing software
  using?
 
 Good question. I think that the bzip2 compression option is useless.
 Too slow, especially on the decompression side. I'm not sure what to
 do about LZMA. It offers higher compression ratio's at little loss in
 decompression speed. The catch is that while everything supports
 deflate, LZMA decompressor support is a lot less widespread. It might
 be a valuable future option, but it is also untested.
 
 To my knowledge, Osmosis has the only implementation of a PBF writer,
 it only uses uncompressed and zlib? Has anyone else implemented a
 writer?
 
 If nobody else has their own writer, then would anyone object to me
 unilaterally removing (not depreciating) bzip2 entirely, and
 disabling/marking lzma as a proposed future extension?   If tests in
 the future show that LZMA offers significant size decreases, it can be
 enabled and support can be added.

+1

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298




Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Stefan de Konink

Hi,


On 30-11-10 20:49, Frederik Ramm wrote:
 Stefan de Konink wrote:
 This is the place for the 'too little, too late'. We are beyond the
 point of 'what' the bitstream should look like: you ought to handle
 what is defined now.
 
 This is not how we work in OSM. We don't have standards. 

For some reason we do; this is not a free-form fight. And if we can
change the API every week, I wonder why we are still at XML then.


 We can change
 stuff at any time, and indeed I would not hesitate for a second to
 change something in the PBF format if it turns out to repair a design
 problem or bring great benefit. (If it were my call which it isn't.)

The only reason your friend/colleague Jochen started to ask about it is
because he found it difficult to implement 4 ways to encode/decode the
data, which are in principle the same. So what if your tool doesn't
support a specific extension? If that compression is often used, who are
you fooling? Are you suddenly caring about linking -lbz2?


 I really don't like your attitude. It's great that you took the time to
 write pbf2osm but it seems you expect to be revered for it. You give the
 impression of someone for whom coding something is only a means to climb
 onto a platform from where he can heap spite onto others. (I remember
 you derogatory comments about C++ while you wrote pbf2osm, and putting
 comments like osmosis devs failed to read the specs in one's code is
 not exactly a sign of maturity either.)

What's your point? I also wrote the entire API 0.5 (R/W) and XAPI in a C
extension to a webserver. Ab-so-lu-te-ly nobody cares what I (or
probably anyone else) writes here; it was interesting that after 2 weeks
of publication Lennard came up with some detail that anyone who had
checked the output could have come up with on the first day the
code was published here.

My point is pretty clear: you want to treat PBF as something that is
in flux; I observe that feedback was requested and (virtually) nobody
cared. Protocol buffers is something that can be extended. If someone
actually CARED about removing certain compression techniques, he
would benchmark the compression algorithms on the data presented and not
start with a:

"I do care that it seems I am writing code that might never be
used."

...so all code of Jochen should be used now? Get real. So, exactly as
Scott suggests: why does nobody step in then, write code that nobody uses
afterwards and present a proper benchmark to show that bzip/gzip/lzma is
useless?



 Then you probably also noticed that it is still a (huge) open question
 to write a regression testsuite for all parsers and generators. And
 since the general opinion is now that nobody wants to move until there
 is a second implementation of osm2pbf (instead of actually switching),
 everyone is waiting this greatly annoys me and probably not only me
 but also the guy that actually took great effort to define the
 protocol and review code of others and answer questions.
 
 What exactly is your problem? PBF is alive and kicking. I'm using both
 Osmosis PBF support and your implementation of pbf2osm on a daily basis,
 and many downstream users of Geofabrik do the same.

This is my problem:
http://planet.openstreetmap.org/

And the fact that protocol buffers probably would make the API far more
efficient.


 I find it totally respectless that *you* are now doubting his
 qualities but didn't step forward when feedback was asked.
 
 Excuse me, but discussing potential problems of a design is not a show
 of lack of respect - unless presented in a form like the aforementioned
 osmosis devs failed to read the specs.

Oh dear, so because I actually gave feedback to Scott and asked questions,
and verified my code and implemented the specs, I cannot complain that
osmosis didn't? Sounds like we cannot bash IE6 anymore because it made an
effort to implement HTML rendering...


Why does this subject get me so angry? Because the request shows
laziness and not an effort to show that something is useless because the
compression algorithms are not suited.


Stefan


Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Matt Amos
On Tue, Nov 30, 2010 at 8:41 PM, Stefan de Konink ste...@konink.de wrote:

 Hi,


 Op 30-11-10 20:49, Frederik Ramm schreef:
 Stefan de Konink wrote:
 This is the place for the 'too little, too late'. We are beyond the
 point of 'what' the bitstream should look like: you ought to handle
 what is defined now.

 This is not how we work in OSM. We don't have standards.

 For some reason we do, this is not a free form fight. And if we can
 change the API every week, I wonder why we are still at XML then.

because XML is a nearly human-readable, easy to explain and inspect
format. the same cannot be said of the PBF format, but then the
declared design goals of it were reduction in parsing time and file
size, not readability - and it achieves those goals superbly. however,
i think using it in the API wouldn't provide enough of a speedup,
limited by Amdahl's law, to offset the loss of those other benefits.
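
As a worked illustration of the Amdahl's law point, here is a short Python sketch. The numbers are purely hypothetical and were not measured on the OSM API: it assumes parsing takes 20% of a request and that a binary format makes that part 5 times faster. The function name amdahl_speedup is made up for this example.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time gets `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    # Hypothetical numbers: parsing is 20% of a request and becomes 5x faster.
    print(f"{amdahl_speedup(0.2, 5.0):.2f}x overall")
```

Under those assumed figures the whole request gets only about 1.19 times faster, which is the sense in which the gain is bounded by the share of time actually spent parsing.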

cheers,

matt



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Anthony
On Tue, Nov 30, 2010 at 5:19 PM, Matt Amos zerebub...@gmail.com wrote:
 On Tue, Nov 30, 2010 at 8:41 PM, Stefan de Konink ste...@konink.de wrote:
 And if we can
 change the API every week, I wonder why we are still at XML then.

 because XML is a nearly human-readable, easy to explain and inspect
 format.

Except when you don't include any line feeds :).



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Scott Crosby
On Tue, Nov 30, 2010 at 2:41 PM, Stefan de Konink ste...@konink.de wrote:


 ...so all code of Jochen should be used now? Get real. So exactly what
 Scott suggest: why does nobody step in then, write code that nobody uses
 afterwards and present a proper benchmark to show that bzip/gzip/lzma is
 useless?

The real question is whether supporting bzip2/lzma offers advantages that
are commensurate with the added implementation complexity, not just in
pbf2osm but in every other reader too.

Would you be willing to run an experiment with LZMA? If it shaves a
gigabyte off of the planet, then I'd say it's worth further
consideration; if it shaves 100MB, then it's not. Make a case for why
it should be included.


 Excuse me, but discussing potential problems of a design is not a show
 of lack of respect - unless presented in a form like the aforementioned
 osmosis devs failed to read the specs.

 Oh dear, so because I actually feedbacked on Scott and asked questions,
 and verified my code and implemented the specs I cannot complain osmosis
 didn't?

You do realize that *I* designed the format AND wrote the spec AND
wrote the osmosis reference implementation?

That means that if there are any errors or omissions in that
implementation or spec, they are my mistakes. If there is an
ambiguity, then I have made the call as to what is right. If there are
any differences between the spec, reference implementation, and the
conceptual design, I'm the one resolving the conflict and determining
the best way to fix the issue.

I do appreciate you finding the bugs and ambiguities in the spec by
being the first independent implementation, and I hope you will
consider running the LZMA experiment, but you have been rude and
insulting.

Scott



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Stefan de Konink

Hi Scott,


On 01-12-10 00:41, Scott Crosby wrote:
 The real question is does supporting bzip2/lzma offer advantages that
 are commensurate with the added implementation complexity, not just in
 pbf2osm but in every other reader too.

If any of gzip/bzip2/lzma in general gives better compression ratios
(20% smaller), then that compression scheme should become the default
format, since (sadly) PBF is turning into an 'archival' format as opposed
to a wire format.


 Would you be willing to run an experiment with LZMA? If it shaves a
 gigabyte off of the planet, then I'd say its worth further
 consideration; if it shaves 100MB, then its not. Make a case for why
 it should be included.

I completely agree. But experimenting with LZMA first requires an osm2pbf
that supports LZMA. And currently I feel that the only 'true' tool that
should do something like this should be named pgsql2pbf. I honestly
cannot find a single reason why it would be good to use the XML as
an intermediate format, except for legacy support.


 Excuse me, but discussing potential problems of a design is not a show
 of lack of respect - unless presented in a form like the aforementioned
 osmosis devs failed to read the specs.

 Oh dear, so because I actually feedbacked on Scott and asked questions,
 and verified my code and implemented the specs I cannot complain osmosis
 didn't?
 
 You do realize that *I* designed the format AND wrote the spec AND
 wrote the osmosis reference implementation?

No, I didn't. But my archive states that James Michael DuPont also
published his OSM-Osmosis version.

indepth
And for readers who have only seen my flame at the osmosis
implementation:

Out of the blue the osmosis implementation started to introduce -1
user IDs. This is documented nowhere, and there is currently no
convention of representing past anonymous edits with a negative user ID,
especially since at that time uids couldn't be negative (by spec)
and the format specifies 'has_uid'.
/indepth


 That means that if there are any errors or omissions in that
 implementation or spec, they are my mistakes. If there is an
 ambiguity, then I have made the call as to what is right. If there are
 any differences between the spec, reference implementation, and the
 conceptual design, I'm the one resolving the conflict and determining
 the best way to fix the issue.

The current osmformat.proto still uses an int32 for uid, which is in
fact always a positive number in the OpenStreetMap database; the
problem has been reported before. The obvious fix would have been to
leave it undefined in message Info and to use 0 in DenseInfo.


 I do appreciate you finding the bugs and ambiguities in the spec by
 being the first independent implementation, and I hope you will
 consider running the LZMA experiment, but you have been rude and
 insulting.

Basically you are asking me to run tests that Jochen should have come up
with to prove that your specification of multiple compression formats
sucked. I find this insulting. I think your choice is sound, and if a
tool doesn't implement compression scheme X, then just inform the user.

And if you found my comment in the code rude and/or insulting, I would
have expected a private email from you about two months ago, because
honestly something far ruder was written there.

But again, nobody seems to care what happens here or what is written. It
is not strange that a flamewar over a format starts seven months after
initial publication, or that finger-pointing at code starts about two
months after its publication. I do find it interesting that someone
actually bothered to read the code; sadly, I cannot speak of any broad
collaboration.


Stefan



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Scott Crosby
On Tue, Nov 30, 2010 at 7:29 PM, Stefan de Konink ste...@konink.de wrote:

 Hi Scott,


 Op 01-12-10 00:41, Scott Crosby schreef:
 The real question is does supporting bzip2/lzma offer advantages that
 are commensurate with the added implementation complexity, not just in
 pbf2osm but in every other reader too.

 If any of gzip/bzip2/lzma in the general give better compression ratio's
 (20% smaller), then this compression scheme should become the default
 format. Since (sadly) PBF goes into an 'archival' format opposed to a
 wire format.

I don't see anything in principle that keeps pbf from being a wire format.


 Would you be willing to run an experiment with LZMA? If it shaves a
 gigabyte off of the planet, then I'd say its worth further
 consideration; if it shaves 100MB, then its not. Make a case for why
 it should be included.

 I completely agree. But experimenting with LZMA means first a osm2pbf
 that supports LZMA.

Or, hacking it into osmosis, which has the rest of the code already written.

 And currently I feel that the only 'true' tool that
 should do something like this should be named pgsql2pbf.

I expect it will be written eventually. The planet has doubled in size
in the last year.

 I honestly
 cannot find a single reason why it would be good to use the XML as
 intermediate format, except for legacy support.

It is human readable and much more corruption resistant. (Well, XML is
corruption resistant, and bzipped XML is too; gzipped XML isn't, unless
made with the --rsyncable option.) It makes a much more secure
archival format.

 indepth
 And for the reader; that only was presented my flame to the osmosis
 implementation;

 Out of the blue the OSMOSIS implementation started to introduce -1
 userid's, this is in no place documented, neither is it a default at
 present to represent past anonymous edits with a negative userid.

Actually, osmosis used OsmUser.NONE to represent those anonymous
edits. The problem is how to represent those within the limitations of
the PBF format (see below)

 Especially since at that time the uid's couldn't be negative (by spec)
 and the format specifies 'has_uid'.

DenseInfo doesn't have a has_uid method to check, as it delta-encodes uid's.

 /indepth


 Since the current osmformat.proto still has a int32 for a uid, which is
 in fact always positive number in the openstreetmap database,

To be totally pedantic, the domain of UID's is either
{set of all nonnegative integers} + 'NULL'.
OR
{set of all positive integers} + 'NULL'.

This is an edge case, which you properly identified and we came up
with a resolution. There's no 'right' fix, unfortunately, within the
limitations of the PBF format. The problem is that the PBF format
cannot express NULL, meaning no such user. Unless all metadata is
stripped, it must encode a UID *number*.

Not knowing which of the domains applied, or if UID's can be negative
in legitimate circumstances, I took the easy way out. I didn't need to
care what the domain of UID's was, I just used whatever integer
osmosis returned when calling OsmUser.getUID(), which happens to be -1
for OsmUser.NONE. My mistake was assuming that this mapping was
universal in the rest of the OSM stack.
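
To illustrate why a reserved number is needed at all, here is a small Python sketch of delta-coding a uid column when "no user" has to travel as an integer. It is only an illustration of the idea: ANONYMOUS_UID, to_wire(), delta_encode() and delta_decode() are made-up names, the -1 is simply the value mentioned above for OsmUser.NONE, and nothing here is the osmosis code.

```python
ANONYMOUS_UID = -1  # assumption: the integer reserved for "no such user"


def delta_encode(uids):
    """Store each uid as its difference from the previous one."""
    deltas, previous = [], 0
    for uid in uids:
        deltas.append(uid - previous)
        previous = uid
    return deltas


def delta_decode(deltas, anonymous=ANONYMOUS_UID):
    """Rebuild the uids and map the reserved value back to None."""
    uids, current = [], 0
    for delta in deltas:
        current += delta
        uids.append(None if current == anonymous else current)
    return uids


def to_wire(uids, anonymous=ANONYMOUS_UID):
    """Replace None with the reserved value, then delta-encode."""
    return delta_encode([anonymous if uid is None else uid for uid in uids])
```

Because every entry is just a difference from its predecessor, there is no per-entry slot where "absent" could be flagged; writer and reader have to agree on the reserved value (-1 here, or 0 as proposed) for the round trip to work.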

 the
 problem has been reported before. Would be obvious to haven't defined it
 at all in message Info and use 0 in DenseInfo.



 I do appreciate you finding the bugs and ambiguities in the spec by
 being the first independent implementation, and I hope you will
 consider running the LZMA experiment, but you have been rude and
 insulting.

 Basically you are asking me to run tests that Jochen should have come up
 with to prove that your specification of multiple compression formats
 sucked.

I viewed it differently. He wanted to know whether the specification
needed to be that complicated, and I have to admit that I did not know.
That is a legitimate question. The essence of a good design is
simplicity. Each feature should have a reason for being there, a
reason strong enough to warrant being included. Does LZMA meet that
burden of proof?

Scott



Re: [OSM-dev] Compression types in PBF Format

2010-11-30 Thread Anthony
On Tue, Nov 30, 2010 at 8:29 PM, Stefan de Konink ste...@konink.de wrote:
 If any of gzip/bzip2/lzma in the general give better compression ratio's
 (20% smaller), then this compression scheme should become the default
 format.

Depends on the performance.  If all you want is max compression
without regard to performance, you're almost surely better off using
raw and then compressing the entire file with LZMA (e.g. 7zip or xz).
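
A rough way to measure this is sketched below in Python, using only the standard zlib and lzma modules. The helper names are made up for the example and the blocks are whatever byte strings you feed in; this is not a benchmark of real planet data, just the shape of the comparison.

```python
import lzma
import zlib


def per_block_zlib_size(blocks):
    """Total size when every block is deflated independently."""
    return sum(len(zlib.compress(block, 9)) for block in blocks)


def whole_file_lzma_size(blocks):
    """Size when the raw blocks are concatenated and compressed once."""
    return len(lzma.compress(b"".join(blocks), preset=9))


def saving(blocks):
    """Bytes saved by whole-file LZMA over per-block zlib (may be negative)."""
    return per_block_zlib_size(blocks) - whole_file_lzma_size(blocks)
```

Running saving() over the raw blocks of an extract gives the number to weigh against the cost: a file compressed as one stream has to be decompressed from the start, whereas independently compressed blocks can be read on their own.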
